Monday, November 25, 2013

Understanding the Various Compression, Encryption and Archive Formats

In computing, an archive is a single file that stores other files and folders within itself. There are several archive formats available, and each comes with its own pros and cons. Some archive formats support compression (which makes the file size smaller), others support encryption, and some support both. Let’s find out more about the compression and encryption algorithms used and the various archive formats.
A compression algorithm is the method an archive format uses to compress files and make the overall file size smaller.

1. LZMA/LZMA2

The Lempel–Ziv–Markov chain algorithm (LZMA) is a lossless data compression algorithm. It uses a dictionary compression scheme similar to LZ77 and passes the result through a range coder, which uses a probability model to encode the output one bit at a time.
LZMA2 is a container format that can hold both uncompressed and LZMA-compressed chunks of data. It supports multi-threaded compression and decompression. Because chunks that do not compress well can be stored as they are, incompressible data (such as files that are already compressed) grows only marginally.
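To see both behaviours in practice, here is a minimal sketch using Python's standard `lzma` module, which writes LZMA2 streams inside an .xz container. The helper name `xz_compress_checked` and the sample data are made up for illustration.

```python
import lzma
import os


def xz_compress_checked(data: bytes) -> bytes:
    """Compress with LZMA2 (.xz container) and verify the round trip."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=6)
    # Lossless: decompressing must give back exactly the original bytes.
    if lzma.decompress(compressed) != data:
        raise RuntimeError("round trip failed")
    return compressed


repetitive = b"archive formats " * 5000  # highly compressible input
random_data = os.urandom(80_000)         # effectively incompressible input

small = xz_compress_checked(repetitive)
stored = xz_compress_checked(random_data)

print(len(repetitive), "->", len(small))
print(len(random_data), "->", len(stored))
```

The repetitive input should shrink dramatically, while the random input should come out only slightly larger than it went in, since the incompressible chunks are stored as they are.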

2. Burrows-Wheeler Transform Algorithm (BWT)

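BWT works by rearranging the characters of a block of text so that identical characters tend to end up next to each other. The transform is reversible and does not shrink the data by itself; it makes the output much easier to compress with follow-up steps such as run-length encoding and Huffman coding.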
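A naive sketch of the transform and its inverse is shown below. It sorts all rotations of the input, which is fine for a demonstration but far too slow for real files; actual compressors use much more efficient methods. The `$` end marker is an assumption of this sketch and must not occur in the input.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive version)."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel
    # Sort every rotation of the string and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Rebuild the original text from its transform (naive version)."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to every row, then sort again.
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))           # annb$aa
print(inverse_bwt("annb$aa"))  # banana
```

Notice how the transformed string groups the repeated letters together ("nn", "aa"), which is exactly what the later compression steps take advantage of.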

3. PPM

Prediction by partial matching (PPM) is a statistical data compression method which uses a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream.
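The prediction idea can be sketched in a few lines. This is a heavily simplified illustration, not a real PPM compressor: it only counts which symbol followed each context and falls back to shorter contexts when a longer one has never been seen. A real implementation would feed the resulting probabilities into an arithmetic coder. The function names and the `max_order` parameter are made up for this example.

```python
from collections import Counter, defaultdict


def build_model(text: str, max_order: int = 2) -> dict:
    """Count which symbol follows each context of length 0..max_order."""
    model = defaultdict(Counter)
    for i, symbol in enumerate(text):
        for order in range(max_order + 1):
            if i - order >= 0:
                context = text[i - order:i]
                model[context][symbol] += 1
    return model


def predict_next(model: dict, history: str, max_order: int = 2):
    """Predict the next symbol, trying the longest known context first."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # the model has seen no data at all


model = build_model("abracadabra")
print(predict_next(model, "ab"))  # 'r', because "ab" was always followed by 'r'
print(predict_next(model, "zz"))  # 'a', the overall most frequent symbol
```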

4. Deflate

Deflate is a popular lossless data compression algorithm which combines LZ77 (which replaces repeated sequences with references to earlier occurrences) with Huffman coding (which gives shorter codes to more frequent symbols). Since Deflate can be implemented without infringing patents, it has become very popular and is widely used, for example in the Zip and GZip formats.
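If you want to try Deflate directly, Python's standard `zlib` module exposes it. The sketch below produces a raw Deflate stream without any container headers; the helper names and the sample text are made up for illustration.

```python
import zlib


def deflate_raw(data: bytes, level: int = 9) -> bytes:
    """Compress data into a raw Deflate stream (no zlib or gzip header)."""
    # A negative wbits value tells zlib to omit the header and checksum.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw Deflate stream."""
    return zlib.decompress(data, -15)


text = b"the quick brown fox jumps over the lazy dog. " * 200
packed = deflate_raw(text)

print(len(text), "->", len(packed))
```

Formats such as Zip and GZip wrap this same kind of stream in their own headers and checksums.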
Now let’s go through some of the popular encryption methods:

1. DES

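Data Encryption Standard (DES) is a symmetric-key block cipher, which means the same secret key is used to encrypt and decrypt data. The key is 64 bits long, but 8 of those bits are used for parity checking, so the effective key length is 56 bits. That is short enough to be brute-forced on modern hardware, so DES is no longer considered secure.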

2. AES

Advanced Encryption Standard (AES) is an encryption algorithm adopted by the US government to secure sensitive data. It supports key lengths of 128, 192 and 256 bits. AES is a symmetric-key algorithm, which means that the same key is used for encrypting and decrypting the data.

3. Blowfish

Blowfish is a symmetric block cipher with a 64-bit block size and a variable key length of 32 to 448 bits.
Note: There are several other encryption algorithms, but the three mentioned above are among the most widely used.
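Key length matters because every extra bit doubles the number of keys an attacker has to try. The sketch below does the arithmetic, taking DES at its effective 56 bits. The search rate of one trillion keys per second is a made-up figure purely for illustration, and the result is the worst case of trying every key.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

KEY_BITS = {
    "DES": [56],
    "AES": [128, 192, 256],
    "Blowfish": [32, 448],  # shortest and longest allowed key
}


def keyspace(bits: int) -> int:
    """Number of possible keys for a key of the given length."""
    return 2 ** bits


def years_to_search(bits: int, keys_per_second: int = 10**12) -> float:
    """Years needed to try every key at a hypothetical search rate."""
    return keyspace(bits) / keys_per_second / SECONDS_PER_YEAR


for cipher, sizes in KEY_BITS.items():
    for bits in sizes:
        print(f"{cipher} {bits}-bit: {years_to_search(bits):.3g} years")
```

At that assumed rate, a 56-bit key space is exhausted in well under a year, while a 128-bit key space takes far longer than any practical time span.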
There are various archive formats available. Below, we evaluate each format on three points: whether it supports compression and encryption, which operating systems it is used on, and which software can handle it.

1. Tar

Tape Archive (Tar) is one of the oldest archive formats. Initially, it was used to combine files and write them to sequential tape drives, and it was later standardized as a general archive format. Tar is mostly used on Linux, and on its own it doesn’t support compression or encryption, which is why it is usually paired with a compressor such as GZip (tar.gz). You can also use it on Windows by installing additional software. Most modern archiving utilities support this format. The exceptions include Disk Archiver and KGB Archiver.
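Here is a small sketch using Python's standard `tarfile` module that builds a Tar archive in memory and lists its contents. The file names and contents are made up. Because plain Tar does not compress, the archive comes out larger than the files it holds.

```python
import io
import tarfile


def make_tar(files: dict) -> bytes:
    """Bundle {name: content} pairs into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_tar(data: bytes) -> list:
    """Return the member names stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as archive:
        return archive.getnames()


files = {
    "notes.txt": b"hello " * 100,
    "docs/readme.txt": b"plain tar, no compression",
}
tar_bytes = make_tar(files)

print(list_tar(tar_bytes))
print(sum(len(c) for c in files.values()), "->", len(tar_bytes))
```

Changing the mode from "w" to "w:gz" would produce a tar.gz archive instead.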

2. GZ

GZ or GZip is one of the most popular compression formats, used on both Windows and Linux. GZip uses the Deflate compression algorithm. It compresses a single file or stream, so multiple files are usually bundled with Tar first (tar.gz). A large GZip file can also be split into smaller parts for easier sharing and transfer. Since GZip is quite popular, most modern archiving utilities can compress and decompress files in the GZip format, including 7-Zip, BetterZip, PKZip, WinZip and WinRAR.
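A quick sketch with Python's standard `gzip` module shows the basics. It also demonstrates a related property: pieces that were compressed separately can simply be joined end to end and still decompress as one stream. The helper name and the sample data are made up.

```python
import gzip


def gzip_bytes(data: bytes, level: int = 9) -> bytes:
    """Compress data into the GZip format (Deflate plus header and checksum)."""
    return gzip.compress(data, compresslevel=level)


payload = b"GZip wraps a Deflate stream in a small header. " * 100
packed = gzip_bytes(payload)

print(len(payload), "->", len(packed))
print(packed[:2].hex())  # every GZip file starts with the bytes 1f 8b

# Two separately compressed parts, joined end to end.
joined = gzip_bytes(b"part one, ") + gzip_bytes(b"part two")
print(gzip.decompress(joined))
```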

3. BZ/BZ2

BZ is very similar to GZ but uses the Burrows-Wheeler Transform, which usually results in somewhat better compression and a smaller file size. Compression is slower than with GZ, while decompression is noticeably faster than compression. Most software that supports GZ also supports BZ.
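You can compare the two yourself with Python's standard `bz2` and `gzip` modules. The sketch below only prints the sizes; which format wins, and by how much, depends on your data. The helper name and the sample text are made up.

```python
import bz2
import gzip


def bzip2_bytes(data: bytes, level: int = 9) -> bytes:
    """Compress data into the BZip2 format."""
    return bz2.compress(data, compresslevel=level)


sample = b"Compression ratios depend heavily on the input data. " * 500
as_bz2 = bzip2_bytes(sample)
as_gz = gzip.compress(sample)

print("original:", len(sample))
print("bz2:     ", len(as_bz2))
print("gz:      ", len(as_gz))
```

Try it on a few of your own files before settling on a format, since repetitive sample text like this is not representative of real data.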

4. Zip

Zip is probably the most well-known and widely used archiving format. Zip most commonly uses the Deflate algorithm for lossless compression. It also supports AES and DES encryption. Most modern operating systems come with built-in support for the Zip format, so you don’t need separate software for archiving and un-archiving Zip files.
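As a minimal sketch, Python's standard `zipfile` module can create and read Deflate-compressed Zip archives. The file names and contents below are made up. Note that this module cannot create encrypted archives, so encryption is left out here.

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Pack {name: content} pairs into a Deflate-compressed Zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_zip(data: bytes) -> dict:
    """Unpack a Zip archive into a {name: content} dictionary."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


documents = {
    "report.txt": b"quarterly numbers " * 200,
    "images/readme.txt": b"placeholder",
}
zip_bytes = make_zip(documents)

print(list(read_zip(zip_bytes)))
print(sum(len(c) for c in documents.values()), "->", len(zip_bytes))
```

Unlike Tar, each file in a Zip archive is compressed individually, so a single file can be extracted without unpacking the rest.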

5. 7Z

The 7Z archiving format was introduced with a free and open source utility called 7-Zip. It is one of the most advanced general-purpose compression and archiving formats: it supports several compression algorithms, including LZMA, LZMA2, a PPM variant, BZip2 and Deflate, and it supports AES-256 encryption. 7Z typically compresses files more than the other formats discussed here but is relatively slower in processing. Another limitation is that the 7-Zip graphical application is only available for Windows; there is no official graphical version for Mac or Linux. 7Z also supports multi-part archiving.

6. RAR

RAR is a proprietary archiving format. While it can be read and extracted by other utilities like 7-Zip and WinZip, it can only be created using the WinRAR utility. RAR was the most popular format for multi-part archiving before 7Z was released. Now 7Z can do the same task for free, whereas RAR requires its users to pay for the WinRAR software. RAR supports AES encryption.
Here are some of the relatively lesser known formats:
XZ is a lossless data compression format which uses the LZMA2 compression algorithm. It can be thought of as a stripped-down version of 7Z.
LHA, previously known as LHarc, is primarily used for compressing installation files and games (mostly in Japan). Interestingly, the Japanese version of Windows 7 comes with built-in support for LHA archives.
ACE is a proprietary data compression archive file format which was a competitor to the RAR format in the early 2000s.
StuffIt was primarily released for Mac but versions for Windows, Linux and Solaris were released afterwards. This is a proprietary compression format used by StuffIt utilities.
On Linux, the most commonly used format is gz (or tar.gz), followed by bz, whereas on Windows or Mac, the most commonly used format is Zip. For cross-platform compatibility, the Zip format is the one to go for. If you want features like security, high compression and multi-part archiving, go for the 7Z format. RAR is similar to 7Z except that it comes with a price tag, so avoid it where you can.


Which file format and utility do you use for compression?
