Lempel-Ziv Coding

Lempel-Ziv Coding is a fix-length #source coding technique that is adaptive to and able to represent the high order relationships of the symbols using codebook or dictionary. It will produce a fix-length code set to represent the source symbols, which is great for synchronous transmission. A typical codebook used will consists of 4096 entries or 12 bits in size.

The codebook will be constructed based on the source data stream by cutting it into short segments or subsequences, where any new segment or subsequence will be added to the codebook as a new code. Then, the code recorded will be divided into front portion (excluding very last digit) and last digit, and be translated to a new numerical value based on the number of dictionary position according to the code value. For example, if dictionary position 5 has a code of 110, and dictionary position 3 has a code of 11, then the new value constructed will be 31 (as the code for dictionary position 1 is 0). Lastly, code set will be produce by converting all translated numerical values into binary representation by translating the front portion into its binary form and then combine it with the last bit obtained before the numerical translation. For instance, the value 31 in dictionary position 5 will be transformed into 0110 as the binary representation for 3 is 011 and the last bit for dictionary position 5 is 0 before the numerical translation.

Note: The codebook is constructed with initial 0 and 1 assigned to dictionary position or numerical position 1 and 2 respectively. Therefore, 0 and 1 are not new to the codebook any more.

Note: If the codebook is full, that is there is no space (dictionary position) to obtain new code, then reading on the data stream should stop.

Decoding the Lempel-Ziv code set could be as easy as just reading the front portion of the received code. The front portion identifies which dictionary position to look up for the value, then be combined with the last bit. As an example, the front portion of 0110 is 011 which means 3 in decimal. Look up to 3rd dictionary position in the codebook we found it stores the value 11. Thus, we know the true value for the binary 0110 is 110, which is stored in the 5th dictionary position in the codebook.

The compaction ratio with the English text will be 55% using Lempel-Ziv Coding.

#cryptography