Huffman Coding

Huffman Coding is a #source coding technique utilises variable-length code which the Average Code-Word Length# approach the Entropy# of \(S\), \(\bar{L} \rightarrow H(S)\). It has the properties where no two messages consist of identical arrangement of bits, all code word are prefix code, higher probability symbols have shorter code words, and the two least probable code words have equal length and differ only in final digit.

To do Huffman Coding is to construct a Huffman tree following the below steps with the help of table:

List the source symbols in descending order based on probability.
Combine the probabilities of the two symbols that have the lowest probabilities (the lowest and the second lowest) and reorder the resultant probability as high as possible.
Repeat Step 1 until there is only two probabilities left in the table.
Encode 1’s to the lowest probability for each column, and 0’s to the second-lowest probability.
Keep on regressing (that is to follow the path or arrows) and obtain the initial read code word.
Reverse all code word to acquire optimal code.

Note: When there is a reordering, draw the arrow to denote that. If the probabilities are combine, merge their arrows.

If the combined symbol (that is those symbols that have the lowest probabilities) is moved higher, smaller Variance of Code-Word Length# will be achieved, which is preferable in Huffman Coding.

Although Huffman Coding is rather efficient, the requirement of knowledge of statistics of the source cannot be always known a priori (in advance). Moreover, it fails to fully represent the high-order relationship between words and phrases, which will result in great memory usage if it does so.

Links to this page

TIT3131 Chapter 1: Information Sources and Sources Coding

Huffman Coding
Source Coding Theorem

Huffman Coding#

The source code could be optimised using variable-length code, where rare symbol code is encoded in long code and frequent symbol code is encoded in short code such as in Huffman Coding#. We can calculate the Average Code-Word Length# to get the coding efficiency of the source coding scheme.