The average code-word length for a source \(S\) with source symbols \(s_k\) can be calculated using the following formula:
$$ \bar{L} = \sum^{K-1}_{k=0} p_k l_k $$
Where:
- \(\bar{L}\) is the average number of bits per source symbol used in source encoding
- \(K\) is the size of the source alphabet \(S\)
- \(k\) indexes the symbols, \(k \in \{0, 1, \ldots, K - 1\}\)
- \(p_k\) is the probability of symbol \(s_k\)
- \(l_k\) is the length of the binary code word assigned to symbol \(s_k\)
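As a quick numeric sketch of the formula above, the snippet below computes \(\bar{L}\) for a small illustrative alphabet; the probabilities and code-word lengths are assumptions chosen for the example, not values from any particular source.

```python
# Average code-word length: L_bar = sum over k of p_k * l_k
# Illustrative values for a 4-symbol alphabet s_0 .. s_3:
probs = [0.5, 0.25, 0.125, 0.125]   # p_k, must sum to 1
lengths = [1, 2, 3, 3]              # l_k, bits per code word

avg_length = sum(p * l for p, l in zip(probs, lengths))
print(avg_length)  # 1.75 bits per source symbol
```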
From here, we can determine the coding efficiency of the source encoder as follows:
$$ \eta = \frac{L_{min}}{\bar{L}} \le 1 $$
Where:
- \(L_{min}\) is the minimum possible value of \(\bar{L}\).
The source coding is said to be efficient when \(\eta \rightarrow 1\). If no information is lost during source encoding, the output (source code) is said to be lossless.
If the information source is a Discrete Memoryless Source (DMS), then \(\bar{L}\) is bounded below by the source's entropy \(H(\phi)\), as stated by Shannon's first theorem (the source-coding theorem). We can therefore conclude that \(L_{min}\) is \(H(\phi)\):
$$ \begin{align} \bar{L} &\ge H(\phi)\\ L_{min} &= H(\phi)\\ \eta &= \frac{H(\phi)}{\bar{L}} \end{align} $$
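The derivation above can be checked numerically. The sketch below computes \(H(\phi)\), \(\bar{L}\), and \(\eta\) for an assumed DMS with dyadic probabilities (powers of 1/2), for which a prefix code with \(l_k = \log_2(1/p_k)\) achieves \(\eta = 1\); the probabilities and lengths are illustrative, not from the source.

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]   # assumed DMS symbol probabilities
lengths = [1, 2, 3, 3]              # matching prefix-code word lengths

entropy = -sum(p * math.log2(p) for p in probs)          # H(phi)
avg_length = sum(p * l for p, l in zip(probs, lengths))  # L_bar
eta = entropy / avg_length                               # efficiency
print(entropy, avg_length, eta)  # 1.75 1.75 1.0
```

Because each \(l_k\) here equals \(\log_2(1/p_k)\) exactly, \(\bar{L}\) meets the entropy bound and the code is optimal; for non-dyadic probabilities, \(\bar{L}\) would exceed \(H(\phi)\) and \(\eta < 1\).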