Entropy

Entropy is a measure of uncertainty: it gives the average information content per source symbol before the source output is observed. It is the basic measurement of #information, expressed in bits per symbol (or per message) and denoted by \(H(\phi)\). The formal mathematical definition is shown below:

$$ H(\phi) = E[I(s_k)] = \sum^{K-1}_{k=0} p_k \log \left(\frac{1}{p_k}\right) $$

Where:

  • \(E\) represents the expectation (statistical average) operator
  • \(I\) represents the Information# of the event
  • \(s_k\) represents the \(k\)-th symbol (event) emitted by the source
  • \(p_k\) represents the probability of the symbol \(s_k\), and
  • \(\log\) represents \(\log_2\)
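
As a minimal sketch of this formula (not from the source), the Python snippet below evaluates \(H(\phi)\) for an assumed four-symbol alphabet; the probabilities are illustrative only:

```python
from math import log2

def entropy(probs):
    """H(phi) = sum of p_k * log2(1 / p_k); zero-probability symbols contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Assumed DMS with K = 4 symbols (probabilities are illustrative, not from the source)
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))  # 1.75 bits per symbol
```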

For a Discrete Memoryless Source#, Entropy is bounded as:

$$ 0 \le H(\phi) \le \log K $$

Where:

  • \(K\) is the radix (number of symbols) of the alphabet \(\phi\)

\(H(\phi)\) is 0 when the probability \(p_k\) is 1 for some \(k\) (and 0 for all other symbols). This corresponds to no uncertainty, which is the lower bound of the Entropy. \(H(\phi)\) is \(\log K\) when the probability \(p_k\) is \(\frac{1}{K}\) for all \(k\). This corresponds to maximum uncertainty, which is the upper bound of the Entropy.
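
Both bounds can be checked with a small self-contained sketch; the distributions over \(K = 4\) symbols below are assumptions for the example only:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

K = 4
certain = [1.0, 0.0, 0.0, 0.0]  # p_k = 1 for one symbol: no uncertainty
uniform = [1 / K] * K           # p_k = 1/K for every symbol: maximum uncertainty

print(entropy(certain))           # 0.0, the lower bound
print(entropy(uniform), log2(K))  # 2.0 and 2.0, i.e. H(phi) = log K, the upper bound
```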

Additionally, \(H(\phi)\) gives a #fundamental limit on the average number of bits per source symbol needed to fully represent a DMS.
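
As a hedged illustration of this limit (not taken from the source), the sketch below builds a binary Huffman code (see the Huffman Coding backlink below) for an assumed alphabet and compares its Average Code-Word Length with \(H(\phi)\); the symbol names and probabilities are assumptions for the example only:

```python
import heapq
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Build a binary Huffman code; returns {symbol: codeword}."""
    # Heap items are (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # least probable subtree
        p2, _, c2 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Assumed source alphabet and probabilities (illustrative only)
probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(f"H(phi) = {entropy(probs):.3f} bits, average length = {avg_len:.2f} bits")
```

Whatever probabilities are substituted, the printed average code-word length never falls below the printed entropy, which is exactly the limit described above.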

Links to this page
  • TIT3131 Chapter 1: Information Sources and Sources Coding
  • Mutual Information

    Since #Entropy represents the uncertainty about the channel input before observing the channel output, and #Conditional Entropy represents the uncertainty about the channel input after observing the channel output, we could remove the noise uncertainty by:

  • Joint Entropy

    Joint Entropy is defined as below, which represents the joint Entropy of \(X\) and \(Y\):

  • Information Theory

    Information Theory provides a quantitative measure of the information# contained in message signals and allows us to determine the capacity of a communication system to transmit information# from source to destination. It deals with the mathematical modelling and analysis of a communication system#. The laws of probability are applied to Information Theory. It covers the Source Coding Theorem# and the Channel Coding Theorem#.

  • Information Source

    When said information source (\(S\)) is an \(n\)-order extension of itself, this means that it is \(S^n\), and it will produce Block Code# of size \(n\). The total number of block codes produced will be \(K^n\), where \(K\) is the number of individual symbols. Getting the probability of each block is as easy as multiplying the probabilities of its individual symbols (a sketch after this list illustrates this). See details on Entropy# on how to calculate its uncertainty.

  • Information Rate
    \(H\) is the Entropy#
  • Information

    Before that, we need to group the blocks together if they have the same value pattern (the same symbols produced, albeit in a different arrangement). Then, we calculate the corresponding probability for each group. After that, we can get the Entropy# \(H(S^n)\) by summing, over the groups, the product of the block probability, the number of blocks in the group, and the information content \(\log \frac{1}{p}\) of the block.

  • Huffman Coding

    Huffman Coding is a #source coding technique that uses variable-length codes, in which the Average Code-Word Length# approaches the Entropy# of \(S\), \(\bar{L} \rightarrow H(S)\). It has the following properties: no two messages consist of an identical arrangement of bits, all code words are prefix codes, higher-probability symbols have shorter code words, and the two least probable code words have equal length and differ only in the final digit.

  • Discrete Memoryless Source (DMS)

    Because the symbols produced by a DMS are statistically independent of one another, if the symbols are transmitted in blocks, then we can expect the #Entropy of \(S^n\) to be equal to \(n\) times the entropy of \(S\) (verified numerically in the sketch after this list).

  • Conditional Entropy

    Conditional Entropy represents the amount of uncertainty remaining about the #channel input after the channel output has been observed (unlike Entropy, which measures the information in the source). It is defined mathematically by:

  • Channel Coding Theorem
    \(H(S)\) is the Entropy# from a DMS with an alphabet \(S\)
  • Average Code-Word Length

    If the information source is a #Discrete Memoryless Source (DMS), then \(\bar{L}\) is bounded below by the source’s Entropy# \(H(\phi)\), as stated by Shannon’s first theorem, and we can conclude that \(L_{min}\) is actually \(H(\phi)\).
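
The sketch referenced in the Information Source and Discrete Memoryless Source (DMS) entries above is given here. It is a minimal, self-contained illustration (not taken from the linked pages) of the \(n\)-th order extension of an assumed 3-symbol DMS: it enumerates the \(K^n\) blocks, obtains each block’s probability as the product of its symbols’ probabilities, groups blocks that share the same symbol pattern, and checks numerically that \(H(S^n) = n \cdot H(S)\); the alphabet and probabilities are assumptions for the example only.

```python
from itertools import product
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Assumed DMS with K = 3 symbols (probabilities are illustrative, not from the source)
symbol_probs = {"a": 0.5, "b": 0.3, "c": 0.2}
n = 2  # order of the extension S^n

# Every one of the K**n blocks, each with probability equal to the product of
# its individual symbols' probabilities (the symbols of a DMS are independent)
block_probs = {}
for block in product(symbol_probs, repeat=n):
    p = 1.0
    for s in block:
        p *= symbol_probs[s]
    block_probs[block] = p

print(len(block_probs))                      # K**n = 9 blocks
print(round(sum(block_probs.values()), 10))  # block probabilities still sum to 1

# Group blocks that contain the same symbols in a different arrangement;
# every block in a group shares the same probability
groups = {}
for block, p in block_probs.items():
    pattern = tuple(sorted(block))
    count, _ = groups.get(pattern, (0, p))
    groups[pattern] = (count + 1, p)

# H(S^n) computed group by group: occurrences * p_block * log2(1 / p_block)
H_grouped = sum(count * p * log2(1 / p) for count, p in groups.values())

H_S = entropy(symbol_probs.values())
H_Sn = entropy(block_probs.values())
print(H_Sn, H_grouped, n * H_S)              # all three agree: H(S^n) = n * H(S)
```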

#math