What is Shannon Information?
Definition (in bits, log base 2): for an outcome x drawn under a model P,
$$ I_S(x) = -\log_2 P(x) $$
Average behavior (i.i.d. case):
$$ \frac{1}{n}\, I_S(x^n) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 P(x_i) \;\xrightarrow[n\to\infty]{\text{a.s.}}\; H(P) $$
Consequence: number of typical samples and their probability:
$$ \#\{\text{typical } x^n\} \approx 2^{\,nH(P)}, \qquad P(x^n) \approx 2^{-nH(P)} \ \text{for each typical } x^n $$
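A minimal Python sketch of these identities; the biased-coin model and the sample size are illustrative assumptions, not part of the theory:

```python
import math
import random

# Illustrative i.i.d. model: a biased coin (assumed for this sketch).
P = {"H": 0.2, "T": 0.8}

def surprisal_bits(symbol, model):
    """Shannon information IS(x) = -log2 P(x) of a single outcome."""
    return -math.log2(model[symbol])

random.seed(0)
n = 100_000
sample = random.choices(list(P), weights=list(P.values()), k=n)

total_bits = sum(surprisal_bits(s, P) for s in sample)
entropy = -sum(q * math.log2(q) for q in P.values())

print(f"(1/n) IS(x^n) = {total_bits / n:.4f} bits/symbol")
print(f"H(P)          = {entropy:.4f} bits/symbol")
# The two agree for large n; hence a typical sample has probability about
# 2^(-n*H(P)), and there are about 2^(n*H(P)) such samples.
```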
Short history
Shannon (1948) identified the total codelength with IS: under a given model P, the best lossless code for an observed sample of length n needs about IS(x^n) bits.
Physical meaning
Statistical physics analogy: under a uniform distribution, IS equals the logarithm of the number of microstates, a measure of rarity.
Landauer’s principle: erasing one bit of information has a minimum heat cost:
$$ E_{\min} = k_B T \ln 2 $$
Mutual information as extractable work: the mutual information I(X;Y), a combination of IS terms, upper-bounds the work extractable from correlated systems:
$$ W \le k_B T \ln 2 \cdot I(X;Y) $$
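A one-line check of the Landauer bound in Python; the 300 K room-temperature figure is an assumed illustrative value:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)
T = 300.0           # assumed room temperature, kelvin

# Landauer bound: erasing one bit dissipates at least k_B * T * ln 2 joules.
print(f"Minimum heat per erased bit at {T:.0f} K: {K_B * T * math.log(2):.3e} J")
```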
Mathematical properties (briefly)
Additivity under independence: if $P(x, y) = P_X(x)\,P_Y(y)$, then
$$ I_S(x, y) = I_S(x) + I_S(y) $$
Lower–upper bounds (deterministic and uniform cases):
$$ I_S(x) \ge 0, \quad \text{with } I_S(x) = 0 \iff P(x) = 1; \qquad I_S(x) = \log_2 M \ \text{for } P \text{ uniform on } M \text{ outcomes} $$
Information rate (stationary–ergodic processes):
$$ r = \lim_{n\to\infty} \frac{1}{n}\, I_S(x^n) \quad \text{exists almost surely (Shannon–McMillan–Breiman)} $$
Coding meaning: there exists a prefix code achieving the following bound:
$$ \ell(x) \le I_S(x) + 1 \quad \text{for every outcome } x $$
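The coding bound can be checked directly: Shannon code lengths ℓ(x) = ⌈IS(x)⌉ satisfy Kraft's inequality, which guarantees a prefix code with those lengths exists. A sketch with an assumed toy distribution:

```python
import math

P = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}  # assumed toy model

# Shannon code lengths: l(x) = ceil(IS(x)) <= IS(x) + 1.
lengths = {x: math.ceil(-math.log2(px)) for x, px in P.items()}

# Kraft's inequality sum_x 2^(-l(x)) <= 1 guarantees a prefix code
# with exactly these lengths exists.
kraft = sum(2.0 ** -l for l in lengths.values())
avg_len = sum(P[x] * lengths[x] for x in P)
entropy = -sum(px * math.log2(px) for px in P.values())

print("lengths:", lengths)
print(f"Kraft sum  = {kraft:.3f} (<= 1)")
print(f"avg length = {avg_len:.3f} bits >= H(P) = {entropy:.3f} bits")
```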
Relation to Kolmogorov complexity K
Upper approximation (whenever the model P is computable):
$$ K(x^n) \le I_S(x^n; P) + c_P $$
where the constant $c_P$ depends only on P and the fixed universal machine, not on $x^n$.
Proof sketch: arithmetic coding provides a prefix code of length at most IS(x^n; P) plus a constant; a fixed universal Turing machine can decode it given a description of P.
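A non-streaming sketch of that argument in exact rational arithmetic (practical arithmetic coders work incrementally; the i.i.d. model here is an assumed toy):

```python
from fractions import Fraction
from math import ceil, log2

def arithmetic_code(symbols, model):
    """Bitstring identifying x^n using at most IS(x^n; P) + 2 bits.

    Narrow [low, low + width) by each symbol's probability; the final
    width equals P(x^n). A dyadic interval of at most half that width
    fitting inside it yields a prefix-free codeword.
    """
    cum, c = {}, Fraction(0)
    for s, p in model.items():          # cumulative distribution
        cum[s] = c
        c += p
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low += width * cum[s]
        width *= model[s]               # width = P(x^n) at the end
    k = ceil(-log2(width)) + 1          # k <= IS(x^n) + 2
    code = ceil(low * 2**k)             # code/2^k lands inside the interval
    return format(code, f"0{k}b")

model = {"a": Fraction(3, 4), "b": Fraction(1, 4)}  # assumed toy model
print(arithmetic_code("aabaa", model))  # 5 bits vs IS(x^5) ~ 3.66 bits
```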
Typical accuracy (stationary–ergodic sources, e.g., ergodic Markov chains):
Per-symbol limits equal the same rate r:
$$ \lim_{n\to\infty} \frac{1}{n}\, K(x^n) = \lim_{n\to\infty} \frac{1}{n}\, I_S(x^n) = r \quad \text{almost surely} $$
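A simulation sketch for an assumed two-state ergodic Markov chain, comparing the empirical per-symbol IS with the analytic entropy rate r (transition matrix and run length are illustrative):

```python
import math
import random

states = (0, 1)
T = [[0.9, 0.1],    # transition probabilities from state 0 (assumed)
     [0.3, 0.7]]    # transition probabilities from state 1 (assumed)
pi = [0.75, 0.25]   # stationary distribution (solves pi = pi T)

# Entropy rate r = sum_i pi_i * H(T[i][.]).
r = -sum(pi[i] * sum(p * math.log2(p) for p in T[i]) for i in states)

random.seed(1)
n = 200_000
x = 0
info = -math.log2(pi[x])              # IS of the first symbol
for _ in range(n - 1):
    nxt = random.choices(states, weights=T[x])[0]
    info += -math.log2(T[x][nxt])     # chain rule adds IS per transition
    x = nxt

print(f"(1/n) IS(x^n)  = {info / n:.4f} bits/symbol")
print(f"entropy rate r = {r:.4f} bits/symbol")  # converge a.s. as n grows
```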
Limitations and remarks
Model dependence: IS depends on the chosen model P; a poor model overestimates the information content (on average, by the relative entropy discussed below).
Unknown length: the sample length n must also be encoded; the approximate overhead (e.g., with an Elias code) is
$$ \approx \log_2 n + O(\log \log n) \ \text{bits} $$
Per-symbol overhead (negligible for large n; see the sketch after this list):
$$ \frac{\log_2 n}{n} \to 0 \quad \text{as } n \to \infty $$
Continuous variables: density-based expressions are unit-dependent; quantizing to cells of width $\Delta$ gives the canonical form
$$ I_S(x) \approx -\log_2\!\big(p(x)\,\Delta\big) = -\log_2 p(x) - \log_2 \Delta $$
so only the quantized value (or a difference of IS values) has an absolute meaning in bits.
Semantics-blind: IS measures statistical structure, not meaning.
Ergodic Markov chains: due to the typical equality above, IS is especially reliable for approximating K.
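The length-overhead remark can be made concrete; this sketch assumes an Elias-gamma style header of about 2⌊log2 n⌋ + 1 bits:

```python
import math

# Encoding the unknown length n with an Elias-gamma style header costs
# about 2*floor(log2 n) + 1 bits (assumed scheme); per symbol it vanishes.
for n in (100, 10_000, 1_000_000):
    header = 2 * math.floor(math.log2(n)) + 1
    print(f"n = {n:>9}: header ~ {header:2d} bits "
          f"({header / n:.6f} bits/symbol)")
```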
Mismatched model and excess codelength
If the true source is P but encoding uses Q (i.i.d. case; for ergodic processes the excess rate is likewise constant), the expected excess codelength per symbol is the relative entropy:
$$ \mathbb{E}_P\!\left[ I_S(X; Q) - I_S(X; P) \right] = \sum_x P(x) \log_2 \frac{P(x)}{Q(x)} = D(P \,\|\, Q) \;\ge\; 0 $$
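A numerical sketch of this identity; the distributions P and Q are assumed for illustration:

```python
import math
import random

P = {"a": 0.5, "b": 0.3, "c": 0.2}        # assumed true source
Q = {"a": 1/3, "b": 1/3, "c": 1/3}        # assumed mismatched coding model

kl = sum(p * math.log2(p / Q[s]) for s, p in P.items())  # D(P||Q)

random.seed(2)
n = 100_000
xs = random.choices(list(P), weights=list(P.values()), k=n)

# Per-symbol excess: IS under Q minus IS under P = log2(P(x)/Q(x)).
excess = sum(math.log2(P[s] / Q[s]) for s in xs) / n

print(f"empirical excess = {excess:.4f} bits/symbol")
print(f"D(P||Q)          = {kl:.4f} bits/symbol")  # agree for large n
```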
Quick takeaways
IS measures rarity and the minimum achievable codelength relative to a model:
$$ I_S(x) = -\log_2 P(x) $$
Single mention of H (average identity): entropy is just the average of IS,
$$ H(P) = \mathbb{E}_P\big[ I_S(X) \big] $$
Computable sources (upper bound) and ergodic sources (typical equality):
$$ K(x^n) \le I_S(x^n; P) + c_P, \qquad \frac{1}{n}\, K(x^n) \to r \quad \text{a.s.} $$