Information Theory Basics

The quantification of information, surprise, and entropy


Origins

Claude Shannon’s 1948 paper “A Mathematical Theory of Communication” founded information theory. Working at Bell Labs on the engineering of telephone and telegraph channels, he arrived at a general theory applicable to communication in any medium.

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” — Shannon


Core Concepts

Information as Surprise

Information is not meaning — it’s surprise or uncertainty reduction.

  • Certain event (sun will rise): 0 information
  • Likely event (rain in Seattle): little information
  • Unlikely event (snow in Sahara): lots of information

The less expected, the more informative.


The Bit

The basic unit of information is the bit (binary digit).

  • One bit = the information in one yes/no decision
  • A fair coin flip: 1 bit
  • Two coin flips: 2 bits
  • n coin flips: n bits

Logarithmic scale:

Information = -log₂(probability)
Probability    Information (bits)
1 (certain)    0
0.5            1
0.25           2
0.125          3
0.001          ~10
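
The table above can be reproduced directly from the formula. A minimal Python sketch (the function name `information_bits` is illustrative, not a standard library call):

```python
import math

def information_bits(p: float) -> float:
    """Surprisal of an event with probability p, in bits: -log2(p)."""
    return -math.log2(p)

print(information_bits(0.5))                  # 1.0
print(information_bits(0.25))                 # 2.0
print(round(information_bits(0.001), 2))      # ~10 bits
```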

Entropy

Entropy = average information content = uncertainty

The entropy H of a probability distribution:

H = -Σ p(x) × log₂(p(x))

Examples

  • Fair coin: H = 1 bit (maximum uncertainty)
  • Always heads: H = 0 bits (no uncertainty)
  • Biased coin (75% heads): H = 0.81 bits (some uncertainty)
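
The three coin examples fall out of the entropy formula directly. A small Python sketch (`entropy_bits` is an illustrative helper, not a library function):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -Σ p log2 p, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))               # 1.0  (fair coin)
print(entropy_bits([1.0]))                    # 0.0  (always heads)
print(round(entropy_bits([0.75, 0.25]), 2))   # 0.81 (biased coin)
```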

Maximum Entropy

Entropy is maximized when all outcomes are equally likely.

  • Uniform distribution = maximum uncertainty
  • Skewed distribution = less uncertainty
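
The uniform-vs-skewed claim is easy to check numerically: a uniform distribution over n outcomes hits the maximum of log₂(n) bits, and any skew drops below it.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -Σ p log2 p, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4            # four equally likely outcomes
skewed  = [0.7, 0.1, 0.1, 0.1]  # same four outcomes, one dominant

print(entropy_bits(uniform))              # 2.0 = log2(4), the maximum
print(round(entropy_bits(skewed), 2))     # below 2.0
```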

Channel Capacity

The channel capacity is the maximum rate at which information can be transmitted reliably over a channel.

The Noisy Channel

[Source] → [Encoder] → [Channel (noise)] → [Decoder] → [Destination]

Key insight: Even with noise, reliable communication is possible through error correction and redundancy.
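
The simplest concrete instance of this insight is a repetition code: send each bit three times and take a majority vote at the decoder, which corrects any single flipped bit per group. A sketch (a toy code, far from capacity-achieving):

```python
def encode_repeat3(bits):
    """Add redundancy: repeat each bit three times."""
    return [b for b in bits for _ in range(3)]

def decode_repeat3(received):
    """Majority vote over each triple corrects one flipped bit per group."""
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

msg = [1, 0, 1]
sent = encode_repeat3(msg)    # [1,1,1, 0,0,0, 1,1,1]
sent[4] = 1                   # channel noise flips one bit
print(decode_repeat3(sent))   # [1, 0, 1] — message recovered
```

The price of reliability is rate: this code sends 3 channel bits per message bit, which is exactly the redundancy/capacity trade-off Shannon formalized.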

Shannon’s Limit

There is a fundamental limit to communication rate given noise. You cannot transmit more information than the channel capacity allows.


Applications Beyond Communication

Thermodynamics

Entropy in physics is closely related to information entropy. Landauer’s principle: erasing information generates heat.

Biology

  • DNA as information storage (4 bases = 2 bits per base pair)
  • Genetic code: 64 codons encode 20 amino acids (redundancy/error correction)
  • Neural coding: How much information does a spike carry?

Machine Learning

  • Cross-entropy as loss function
  • Information gain in decision trees
  • Mutual information for feature selection
  • KL divergence between distributions
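
The first and last bullets connect directly: cross-entropy is the expected bits to encode data from p using a code built for q, and KL divergence is the excess over p’s own entropy. A minimal sketch (toy two-outcome distributions chosen for illustration):

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -Σ p log2 q: expected code length under the wrong model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p): extra bits paid for modeling p as q."""
    return cross_entropy(p, q) + sum(pi * math.log2(pi) for pi in p if pi > 0)

true_dist  = [0.5, 0.5]   # the data is actually a fair coin
model_dist = [0.9, 0.1]   # the model believes it is heavily biased

print(round(cross_entropy(true_dist, model_dist), 3))   # ~1.737 bits
print(round(kl_divergence(true_dist, model_dist), 3))   # ~0.737 extra bits
```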

Cognition

  • Attention as information bottleneck
  • Working memory capacity (~4 chunks = limited bits)
  • Surprise drives learning (prediction error)
  • Compression and abstraction

Key Relationships

Redundancy

Redundant messages are compressible:

  • English is highly redundant: after “The q”, the next letter (u) is nearly certain → low information per letter
  • Redundancy enables error correction: if you know what should come next, you can detect and fix errors

Compression

  • Lossless: All information preserved (ZIP, PNG)
  • Lossy: Some information discarded (JPEG, MP3)
  • Good compression = finding and removing redundancy
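
The redundancy claim is observable with any real compressor: repetitive text shrinks dramatically, while random bytes (which contain no redundancy) barely shrink at all. Using Python’s standard zlib:

```python
import os
import zlib

redundant  = b"the quick brown fox " * 50   # highly repetitive, 1000 bytes
random_ish = os.urandom(1000)               # incompressible noise, 1000 bytes

print(len(zlib.compress(redundant)), "of", len(redundant))    # tiny fraction
print(len(zlib.compress(random_ish)), "of", len(random_ish))  # barely shrinks
```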

Mutual Information

How much information X gives you about Y:

  • High mutual information: Knowing X tells you a lot about Y
  • Zero mutual information: X and Y are independent
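
Both cases can be computed from a joint probability table with the standard definition I(X;Y) = Σ p(x,y) log₂(p(x,y) / (p(x)p(y))). A sketch (`mutual_information` is an illustrative helper):

```python
import math

def mutual_information(joint):
    """I(X;Y) from a joint probability table (rows = X values, cols = Y values)."""
    px = [sum(row) for row in joint]              # marginal of X
    py = [sum(col) for col in zip(*joint)]        # marginal of Y
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

# Perfectly correlated bits: knowing X fully determines Y -> 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
# Independent bits: joint equals product of marginals -> 0 bits
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```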

In Nosos

Memory as Information Management

  • Storage: Compress experiences to key information
  • Retrieval: Reconstruct from stored cues
  • Search: Find relevant information given query
  • Forgetting: Lossy compression (keep gist, lose details)

Semantic Indexing

The vector database approach:

  • High-dimensional semantic space
  • Nearby vectors = similar information
  • Search = finding closest points
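
The idea can be sketched in miniature: represent items as vectors and rank by cosine similarity. This is a toy illustration of the principle, not Nosos’s actual implementation — real systems use learned embeddings with hundreds of dimensions and approximate nearest-neighbor indexes.

```python
import math

def cosine(a, b):
    """Cosine similarity: angle-based closeness, ignoring vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for illustration only
index = {
    "coffee": [0.9, 0.1, 0.0],
    "tea":    [0.7, 0.3, 0.1],
    "rocket": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]    # something coffee-like

best = max(index, key=lambda k: cosine(query, index[k]))
print(best)                   # "coffee"
```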

Conversation as Channel

  • Kristopher (source) → Language (channel) → Nosos (destination)
  • Noise: Ambiguity, missing context, assumed knowledge
  • Error correction: Clarification questions, restatement


References

  • Shannon, C.E. (1948). A Mathematical Theory of Communication
  • Shannon, C.E. & Weaver, W. (1949). The Mathematical Theory of Communication
  • Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory
  • Gleick, J. (2011). The Information: A History, a Theory, a Flood

Information is the resolution of uncertainty. The surprise that shapes us. 📊