Computer Representation

Huffman Coding

A greedy compression algorithm that builds a prefix-free code by repeatedly merging the two lowest-frequency nodes.

Huffman coding can feel abstract when learners only see the final tree or the finished bit strings. This lesson slows the process down so students can see how local greedy choices gradually build a global coding structure.

The key teaching flow is: start with frequencies → merge the two smallest → rebuild the tree step by step → extract codes from root to leaf → use those codes to encode and decode.

Category
Computer Representation
Algorithm Type
Greedy
Core Output
Prefix-Free Codes
Main Goal
Compression

Quick Facts

Input

Huffman coding begins with symbols and their frequencies.

Greedy Rule

At each step, merge the two nodes with the smallest frequencies.

Tree Meaning

The finished binary tree determines the binary code for each symbol.

Why It Works

More frequent symbols tend to end up closer to the root and therefore receive shorter codes.

Key Idea

Huffman coding is not just a tree-building activity. It is a way to assign shorter binary strings to more frequent symbols and longer strings to less frequent ones.

The most important mental model is: small local merges build a globally efficient prefix-free code.

Visualization

This visualization highlights the two minimum-frequency nodes at each step, merges them into a parent node, reinserts that new node into the ordered set, and repeats until the final tree is complete.

Watch actively: pause before each merge, predict which two nodes combine next, and decide what new parent frequency will appear.

How It Works

1
Create Leaf Nodes

Begin with one leaf node for each symbol and its frequency.

2
Sort by Frequency

Arrange the nodes in ascending order so the smallest two are easy to identify.

3
Merge the Two Smallest

Create a new parent node whose value is the sum of those two frequencies.

4
Repeat Until One Root Remains

Reinsert the new parent into the ordered set and continue until the full tree is built.
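The four steps above can be sketched as a short merge loop. This is a minimal sketch using Python's heapq as the ordered set; the frequency values are hypothetical, chosen only to produce a readable trace.

```python
import heapq

freqs = [5, 9, 12, 13, 16, 45]   # hypothetical symbol frequencies
heap = list(freqs)
heapq.heapify(heap)              # step 2: keep nodes ordered by frequency

merges = []
while len(heap) > 1:             # step 4: repeat until one root remains
    x = heapq.heappop(heap)      # step 3: smallest remaining frequency
    y = heapq.heappop(heap)      # ...and the second smallest
    parent = x + y               # the new parent's frequency is the sum
    merges.append((x, y, parent))
    heapq.heappush(heap, parent) # reinsert the parent into the ordered set

for x, y, p in merges:
    print(f"merge {x} + {y} -> {p}")
```

Running the loop on these frequencies shows exactly the sequence a learner should predict while watching the visualization: 5+9, then 12+13, then 14+16, and so on until a single root remains.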

Why the Greedy Choice Matters

Always the Two Smallest

The algorithm always chooses the two currently smallest frequencies, not just any convenient pair.

Lower Frequencies Go Deeper

Symbols with smaller frequencies are more likely to move deeper into the final tree.

Higher Frequencies Stay Closer

More frequent symbols usually end up nearer the root, which gives them shorter codes.

Compression Comes from Structure

The final savings are not magic — they come directly from how the tree places symbols at different depths.

From Tree Structure to Binary Codes

After the tree is complete, each symbol's code is read along the path from the root to its leaf. A common convention is:

Left edge  = 0
Right edge = 1

Following the root-to-leaf path gives the code for that symbol. Only leaf nodes receive final codes.
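A root-to-leaf walk can be sketched in a few lines. The tree below is hypothetical: internal nodes are (left, right) pairs, leaves are symbol strings, and each bit chooses a child under the left = 0, right = 1 convention.

```python
# Hypothetical finished tree: E sits just below the root,
# so it gets the shortest code.
tree = ("E", ("A", ("C", "F")))

def follow(tree, bits):
    """Walk from the root, taking the left child on '0' and the right on '1'."""
    node = tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
    return node   # a leaf (symbol) when bits is a complete code

print(follow(tree, "0"))     # E
print(follow(tree, "10"))    # A
print(follow(tree, "110"))   # C
```

Note that every complete code ends exactly at a leaf; stopping earlier would leave you at an internal node, which carries no symbol.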

What Prefix-Free Means

No Code Starts Another Code

A prefix-free code means no valid symbol code is the beginning of another symbol code.

Why That Helps

This property allows a bit stream to be decoded unambiguously from left to right.

Leaves Only

Because only leaves represent symbols, no completed code path can continue into another symbol code.

Tree Structure Explains the Rule

Prefix-free coding is not an extra rule added afterward. It comes from the shape of the finished tree.
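The prefix-free property can also be checked directly on a code table. This is a small sketch with hypothetical example tables; it relies on the fact that after lexicographic sorting, any prefix pair ends up adjacent.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another."""
    words = sorted(codes.values())
    # After sorting, a prefix always sits immediately before a word it begins.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

good = {"A": "0", "B": "10", "C": "110", "D": "111"}   # leaves of a valid tree
bad  = {"A": "0", "B": "01"}                           # "0" begins "01"

print(is_prefix_free(good))   # True
print(is_prefix_free(bad))    # False
```

A table produced by a Huffman tree always passes this check, because every code ends at a leaf and leaves have no children to continue into.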

Encoding and Decoding

Once the final code table is known, encoding means replacing each symbol with its binary code. Decoding means reading the bit stream by walking the tree from the root to a leaf, then restarting at the root.

Encoding:
symbol → code → concatenate

Decoding:
read bits from root
stop at a leaf
output symbol
return to root

In the lesson, a sample word such as CAFE helps connect the abstract tree to practical use.
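Both directions can be sketched with a code table. The table below is hypothetical (a balanced tree over four equally weighted symbols); decoding by lookup works here only because the codes are prefix-free, which mirrors the walk-to-a-leaf, return-to-root loop.

```python
# Hypothetical prefix-free code table for the symbols C, A, F, E.
codes = {"C": "00", "A": "01", "F": "10", "E": "11"}

def encode(text, codes):
    """Replace each symbol with its code and concatenate."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Collect bits until they form a complete code, output, restart."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:    # reached a leaf
            out.append(inverse[current])
            current = ""          # return to the root
    return "".join(out)

bits = encode("CAFE", codes)
print(bits)                  # 00011011
print(decode(bits, codes))   # CAFE
```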

Why It Matters

Greedy Algorithms

Huffman coding is one of the clearest examples of a greedy algorithm that builds an optimal structure step by step.

Compression Reasoning

It shows how data representation can change storage cost without changing meaning.

Tree Interpretation

Learners practice translating between frequencies, tree depth, and binary strings.

Transferable Thinking

The lesson strengthens greedy reasoning, ordered merging, and the skill of moving between several representations of the same data.

Pseudocode

A common high-level version of Huffman coding looks like this:

HUFFMAN(symbols with frequencies):
    create a leaf node for each symbol
    place all nodes into a min-priority queue

    while more than one node remains:
        x = extract minimum
        y = extract minimum

        z = new internal node
        z.left = x
        z.right = y
        z.freq = x.freq + y.freq

        insert z back into the queue

    return the remaining node as the root


EXTRACT-CODES(node, path):
    if node is a leaf:
        output (node.symbol, path)
    else:
        EXTRACT-CODES(node.left,  path + "0")
        EXTRACT-CODES(node.right, path + "1")

The important pattern is: first the repeated minimum merges, then the root-to-leaf path reading.
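One way the pseudocode above might look in Python, as a sketch using heapq as the min-priority queue. The sample frequencies are hypothetical, and the counter is only there to break frequency ties deterministically; real implementations vary in how they handle ties.

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build the tree: leaves are symbols, internal nodes are (left, right) pairs."""
    order = count()   # tie-breaker so heap entries stay comparable
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # extract minimum
        fy, _, y = heapq.heappop(heap)   # extract minimum
        heapq.heappush(heap, (fx + fy, next(order), (x, y)))
    return heap[0][2]                    # the remaining node is the root

def extract_codes(node, path=""):
    """Read each code along the root-to-leaf path (left = 0, right = 1)."""
    if not isinstance(node, tuple):      # leaf: an original symbol
        return {node: path or "0"}       # lone-symbol edge case
    left, right = node
    return {**extract_codes(left, path + "0"),
            **extract_codes(right, path + "1")}

# Hypothetical frequencies: E is most common, C is rarest.
codes = extract_codes(huffman({"E": 4, "A": 3, "F": 2, "C": 1}))
print(codes)
```

With these frequencies, E (frequency 4) receives a one-bit code while C and F (the two rarest) end up three levels deep, which is exactly the depth behavior the lesson describes.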

Common Confusions

“Any two nodes can be merged”

No. The greedy rule requires merging the two smallest available frequencies at each step.

“Codes are assigned before the tree is complete”

Not usually. Final codes are read after the full tree structure has been built.

“Internal nodes are symbols too”

No. Only leaves represent original symbols. Internal nodes are structural frequency sums.

“Prefix-free is just a memorized definition”

It becomes much easier to understand when you see that only leaves are valid outputs in the tree.

Outcome

After this lesson, learners should be able to explain the greedy merge rule, build a Huffman tree from frequency data, derive prefix-free codes from root-to-leaf paths, and apply those codes to basic encoding and decoding tasks.