Short Story ⟡ Informatics

A Small Battle Over Code Length

Aoi explains how human communication relates to optimal coding theory and the balance between efficiency and understanding.

  • #code length
  • #variable-length coding
  • #Huffman coding
  • #optimal codes

"Shorter wins!"

Riku slapped the paper. They were working on a coding assignment in the club room.

"But it has to be uniquely decodable," Aoi pointed out calmly.

Yuki sat between them. "What are you competing about?"

"Code length optimization," Aoi explained. "Assign codes to four symbols A, B, C, D. Aim for the shortest average code length."

Riku showed his proposal.

"A: 0, B: 1, C: 00, D: 01"

"Hmm," Aoi shook his head. "If A is '0' and C is '00', then when you receive '00', you can't tell whether it's two A's in a row or a single C."

"Oh, right!"

"This is called unique decodability. Any bit string must be interpretable in exactly one way."
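Unique decodability is easy to test by brute force. A small sketch (my own helper, not part of the club's assignment) that enumerates every parse of a bit string under Riku's code:

```python
# Riku's code from above; "0" is a prefix of both "00" and "01".
CODE = {"A": "0", "B": "1", "C": "00", "D": "01"}

def parses(bits):
    """Return every way the bit string splits into codewords."""
    if not bits:
        return [[]]
    results = []
    for sym, cw in CODE.items():
        if bits.startswith(cw):
            for rest in parses(bits[len(cw):]):
                results.append([sym] + rest)
    return results

print(parses("00"))  # → [['A', 'A'], ['C']] — two valid readings
```

Any input with more than one parse, like "00" here, breaks Aoi's rule.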

Yuki wrote in the notebook. "So what should we do?"

Aoi wrote a new code.

"A: 0, B: 10, C: 110, D: 111"

"This is uniquely decodable. Reading from the beginning, it always determines exactly one interpretation."
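A minimal decoder sketch for Aoi's code (the function and names are mine): because no codeword is a prefix of another, a left-to-right scan always yields exactly one reading.

```python
# Aoi's prefix code: no codeword is a prefix of another,
# so the decoder can emit a symbol the moment its buffer matches.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:           # a complete codeword has arrived
            out.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("input ended mid-codeword")
    return "".join(out)

print(decode("0101100111"))  # → ABCAD
```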

"But isn't that long?" Riku looked dissatisfied.

"That depends on each symbol's occurrence probability." Aoi drew a probability table.

"A: 50%, B: 25%, C: 15%, D: 10%"

"Average code length is 0.5×1 + 0.25×2 + 0.15×3 + 0.10×3 = 1.75 bits."

Yuki verified the calculation. "Assign shorter codes to frequent symbols..."
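Yuki's check takes only a few lines (probabilities and codeword lengths taken from the tables above):

```python
# Occurrence probabilities and codeword lengths for Aoi's code.
probs = {"A": 0.50, "B": 0.25, "C": 0.15, "D": 0.10}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}

# Average code length = sum over symbols of p(s) * len(code(s)).
avg = sum(probs[s] * lengths[s] for s in probs)
print(round(avg, 2))  # → 1.75
```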

"Yes. That's the principle of Huffman coding."

Riku pondered. "What if we make D '0' instead?"

Aoi answered immediately. "The average code length goes up. Swapping the codes for A and D gives 0.5×3 + 0.25×2 + 0.15×3 + 0.10×1 = 2.55 bits. Inefficient."

"That seems unfair," Riku said.

"It's not unfair. It's information-theoretically fair," Aoi explained. "Events that occur frequently have low information content. So short codes are fine. Rare events have high information content, so using long codes causes little loss."

Yuki's eyes sparkled. "The relationship between entropy and code length!"

"Exactly. Entropy H(X) gives the theoretical lower bound for average code length. Here, the entropy is about 1.74 bits, so our 1.75-bit Huffman code nearly reaches the bound."
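The bound Aoi cites can be checked directly (a quick sketch using the table's probabilities):

```python
import math

# Probabilities from Aoi's table.
probs = {"A": 0.50, "B": 0.25, "C": 0.15, "D": 0.10}

# Shannon entropy H(X) = -sum p * log2(p): the lower bound on the
# average length of any uniquely decodable code for this source.
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 3))  # → 1.743
```

The 1.75-bit average of the code above sits just over this 1.743-bit floor.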

Riku drew another code on the whiteboard.

"What if we make everything 2-bit fixed? A: 00, B: 01, C: 10, D: 11"

"Average code length is 2 bits. Uniquely decodable but inefficient," Aoi said.

"But implementation is simple, right?" Riku countered.

"True. Simplicity is why ASCII uses a fixed length, and why fixed-width encodings like UTF-32 exist. But for compression, variable-length coding is superior."

Yuki suddenly thought of something. "The optimal code changes by language, like Japanese or English, right?"

Aoi was impressed. "Sharp. Each language has different character frequencies, so optimal codes differ too. That's why compression is context-dependent."

"Context..."

"Yes. General-purpose compression is difficult. That's why ZIP and gzip analyze data before encoding."

Riku laughed. "The battle over code length runs deep."

"Far from a small battle, it's the core of information theory," Aoi said quietly.

Yuki organized in the notebook. Short codes, long codes. Each has its role. Not equal, but fair. The aesthetics of information theory lay here.

"Next, let's actually build a Huffman tree," Aoi suggested.

"A tree?" Riku was surprised.

"Another new concept," Yuki smiled.

The battle over code length had only just begun.
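For readers who want a head start on Aoi's next session, here is one minimal sketch of the greedy Huffman construction (a standard approach, not the club's actual assignment): repeatedly merge the two least probable subtrees, prepending one bit to every code inside them.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Greedy Huffman construction: repeatedly merge the two least
    probable subtrees; each merge prepends one bit to every code
    in the merged subtrees."""
    ties = count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(ties), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_lo, _, lo = heapq.heappop(heap)   # least probable subtree
        p_hi, _, hi = heapq.heappop(heap)   # second least probable
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (p_lo + p_hi, next(ties), merged))
    return heap[0][2]

codes = huffman_codes({"A": 0.50, "B": 0.25, "C": 0.15, "D": 0.10})
print(codes)  # code lengths come out as 1, 2, 3, 3 — matching the story
```

The exact bit patterns depend on tie-breaking, but the code lengths, and therefore the 1.75-bit average, are the same as Aoi's.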