Short Story ⟡ Informatics

Encoding Girl and Random Boy

Aoi explains how human communication relates to optimal coding theory and the balance between efficiency and understanding.

  • #source coding
  • #data compression
  • #huffman coding
  • #optimal encoding

"Aoi-senpai, look at this."

Yuki opened their notebook. The same sentence was written in many different ways.

"They all say the same thing, but I tried different writing styles. Which one can be made the shortest?"

Aoi looked with interest. "This is about encoding. There are many ways to represent the same information, and information theory's job is to find an efficient one."

"Efficient?"

"Conveying the same information with fewer bits."

Riku chimed in from the side. "I'm good at shortening messages. I send 'k' for 'okay'."

"That's too abbreviated and might not get through," Aoi smiled wryly. "But the thinking is correct. If you encode frequently used words shorter, you can reduce the total length."

"Frequently used...?" Yuki pondered.

Aoi wrote a string on the whiteboard.

"For example, the string 'AAABBC'. A appears 3 times, B twice, C once. With a fixed-length encoding, A=00, B=01, C=10, each character takes 2 bits, so the 6 characters need 12 bits total."

"But it can be shorter?"

"Yes. There's a method called Huffman coding. Assign short codes to high-frequency characters."

Aoi began drawing a tree structure.

"What if A=0, B=10, C=11?"

Yuki calculated. "AAA=000, BB=1010, C=11... total of 9 bits!"

"Correct. This is encoding optimization."
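Yuki's two counts can be double-checked with a quick sketch. The code tables come from the whiteboard example; the helper function name is my own:

```python
# Code tables for the whiteboard example "AAABBC".
fixed = {"A": "00", "B": "01", "C": "10"}    # 2 bits for every symbol
huffman = {"A": "0", "B": "10", "C": "11"}   # shorter code for the frequent A

def encoded_length(text, code):
    """Total number of bits needed to encode `text` with the table `code`."""
    return sum(len(code[ch]) for ch in text)

print(encoded_length("AAABBC", fixed))    # 12 bits
print(encoded_length("AAABBC", huffman))  # 9 bits
```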

Riku asked curiously. "But if A is just '0', how do you distinguish it from the next character?"

"Sharp question. That's the condition for instantaneous decodability: design the codes so that no codeword is a prefix of another."

Aoi pointed to the diagram. "Using this tree structure, no code is a prefix of any other. So it can be uniquely decoded."

"Tree structure?" Yuki tilted their head.

"Called a Huffman tree. First, combine two low-frequency characters to make a new node. Repeat this to build the tree."

Riku worked it through. "Combine C (frequency 1) and B (frequency 2) into a BC node (frequency 3)... then combine that BC node with A (frequency 3)..."

"Yes. Ultimately, you get optimal codes according to frequency."
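The merge-the-two-rarest procedure Aoi describes can be sketched in Python with a heap. This is a minimal version, assuming we only need the final code table; real implementations usually keep explicit tree nodes:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table by repeatedly merging the two
    lowest-frequency nodes into one, as in the story."""
    freq = Counter(text)
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prepend 0 to codes in the left subtree, 1 to the right.
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("AAABBC")
# One optimal table: a 1-bit code for A, 2-bit codes for B and C
# (the exact 0/1 assignments may differ, but the lengths are optimal).
```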

Yuki thought. "What if it's a completely random string?"

"Good question. If all characters appear with equal probability, encoding won't make it shorter."

"Because entropy is high," Riku said.

Aoi showed a surprised expression. "You've grown, Riku. Yes, entropy determines the compression limit. That's Shannon's source coding theorem."
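The limit Aoi mentions can be computed directly: Shannon entropy gives the minimum average number of bits per symbol any lossless code can achieve. A small sketch, using the story's examples:

```python
import math
from collections import Counter

def entropy_bits(text):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

print(entropy_bits("AAABBC"))  # skewed frequencies: about 1.46 bits/symbol
print(entropy_bits("ABCABC"))  # uniform over 3 symbols: log2(3) ~ 1.58
```

The skewed string has lower entropy, which is exactly why Huffman coding can shrink it while the uniform one resists compression.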

"So the more patterns a string has, the more it can be compressed?" Yuki summarized.

"Exactly. Higher redundancy enables more efficient encoding."

Riku took out his smartphone. "Do zip files use this?"

"The basic principle is the same, though in practice they use more complex algorithms, like LZ77 or arithmetic coding."

"Sounds difficult," Yuki said.

"But the underlying idea is the same. Represent frequently appearing patterns shorter."

Riku suddenly thought of something. "So my 'k' strategy is valid if the context is solid?"

"Theoretically. But it assumes the receiver shares the context."

Yuki laughed. "Information theory is practical."

"It's used everywhere in computers. Image, video, audio compression. All applications of coding theory."

Aoi said quietly. "Information becomes lighter through representation. But the content doesn't change. That's the beauty of encoding."

The three pondered for a while about the invisible weight and lightness of information.