Short Story ⟡ Informatics

Talking About Compression in the Club Room Corner

Exploring data compression and how to efficiently represent information without losing meaning.

  • #lossless compression
  • #lossy compression
  • #huffman coding
  • #redundancy removal

"Isn't this file too large?"

Yuki was looking at the laptop screen.

"Just compress it," Riku said casually.

"Compress? How?"

Aoi approached. "Let's talk about data compression. A practical application of information theory."

"Compression means making data smaller, right?"

"Yes. But without losing information. That's lossless compression."

Riku had a question. "Can you make it smaller without losing information?"

"You can. By removing redundancy," Aoi wrote on the whiteboard.

"For example, the string 'AAABBBCCC'. If we write it as '3A3B3C', it becomes shorter."

"That's run-length encoding."

Yuki understood. "Represent repetitions with numbers!"

"Correct. That's the basic principle of compression. Find patterns and express them efficiently."
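The run-length idea Aoi wrote on the whiteboard can be sketched in a few lines of Python (the function name is just for illustration):

```python
# A minimal run-length encoder: count each run of repeated
# characters and write it as "<count><char>".
def rle_encode(s: str) -> str:
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                     # extend the current run
        out.append(f"{j - i}{s[i]}")   # e.g. "AAA" -> "3A"
        i = j
    return "".join(out)

print(rle_encode("AAABBBCCC"))  # → 3A3B3C
```

Note that on patternless input like "ABCD" this scheme produces "1A1B1C1D", which is longer than the original: finding real repetition is what makes it pay off.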

Aoi gave another example.

"'AAAA' versus 'ABCD'. Which is easier to compress?"

"'AAAA,'" Yuki answered immediately.

"Why?"

"It has a pattern. 'ABCD' is random with no pattern."

"Precisely. Low-entropy data compresses well. High-entropy data compresses poorly."

Riku thought. "What about already compressed data?"

"You can't compress it further. It has already reached the entropy limit."
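The entropy comparison from Aoi's example can be checked numerically. This is a small sketch that estimates the empirical Shannon entropy, in bits per character, from a string's symbol frequencies:

```python
import math
from collections import Counter

# Empirical Shannon entropy H = -sum p(x) log2 p(x),
# in bits per character.
def entropy(s: str) -> float:
    counts = Counter(s)
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(entropy("AAAA"))  # 0.0 bits: one symbol, fully predictable
print(entropy("ABCD"))  # 2.0 bits: four equally likely symbols
```

Zero entropy means the next character is never a surprise, which is exactly why "AAAA" compresses so easily.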

Aoi drew a diagram in the notebook.

"Huffman coding. Assign short codes to frequent characters, long codes to rare characters."

"For example, English text. 'e' appears frequently, so use a short code."

Yuki got excited. "Is this optimal?"

"Nearly optimal. It achieves an average code length close to the entropy."

Riku asked another question. "How do you compress photos or music?"

"That's lossy compression," Aoi explained. "Discard information humans don't notice."

"Discard?"

"For example, JPEG removes high-frequency components. The human eye barely notices."

"MP3 removes sounds that are hard to hear."

Yuki asked worriedly. "Can't you restore it?"

"You can't restore it. That's why it's lossy. But the compression ratio is much higher."
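A toy stand-in for what JPEG or MP3 do (real codecs are far more sophisticated): quantize 8-bit samples by dropping the low 4 bits. Storage halves, but the discarded bits are gone for good, so reconstruction is only approximate:

```python
# Lossy quantization sketch: keep only the high 4 bits of each
# 8-bit sample. Decoding can only guess zeros for the lost bits.
def quantize(samples: list[int]) -> list[int]:
    return [s >> 4 for s in samples]   # 8 bits -> 4 bits

def reconstruct(q: list[int]) -> list[int]:
    return [v << 4 for v in q]         # lost low bits become 0

original = [200, 201, 202, 55]
restored = reconstruct(quantize(original))
print(restored)  # close to the original, but not equal to it
```

The reconstruction error is bounded (here, less than 16 per sample), which is the tradeoff lossy schemes exploit: small, imperceptible errors in exchange for much smaller files.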

Aoi showed a comparison.

"Lossless compression: complete data preservation, lower compression ratio."

"Lossy compression: partial data loss, higher compression ratio."

Riku understood. "That's why photos degrade when saved as JPEG multiple times."

"Correct. Information gradually gets lost."

Yuki wrote in the notebook. "Compression is information theory in practice."

"Exactly. Shannon's entropy tells us compression limits."

Aoi wrote an equation.

"H(X) ≤ average code length < H(X) + 1"

"On average, you can't compress below the entropy. This is Shannon's source coding theorem."
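The bound Aoi wrote can be checked numerically with Shannon code lengths, l(x) = ⌈-log₂ p(x)⌉, which always satisfy it. The letter probabilities below are illustrative, chosen as powers of 1/2 so the bound is tight:

```python
import math

# Verify H(X) <= average code length < H(X) + 1 using
# Shannon code lengths l(x) = ceil(-log2 p(x)).
p = {"e": 0.5, "t": 0.25, "a": 0.125, "z": 0.125}

H = sum(-px * math.log2(px) for px in p.values())           # entropy
L = sum(px * math.ceil(-math.log2(px)) for px in p.values())  # avg length

print(H, L)  # both 1.75 here, since the probabilities are powers of 1/2
assert H <= L < H + 1
```

For probabilities that are not powers of 1/2, the ceiling rounds each length up, and the average lands strictly between H(X) and H(X) + 1.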

Riku was impressed. "Theory supports actual technology."

"Yes. ZIP files, PNG images, video streaming. All based on information theory."

Yuki suddenly thought of something. "Can conversation be compressed?"

Aoi smiled. "It can. That's summarization. Keep only important information."

"But nuance gets lost," Riku pointed out.

"Yes. Lossy compression. There's always a tradeoff."

Yuki looked at the laptop. "Then I'll try compressing this file."

Seconds later, the file size halved.

"Amazing!"

"There was redundancy," Aoi explained. "The compression algorithm found patterns hidden in the data."
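What happened to Yuki's file can be reproduced with any general-purpose compressor; here is a sketch using Python's zlib (DEFLATE, the algorithm behind ZIP and PNG):

```python
import zlib
import random

# Redundant data shrinks dramatically; random data barely shrinks.
redundant = b"AAABBBCCC" * 1000          # 9000 bytes, full of patterns
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(9000))  # 9000 bytes, no patterns

print(len(zlib.compress(redundant)))  # far below 9000
print(len(zlib.compress(noisy)))      # close to 9000, or slightly above
```

The second result illustrates the entropy limit from earlier: on patternless input, the compressor's own overhead can even make the output a little larger.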

Riku wrote in his notebook. "Compression is the technology of finding waste."

"Good summary," Aoi acknowledged.

The three laughed. In the club room corner, they learned another lesson.