"I'm getting along with Riku really well today."
Yuki said with a puzzled look.
"Really? Am I?" Riku tilted his head.
Aoi laughed. "Your KL divergence must be small."
"KL... what?"
"Kullback-Leibler divergence. A measure of distance between two probability distributions."
Yuki opened her notebook. "Distance between probability distributions?"
"Human thought can be viewed as a kind of probability distribution. What you prioritize, what you tend to think about."
Riku showed interest. "So you're saying my thought distribution is close to Yuki's?"
"Exactly. D_KL(P||Q) = Σ P(x) log(P(x)/Q(x)). It measures the difference of distribution Q from distribution P's perspective."
Yuki pondered. "Why not just use regular distance? Why KL divergence?"
"Good question. KL divergence is asymmetric. D_KL(P||Q) ≠ D_KL(Q||P)."
"Asymmetric?"
"Viewing Q from P's perspective is different from viewing P from Q's perspective."
Aoi gave an example. "You understanding someone versus them understanding you isn't symmetric."
Riku slapped his knee. "True! It seems easier for Aoi-senpai to understand me than for me to understand her."
"That's the asymmetry of KL divergence."
Yuki wrote in her notes. "What happens when KL divergence is zero?"
"The two distributions match perfectly. Meaning you're thinking exactly the same."
"But that's not realistic, right?"
"Exactly. So being small is good enough. It's proof of mutual understanding."
Riku suddenly got serious. "So when we fight, KL divergence is large?"
"Probably. You're not correctly estimating the other's thought distribution."
Aoi continued. "In information theory, KL divergence can also be interpreted as the difference in surprise."
"Difference in surprise?"
"When someone with distribution P encounters an event from distribution Q, it's the degree to which they feel it's unexpected."
Yuki understood. "When values differ, the same event is received differently."
"Exactly. That's why communication becomes difficult when KL divergence is large."
Riku asked. "How do you make KL divergence smaller?"
"Through dialogue. Learning the other's probability distribution."
"Learning?"
"What they value, how they think. Collecting data and updating the model inside yourself."
Yuki nodded. "That's why when you spend time together, you understand each other better."
"It's statistical learning. The accuracy of estimating the other's distribution improves."
Aoi added. "It's the same in machine learning. A model estimates the true distribution from training data, and the remaining gap is a KL divergence."
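(A sketch of that learning process, under invented assumptions: the other person's "true" distribution is fixed, each exchange contributes one sample, and we estimate by counting with a little smoothing. The divergence of the estimate typically shrinks as the dialogue grows.)

    import math
    import random

    random.seed(0)
    true_p = [0.7, 0.2, 0.1]  # the other person's distribution (invented)

    def kl(p, q):
        return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

    for n in (10, 100, 1000, 10000):  # length of the "dialogue" so far
        samples = random.choices(range(3), weights=true_p, k=n)
        counts = [samples.count(i) + 1 for i in range(3)]  # +1: Laplace smoothing
        estimate = [c / (n + 3) for c in counts]
        print(n, kl(true_p, estimate))  # typically decreases as n grows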
"It's all connected," Riku was impressed.
Yuki suddenly asked. "What about KL divergence between yourself and your past self?"
"Interesting perspective. People grow. Values change. Meaning the probability distribution changes."
"Not understanding your old self anymore is also because KL divergence increased?"
"Exactly. Time changes distributions."
Riku laughed. "But today, Yuki and I have small KL divergence."
"Yeah. Today's easy to talk."
Aoi smiled. "That's because you both met each other halfway."
"Met halfway?"
"Respecting the other's distribution while adjusting yourself. It's an optimization problem."
Yuki summarized. "KL divergence is a ruler for measuring the distance between understandings."
"And the effort to make it smaller is the essence of communication."
The three nodded quietly.
In the after-school classroom, three probability distributions gently overlapped.