"The test was terrible..."
Yuki slumped over the desk. The math test had just been returned.
"What's your score?" Riku peered over.
"I don't want to say."
Aoi quietly approached. "There are many ways to measure failure. Information theory has such metrics too."
"Measure failure?" Yuki looked up.
"Cross entropy. It quantifies the gap between prediction and reality."
Mira opened her notebook and drew a diagram. Two probability distributions side by side.
Aoi continued explaining. "For example, predicting tomorrow's weather. Suppose the truth is 80 percent sunny, 20 percent rain, but you predict 50 percent sunny, 50 percent rain."
"That's off," Yuki answered.
"That gap is cross entropy. H(P,Q) = -Σ P(x) log Q(x). P is the true distribution, Q is the predicted distribution."
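Aoi's formula can be checked directly on the weather example. A minimal sketch, using the natural logarithm (the convention that matches the loss values quoted later in this scene); `cross_entropy` is a hypothetical helper written for illustration, not a library function:

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum over x of P(x) * log Q(x), using the natural log.

    Terms with P(x) = 0 are skipped, since they contribute nothing.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.2]  # true distribution: 80% sunny, 20% rain
q = [0.5, 0.5]  # the 50/50 prediction

print(f"{cross_entropy(p, q):.3f}")  # 0.693 (= ln 2, since Q is uniform)
print(f"{cross_entropy(p, p):.3f}")  # 0.500: perfect prediction gives H(P, P) = H(P)
```

The second line illustrates the note Mira writes next: when the prediction equals the truth, cross entropy bottoms out at the entropy of P itself.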
Riku tilted his head. "Why 'cross'?"
"Because we evaluate the predicted distribution Q using the true distribution P. We're crossing two distributions."
Mira added a note: "Minimum when P = Q."
"Right. When prediction is perfect, cross entropy is minimized. It equals the entropy H(P)."
Yuki thought. "So the larger the cross entropy, the more wrong the prediction?"
"Exactly. In machine learning, we use cross entropy as a loss function. We train models to reduce cross entropy."
Riku latched onto this. "Loss function means degree of failure?"
"Precisely. Neural networks adjust their weights to minimize cross entropy."
Mira drew a diagram on a new page. An example of a classification problem. A model judging cat or dog.
Aoi continued. "For instance, we show the model a cat image. The correct answer is 'cat=1.0, dog=0.0'. If the model outputs 'cat=0.6, dog=0.4', the cross entropy (using the natural logarithm) is about 0.51."
"The prediction is ambiguous, so the penalty is large," Yuki understood.
"Right. If it's 'cat=0.9, dog=0.1', cross entropy is about 0.11. A confident prediction close to correct has small loss."
Riku pondered. "But what if it's completely wrong like 'cat=0.1, dog=0.9'?"
"Cross entropy is about 2.3. A large penalty."
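The three loss values Aoi quotes follow directly from the formula. With a one-hot true label, only the log of the probability assigned to the correct class survives the sum. A short check, assuming the natural log:

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum over x of P(x) * log Q(x), natural log."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

truth = [1.0, 0.0]  # one-hot correct answer: cat

# The three model outputs from the dialogue: ambiguous, confident-right, confident-wrong.
for q in ([0.6, 0.4], [0.9, 0.1], [0.1, 0.9]):
    print(f"prediction cat={q[0]}, dog={q[1]}: loss = {cross_entropy(truth, q):.2f}")
```

This prints losses of 0.51, 0.11, and 2.30: a confident wrong answer is penalized far more heavily than an ambiguous one, which is exactly the behavior a loss function should have.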
"Being able to quantify failure is convenient."
Aoi nodded. "Furthermore, cross entropy and KL divergence are related. H(P,Q) = H(P) + D_KL(P||Q)"
Mira pointed at the formula. "Cross entropy = Entropy + KL divergence"
"That is, cross entropy consists of two parts. One is the uncertainty of the true distribution H(P). That's fixed. The other is the prediction gap D_KL(P||Q)."
Yuki's eyes lit up. "So minimizing cross entropy is the same as minimizing KL divergence!"
"Perfect understanding. In training, H(P) doesn't change, so reducing cross entropy equals bringing the predicted distribution closer to the true distribution."
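The decomposition Aoi describes can be verified numerically on the earlier weather example. A sketch computing all three quantities separately and confirming they line up:

```python
import math

p = [0.8, 0.2]  # true weather distribution
q = [0.5, 0.5]  # the 50/50 prediction

# Cross entropy: evaluate Q under the true distribution P.
H_pq = -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Entropy of P: the irreducible uncertainty, fixed during training.
H_p = -sum(pi * math.log(pi) for pi in p)

# KL divergence: the prediction gap, the only part training can shrink.
D_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(f"H(P,Q)            = {H_pq:.4f}")
print(f"H(P) + D_KL(P||Q) = {H_p + D_kl:.4f}")  # same value
```

Since H(P) is a constant of the data, driving H(P,Q) down can only come from driving D_KL toward zero, i.e. pushing Q toward P.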
Riku looked at his notebook. "So when my answers on the prediction problems were wrong, it was because the cross entropy was large?"
"In a sense. Your 'predicted distribution' and 'correct distribution' were misaligned."
Yuki suddenly thought of something. "Can test failures be measured with cross entropy too?"
Aoi smiled. "Theoretically, yes. The gap between your answer's probability distribution and the correct distribution."
"Somehow, failure seems a bit more objective now."
Mira quietly raised her hand. An unusual action. "Failure is information. Learn from cross entropy."
"Yes. Failure is information. If you know how much you're off, you can see how to correct it."
Riku regained energy. "So we're studying to reduce cross entropy?"
"Good interpretation," Aoi laughed.
Yuki lifted their face from the desk. Failure isn't the end. Measure, learn, and move forward. That was the lesson of cross entropy.