Short Story ⟡ Informatics

Cross Entropy: Failure and Learning

Each time predictions fail, cross entropy grows. But that's the driving force of learning.

  • #cross entropy
  • #KL divergence
  • #prediction
  • #machine learning

"Wrong again."

Riku looked at his quiz results and sighed.

"Failure is a learning opportunity," Aoi consoled.

"But the same mistakes over and over..."

"That's interesting," said Mira, who rarely spoke up. "Think of it in terms of cross entropy."

Yuki showed interest. "Cross entropy?"

Aoi opened the notebook. "H(p,q) = -Σ p(x) log q(x)"

"p is the true distribution, q is the predicted distribution."

"What does that mean?" Riku asked.

"The truth is p, but you predict with q. Cross entropy is the average surprise you pay for that mismatch."

Aoi gave an example.

"Say tomorrow's actual rain probability is 80%. But you think it's 50%."

"What happens then?" Yuki asked.

"Then your predictions miss, and on average you're more surprised than you need to be. That extra surprise shows up as higher cross entropy."

Riku began to understand. "If my prediction is bad, cross entropy is high?"

"Exactly. H(p,q) ≥ H(p). Equality holds only when p = q."

"If you perfectly predict the true distribution, it's minimized."
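
The rain example can be checked in a few lines of Python; this is a minimal sketch of the formula from the notebook, using natural logarithms, so the numbers come out in nats:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log q(x), in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

p = [0.8, 0.2]  # true distribution: rain 80%, no rain 20%
q = [0.5, 0.5]  # predicted distribution: a coin-flip guess

h_pp = cross_entropy(p, p)  # H(p, p) = H(p), the entropy of the truth itself
h_pq = cross_entropy(p, q)  # H(p, q) >= H(p): the guess costs more
```

Here h_pq works out to log 2 ≈ 0.693 nats while h_pp ≈ 0.500 nats: the 50/50 guess is strictly more expensive than predicting the truth.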

Mira wrote an equation.

"H(p,q) = H(p) + D_KL(p||q)"

"Kullback-Leibler divergence," Aoi explained. "Like a distance between p and q."

"Distance?" Yuki asked.

"Not strictly a distance, because it's asymmetric."

"Asymmetric?"

"D_KL(p||q) ≠ D_KL(q||p). Order matters."
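
Both the asymmetry and Mira's decomposition can be verified numerically. A small sketch, reusing the rain numbers from before for illustration:

```python
import math

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def kl(p, q):
    """D_KL(p || q) = sum over x of p(x) * log(p(x) / q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.8, 0.2]  # the true rain distribution
q = [0.5, 0.5]  # the 50/50 guess

# Asymmetry: the two directions of KL generally disagree.
forward = kl(p, q)  # D_KL(p || q)
reverse = kl(q, p)  # D_KL(q || p)

# Decomposition: H(p, q) = H(p) + D_KL(p || q), so this gap is zero.
gap = cross_entropy(p, q) - (cross_entropy(p, p) + kl(p, q))
```

The gap comes out at zero up to floating point, while forward and reverse differ, which is exactly why order matters.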

Riku thought. "So my learning means bringing q closer to p?"

"Exactly," Aoi said happily. "Machine learning uses the same principle."

"Machine learning?"

"Neural networks are trained to minimize cross entropy."

Yuki wrote in the notebook. "Bring predicted distribution closer to true distribution."

"Yes. To do that, fail many times and adjust."

Riku's face showed understanding. "With each failure, cross entropy tells me something."

"How wrong I am, numerically."

Mira said quietly, "Gradient descent on cross entropy."

"Gradient descent," Aoi continued. "Update parameters along the gradient of cross entropy."

"From the direction of failure toward the correct answer," Yuki summarized.
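
A one-parameter version of that loop can be sketched in Python. The setup is hypothetical: a Bernoulli model whose prediction q is the sigmoid of a single parameter theta, fit to the rain example; for this parameterization the gradient of cross entropy reduces to q - p.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8      # true rain probability
theta = 0.0  # model parameter; the prediction is q = sigmoid(theta)
lr = 0.5     # learning rate

for _ in range(200):
    q = sigmoid(theta)
    # For cross entropy H(p, q) with q = sigmoid(theta), the gradient
    # simplifies to dH/dtheta = q - p: each miss says which way to move.
    theta -= lr * (q - p)

q = sigmoid(theta)  # after training, q should sit near p
```

After a couple hundred steps q sits at about 0.8: the predicted distribution has been pulled onto the true one.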

Riku laughed. "My life is also gradient descent."

"Not a bad metaphor," Aoi admitted.

"But," Yuki said worriedly, "we can also fall into local optima, right?"

"Sharp. That's gradient descent's weakness."

"Local optimum?" Riku asked.

"Best nearby, but not best overall."

Aoi drew a diagram. A curve with ups and downs.

"Once you fall here, you can't escape."

"Life might be like that too," Yuki said. "Best in current environment, but maybe better choices exist."

Mira nodded. "Exploration versus exploitation."

"Exploration-exploitation tradeoff," Aoi explained. "Optimize with current information, or search for new information?"

Riku stood up. "Then maybe I'll try exploring."

"What do you mean?"

"New study methods, new friends, new hobbies."

"High-entropy choice," Aoi laughed.

"But to escape local optima, randomness is needed."

Yuki continued. "Like simulated annealing."

"You know your stuff," Aoi said, impressed. "The temperature parameter adjusts the balance between exploration and exploitation."
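
A toy sketch of that idea, with made-up numbers: a tilted double-well cost where plain descent starting at x = 1 stays stuck in the shallow well, while annealed random jumps, accepted with probability exp(-delta / temperature), can settle into the deeper well near x = -1.

```python
import math
import random

def f(x):
    # Tilted double well: local minimum near x = +1, global minimum near x = -1.
    return (x * x - 1.0) ** 2 + 0.3 * x

def anneal(seed, steps=5000):
    rng = random.Random(seed)
    x = 1.0      # start stuck in the shallow local optimum
    temp = 2.0
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 0.3)
        delta = f(cand) - f(x)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / temp), so high temperature = exploration.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = cand
        temp *= 0.999  # cooling schedule: exploration fades into exploitation
    return x

# A few random restarts; the best run should settle near the global well.
best = min((anneal(s) for s in range(8)), key=f)
```

Early on, the high temperature accepts almost any jump (exploration); as the temperature falls, only improvements survive (exploitation), and the walker freezes into whichever well it last reached.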

Mira said at last, "Cross entropy measures surprise. Failure is information."

"Failure is information," Riku repeated. "Cross entropy teaches us."

Aoi summarized. "Every time a prediction fails, there's a chance to update. That's the essence of learning."

"So we don't have to fear failure," Yuki said.

"Rather, not learning from failure is the problem."

Riku put his quiz in his bag. "Next time, I'll improve my predicted distribution."

"That's learning," Aoi smiled.

Sunset illuminated the club room. Failure and learning, and growth. Cross entropy was teaching them today too.