Short Story ⟡ Informatics

Predictions That Don't Reach and Truths That Do

Learning about the limits of prediction models, contrasted with the power of the truth that data tells.

  • #prediction
  • #model
  • #overfitting
  • #ground truth
  • #validation

"I predicted what score I'll get on the next test."

Riku said confidently.

"How?" Yuki asked.

"From scores on the past five tests, I calculated the trend."

Aoi was interested. "What kind of model?"

"Linear regression. They're gradually increasing, so next is 85!"
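Riku's trend calculation can be sketched in a few lines of Python. The story doesn't give his five scores, so the numbers below are hypothetical; the least-squares line is fit with NumPy's `polyfit`.

```python
import numpy as np

# Hypothetical past scores (the story doesn't give Riku's actual numbers)
tests = np.array([1, 2, 3, 4, 5])
scores = np.array([65, 70, 74, 79, 83])  # gradually increasing

# Fit a least-squares straight line: score ≈ slope * test + intercept
slope, intercept = np.polyfit(tests, scores, deg=1)

# Extrapolate one test into the future
prediction = slope * 6 + intercept
print(round(prediction, 1))  # 87.7 for these made-up scores
```

With different made-up scores the line, and therefore the prediction, changes. That fragility is exactly what Aoi questions next.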

"Interesting attempt. But isn't that overfitting?"

"Overfitting?"

"Fitting the training data too closely, so the model fails on new data."

Yuki looked at her notes. "Can you predict from just five data points?"

"With little data, there's danger of learning noise."

Mira drew a diagram. A curve winding through data points.

"This is overfitting," Aoi explained. "It passes perfectly through all points, but can't predict the next one."

"Models that are too complex don't generalize."

Riku objected. "But mine's a simple straight line!"

"Even so, determining a trend from just five points is risky."
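Aoi's warning can be made concrete. In this sketch (scores again hypothetical, this time slightly noisy), a degree-4 polynomial passes through all five points perfectly, like Mira's winding curve, yet its extrapolation lands far from the straight line's:

```python
import numpy as np

tests = np.array([1.0, 2, 3, 4, 5])
scores = np.array([65.0, 72, 70, 78, 83])  # hypothetical, slightly noisy

line = np.polyfit(tests, scores, deg=1)   # simple model
curve = np.polyfit(tests, scores, deg=4)  # 5 points, 5 coefficients: exact fit

# The curve's training error is essentially zero...
train_error = np.abs(np.polyval(curve, tests) - scores).max()

# ...but the two models disagree wildly about the sixth test
line_pred = np.polyval(line, 6.0)    # about 86.2
curve_pred = np.polyval(curve, 6.0)  # about 40.0
```

The curve "learned" the noise in the five scores, so its perfect fit says nothing about the future.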

Aoi asked. "Those five tests, what subjects were they?"

"Uh, all different. Japanese, math, English..."

"Then you can't predict," Aoi declared.

"Why not?"

"Different subjects have different distributions. No guarantee the same trend continues."

Yuki understood. "The test types are different."

"Right. Prediction models are only valid under the same conditions."

Mira wrote. "i.i.d. - independent and identically distributed."

"Data drawn independently from the same distribution. That's the standard assumption in machine learning," Aoi spelled out.

"Riku's data isn't i.i.d."

Riku looked dejected. "So I can't predict?"

"Not impossible, but accuracy is low."

Yuki encouraged him. "But I think the approach was good."

"Thanks."

Aoi continued. "Predictions and truth don't necessarily match."

"Models are just approximations of reality."

Mira wrote in her notebook. "All models are wrong, but some are useful."

"A quote from the statistician George Box," Aoi noted.

Yuki thought. "Then how do you make a good model?"

"Evaluate with validation data. Measure performance on data separate from training data."

"That detects overfitting."

Aoi explained. "If training error is small but validation error is large, that's overfitting."

"The model is just memorizing training data."
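Aoi's rule of thumb can be demonstrated on toy data. Everything below is an assumed setup, a noisy linear trend with the later points held out for validation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a linear trend plus noise
x = np.linspace(0, 10, 40)
y = 2 * x + 5 + rng.normal(0, 3, size=x.size)

# Train on the earlier points, validate on the later ones
x_tr, y_tr = x[:30], y[:30]
x_va, y_va = x[30:], y[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial model on (xs, ys)."""
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

simple = np.polyfit(x_tr, y_tr, deg=1)    # a line
complex_ = np.polyfit(x_tr, y_tr, deg=9)  # a very flexible curve

# Overfitting signature: smaller training error, larger validation error
print(mse(simple, x_tr, y_tr), mse(simple, x_va, y_va))
print(mse(complex_, x_tr, y_tr), mse(complex_, x_va, y_va))
```

The degree-9 fit memorizes the training points, so its training error is the smaller of the two, while its validation error is far larger.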

Riku understood. "So I should validate on the next test?"

"Right. Compare prediction with actual score."

"But," Yuki pointed out, "once the test is over, the prediction is meaningless."

"Sharp. The value of prediction is in knowing the future."

Aoi added. "But validation is necessary. To verify the model's reliability."

Mira drew a new diagram. A graph of predicted versus true values.

"Ideally, they line up on the diagonal," Aoi explained.

"Prediction and truth matching."

"But in reality, they scatter."

Yuki asked. "Smaller scatter means better model?"

"Right. Evaluated by mean squared error (MSE) or coefficient of determination (R²)."

"Measuring model quality numerically."
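Both metrics are a few lines of arithmetic. The predicted and true values below are made up for illustration:

```python
import numpy as np

# Hypothetical predicted vs. true scores
y_true = np.array([70.0, 75, 80, 85, 90])
y_pred = np.array([72.0, 74, 81, 83, 91])

errors = y_pred - y_true

# MSE: average squared distance from Mira's diagonal
mse = np.mean(errors ** 2)  # 2.2

# R²: 1.0 is a perfect prediction, 0.0 is no better than guessing the mean
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot  # 0.956
```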

Riku spoke up. "But human prediction is more complex than formulas."

"True. Intuition, experience, context. Many factors that can't be quantified."

Aoi acknowledged. "That's why human judgment has aspects that can't be replaced by machines."

"But the reverse is also true. Machines are more accurate in some domains."

Mira nodded. Then wrote. "Complement each other."

"Humans and machines, leveraging each other's strengths."

Yuki summarized. "Predictions aren't perfect. But they're better than doing nothing."

"And you compare with truth and improve."

Riku laughed. "Then I'll work hard on the test. So the prediction doesn't miss."

"That's backwards causality," Aoi laughed.

"But good motivation."

Predictions don't reach. But truth does.

So people validate and keep learning.