Last Updated: May 29, 2026
Clean data can still train a bad model.
The schema can be correct, null rates can be low, and distributions can look stable, yet the dataset can still fail to represent the decisions the model must make. Common causes are class imbalance, temporal shift, geographic gaps, demographic gaps, exposure bias, and training-serving mismatch.
This chapter is about data that is valid but not sufficient.
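As a rough illustration of "valid but not sufficient", the sketch below compares how often each category appears in a training sample with how often it appears in a sample of serving traffic. Every row passes a basic validity check, yet two regions turn out to be underrepresented. The function names (`share_gaps`, `underrepresented`), the region codes, the row counts, and the 10-point tolerance are all hypothetical choices made for this example, not a recommended standard.

```python
from collections import Counter


def share_gaps(train_values, serving_values):
    """Return {category: train_share - serving_share} for every category
    seen in either sample. Negative values mean the category is rarer in
    training than in serving."""
    if not train_values or not serving_values:
        raise ValueError("both samples must be non-empty")
    train_counts = Counter(train_values)
    serving_counts = Counter(serving_values)
    gaps = {}
    for category in set(train_counts) | set(serving_counts):
        train_share = train_counts[category] / len(train_values)
        serving_share = serving_counts[category] / len(serving_values)
        gaps[category] = train_share - serving_share
    return gaps


def underrepresented(train_values, serving_values, tolerance=0.10):
    """List categories whose training share trails their serving share
    by more than `tolerance` (an assumed, illustrative threshold)."""
    gaps = share_gaps(train_values, serving_values)
    return sorted(c for c, gap in gaps.items() if gap < -tolerance)


# Hypothetical samples: no nulls, only known region codes.
known_regions = {"US", "EU", "APAC"}
train_regions = ["US"] * 90 + ["EU"] * 10
serving_regions = ["US"] * 50 + ["EU"] * 30 + ["APAC"] * 20

# The validity check passes for every training row...
all_valid = all(r in known_regions for r in train_regions)
print(all_valid)  # True

# ...but the training sample does not match where decisions are made.
print(underrepresented(train_regions, serving_regions))  # ['APAC', 'EU']
```

A check like this only covers one categorical column at a time. It says nothing about temporal shift or exposure bias, which need comparisons along other axes.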