On the appropriateness of Platt scaling in classifier calibration
Many applications of data mining and machine learning require posterior probability estimates in addition to often highly accurate predictions. Classifier calibration is a branch of machine learning that aims at transforming classifier predictions into posterior class probabilities, which serve as a useful extension in the respective applications. Among the existing state-of-the-art classifier calibration techniques, Platt scaling (sometimes also called sigmoid or logistic calibration) is the only parametric one, while almost all of its competitors do not rely on parametric assumptions. Platt scaling is discussed controversially in the classifier calibration literature: despite good empirical results reported in many domains, many authors criticize it. Interestingly, none of these criticisms properly addresses the underlying parametric assumptions; some statements are even incorrect. The first contribution of this work is therefore to review this criticism and to present a proof of the true parametric assumptions, which, as an immediate consequence, turn out to be more general and valid for different probability distributions. Next, the relationship between Platt scaling and beta calibration, a different and relatively new classifier calibration technique, is analyzed, and the two are shown to be equivalent: their only difference lies in the characteristics of the classifier whose predictions are calibrated. The proven validity of Platt scaling thus translates directly into a proven optimality of beta calibration. Furthermore, evaluating classifier calibration techniques is a highly non-trivial problem because the true posteriors cannot be used as a reference. Hence, the existing evaluation metrics are reviewed as well, since some relatively popular evaluation criteria should not be used at all. Finally, the theoretical findings are supported by a simulation study.
Keywords: Platt scaling, Classifier calibration, Posterior probability, Logistic regression, Proper scoring rules
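As a concrete illustration of the technique under discussion (a minimal sketch on hypothetical toy data, not the paper's experimental setup), Platt scaling fits the two parameters of the sigmoid P(y=1 | s) = 1 / (1 + exp(a*s + b)) to held-out classifier scores by minimizing the negative log-likelihood. The plain gradient-descent fit below stands in for the more refined optimizers used in practice:

```python
# Minimal sketch of Platt scaling: fit p(y=1|s) = 1 / (1 + exp(a*s + b))
# on held-out (score, label) pairs by gradient descent on the log loss.
# Toy data and hyperparameters are illustrative assumptions.
import math


def platt_fit(scores, labels, lr=0.01, steps=5000):
    """Fit the two sigmoid parameters (a, b) of Platt scaling."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # gradient of the log loss w.r.t. a and b
            grad_a += (p - y) * (-s)
            grad_b += (p - y) * (-1.0)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(a, b, s):
    """Map a raw classifier score s to a calibrated posterior estimate."""
    return 1.0 / (1.0 + math.exp(a * s + b))


# Toy calibration set: higher scores indicate the positive class.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
a, b = platt_fit(scores, labels)
```

In practice the calibration set must be disjoint from the classifier's training data, since scores on training examples are typically overconfident.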