Classifier Calibration

HiClass supports probability calibration through several post-hoc calibration methods.

Motivation

While many machine learning models can output uncertainty scores, these scores are often poorly calibrated [1] [2]. Model calibration aims to improve the quality of probabilistic forecasts by learning a transformation of the scores on a separate calibration dataset.

Methods

HiClass supports the following calibration methods:

  • Isotonic Regression [3]

  • Platt Scaling [4]

  • Beta Calibration [5]

  • Inductive Venn-Abers Calibration [6]

  • Cross Venn-Abers Calibration [6]
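To illustrate what a post-hoc method does, the following minimal sketch fits an isotonic regression on held-out scores using scikit-learn's `IsotonicRegression`. It is not HiClass's internal implementation. The scores, labels, and variable names are made up for illustration, and the sketch assumes a single binary classifier rather than a hierarchy.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical held-out calibration data: raw classifier scores and true binary outcomes
cal_scores = np.array([0.1, 0.3, 0.4, 0.6, 0.8, 0.9])
cal_labels = np.array([0, 0, 1, 0, 1, 1])

# Learn a monotone, non-decreasing map from raw score to calibrated probability
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(cal_scores, cal_labels)

# Apply the learned transformation to new, unseen scores
new_scores = np.array([0.05, 0.5, 0.95])
calibrated = calibrator.predict(new_scores)
print(calibrated)
```

The calibrator is fitted on data that was not used to train the classifier, which matches the separate calibration set passed to `calibrate` in the code sample at the end of this page.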

Probability Aggregation

Combining probabilities across multiple levels of the hierarchy is another way to improve probabilistic forecasts. The following methods are supported:

Conditional Probability Aggregation (Multiply Aggregation)

Given a node hierarchy with \(n\) levels, the probability of a node \(A_i\) at level \(i \le n\), whose ancestors on the path from the root are \(A_1, \ldots, A_{i-1}\), is calculated with the chain rule:

\(\displaystyle{\mathbb{P}(A_1 \cap A_2 \cap \ldots \cap A_i) = \mathbb{P}(A_1) \cdot \mathbb{P}(A_2 \mid A_1) \cdot \mathbb{P}(A_3 \mid A_1 \cap A_2) \cdot \ldots \cdot \mathbb{P}(A_i \mid A_1 \cap A_2 \cap \ldots \cap A_{i-1})}\)
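As a worked example of the product above, the sketch below multiplies the conditional probabilities along one path. The function name `multiply_aggregation` and the numbers are hypothetical and only illustrate the formula. They are not part of the HiClass API.

```python
from math import prod


def multiply_aggregation(conditional_probs):
    """Joint probability of a path, given P(A_1), P(A_2 | A_1), ... in level order."""
    return prod(conditional_probs)


# Hypothetical path Animal -> Mammal -> Cow:
# P(Animal), P(Mammal | Animal), P(Cow | Animal, Mammal)
path_probs = [1.0, 0.8, 0.5]
joint = multiply_aggregation(path_probs)
print(joint)
```

Because every factor is at most 1, the aggregated probability of a node can never exceed that of its parent.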

Arithmetic Mean Aggregation

\(\displaystyle{\mathbb{P}(A_i) = \frac{1}{i} \sum_{j=1}^{i} \mathbb{P}(A_{j})}\)
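The arithmetic mean can be sketched in the same way, assuming that \(\mathbb{P}(A_j)\) denotes the probability predicted for the level-\(j\) node on the path to \(A_i\). The helper name `arithmetic_mean_aggregation` and the numbers are illustrative only and are not part of the HiClass API.

```python
def arithmetic_mean_aggregation(level_probs):
    """Average the probabilities of the nodes on the path from level 1 to level i."""
    return sum(level_probs) / len(level_probs)


# Hypothetical per-level probabilities for the path Animal -> Mammal -> Cow
level_probs = [1.0, 0.8, 0.5]
mean_prob = arithmetic_mean_aggregation(level_probs)
print(mean_prob)
```

Unlike the product, the mean does not shrink with depth, so a single low-probability level lowers the result without driving it toward zero.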

Geometric Mean Aggregation

\(\displaystyle{\mathbb{P}(A_i) = \exp{\left(\frac{1}{i} \sum_{j=1}^{i} \ln \mathbb{P}(A_{j})\right)}}\)
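A minimal sketch of the geometric mean, computed in log space exactly as in the formula above. The helper name `geometric_mean_aggregation` and the numbers are hypothetical. The sketch assumes all probabilities are strictly positive, because `log(0)` is undefined.

```python
from math import exp, log


def geometric_mean_aggregation(level_probs):
    """Geometric mean of the path probabilities: exp of the mean log-probability."""
    # Assumes every probability is > 0; math.log raises ValueError for 0
    return exp(sum(log(p) for p in level_probs) / len(level_probs))


# Hypothetical per-level probabilities for the path Animal -> Mammal -> Cow
level_probs = [1.0, 0.8, 0.5]
geo_prob = geometric_mean_aggregation(level_probs)
print(geo_prob)
```

The geometric mean is the \(i\)-th root of the product of the level probabilities, so it never exceeds the arithmetic mean of the same values.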

Code sample

from sklearn.ensemble import RandomForestClassifier

from hiclass import LocalClassifierPerNode

# Define data
X_train = [[1], [2], [3], [4]]
X_test = [[4], [3], [2], [1]]
X_cal = [[5], [6], [7], [8]]
Y_train = [
    ["Animal", "Mammal", "Sheep"],
    ["Animal", "Mammal", "Cow"],
    ["Animal", "Reptile", "Snake"],
    ["Animal", "Reptile", "Lizard"],
]

Y_cal = [
    ["Animal", "Mammal", "Cow"],
    ["Animal", "Mammal", "Sheep"],
    ["Animal", "Reptile", "Lizard"],
    ["Animal", "Reptile", "Snake"],
]

# Use random forest classifiers for every node
rf = RandomForestClassifier()

# Use local classifier per node with isotonic regression as calibration method
classifier = LocalClassifierPerNode(
    local_classifier=rf, calibration_method="isotonic", probability_combiner="multiply"
)

# Train local classifier per node
classifier.fit(X_train, Y_train)

# Calibrate local classifier per node
classifier.calibrate(X_cal, Y_cal)

# Predict probabilities
probabilities = classifier.predict_proba(X_test)

# Print the labels of the last level (index 2), followed by the predicted probabilities
print(classifier.classes_[2])
print(probabilities)