Classifiers

Shared classes

Classifier

class Classifier.Classifier(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True)

Abstract class for the local hierarchical classifiers.

Offers mostly utility methods and common data initialization. This includes getting and setting classifiers on nodes as well as calculating and accepting the class hierarchy.

__init__(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True)

Initialize a classifier.

Parameters:
  • n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized.

  • verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.

  • local_classifier (BaseEstimator instance) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.

  • hierarchy (nx.DiGraph, default=None) – Label hierarchy used in prediction and fitting. If None, it will be inferred during training.

  • unique_taxonomy (bool, default=True) – Set to True if the elements in the hierarchy have unique names; duplicate names can otherwise cause unexpected behaviour. For example, a->b->c and d->b->e could have different meanings for b, so in that case unique_taxonomy should be set to False.
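The name clash that unique_taxonomy guards against can be illustrated with a minimal sketch. It assumes the hierarchy is a set of parent -> child edges built from label paths; the helpers edges_from_paths and qualify are hypothetical and not part of this library, and prefixing labels with their ancestors is only one possible way to disambiguate.

```python
def edges_from_paths(paths):
    """Collect parent -> child edges from root-to-leaf label paths."""
    edges = set()
    for path in paths:
        for parent, child in zip(path, path[1:]):
            edges.add((parent, child))
    return edges


def qualify(path, sep="/"):
    """Prefix each label with its ancestors so that names become unique."""
    return [sep.join(path[: i + 1]) for i in range(len(path))]


paths = [["a", "b", "c"], ["d", "b", "e"]]

# With plain names, both paths share a single node "b",
# so "b" ends up with two parents.
plain = edges_from_paths(paths)
parents_of_b = sorted(p for p, c in plain if c == "b")

# With qualified names, "a/b" and "d/b" stay separate nodes.
qualified = edges_from_paths([qualify(p) for p in paths])
parents_of_ab = sorted(p for p, c in qualified if c == "a/b")

print(parents_of_b)   # ['a', 'd']
print(parents_of_ab)  # ['a']
```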

fit(X: ndarray, Y: ndarray, placeholder_label: int | str | str_ | None = None, replace_classifiers: bool = True)

Fit all classifiers.

Must be overridden by subclasses, as this base method only offers hierarchy methods.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The training input samples.

  • Y (np.array of shape (n_samples, n_levels)) – The hierarchical labels.

  • placeholder_label (int or str, default=None) – Label that corresponds to “no label available for this data point”. Defaults will be used if not passed.

  • replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.
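How a hierarchy could be inferred from Y, and what role the placeholder label plays, can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function infer_edges, the artificial root name "<ROOT>" and the empty string used as placeholder are all hypothetical.

```python
def infer_edges(Y, placeholder_label=""):
    """Build parent -> child edges from rows of hierarchical labels.

    Each row lists the labels from the top level down. Reading of a row
    stops at the first placeholder, which marks "no label available".
    """
    root = "<ROOT>"  # hypothetical artificial root node
    edges = set()
    for row in Y:
        parent = root
        for label in row:
            if label == placeholder_label:
                break
            edges.add((parent, label))
            parent = label
    return edges


Y = [
    ["animal", "dog"],
    ["animal", "cat"],
    ["plant", ""],  # second level unknown for this sample
]
edges = infer_edges(Y)
print(sorted(edges))
# [('<ROOT>', 'animal'), ('<ROOT>', 'plant'), ('animal', 'cat'), ('animal', 'dog')]
```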

get_classifier(descriptor: Any) → Any

Return the local classifier associated with the given descriptor.

Parameters:

descriptor (Any) – The descriptor for which the local classifier should be returned.

Returns:

classifier – The local classifier that is assigned to the given descriptor.

Return type:

Any

predict(X: ndarray, threshold: float = 0)

Predict classes for the given data.

Hierarchical labels are returned. If threshold is specified, prediction can end early.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The input samples.

  • threshold (float, default=0) – Minimum confidence score to continue prediction for children nodes.

Returns:

y – The predicted classes.

Return type:

np.array of shape (n_samples, n_levels)
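The effect of threshold on early stopping can be sketched for a single sample. The function predict_path and the dictionaries children and scores are hypothetical; real confidence scores would come from the local classifiers, and the actual traversal may differ.

```python
def predict_path(children, scores, root, threshold=0.0):
    """Walk down the hierarchy, picking the best-scoring child at each node.

    children: dict mapping a node to the list of its child nodes.
    scores:   dict mapping a node to its confidence score for one sample.
    The walk ends at a leaf, or as soon as the best child scores
    below the threshold.
    """
    path = []
    node = root
    while children.get(node):
        best = max(children[node], key=lambda child: scores[child])
        if scores[best] < threshold:
            break  # not confident enough to descend further
        path.append(best)
        node = best
    return path


children = {"root": ["animal", "plant"], "animal": ["dog", "cat"]}
scores = {"animal": 0.9, "plant": 0.1, "dog": 0.55, "cat": 0.45}

print(predict_path(children, scores, "root"))                 # ['animal', 'dog']
print(predict_path(children, scores, "root", threshold=0.6))  # ['animal']
```

With the default threshold of 0 the walk always reaches a leaf; a higher threshold yields a shorter, less specific label path.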

set_classifier(descriptor: Any, classifier: Any) → None

Set the local classifier for a given descriptor.

Parameters:
  • descriptor (Any) – The descriptor for which the local classifier should be set.

  • classifier (Any) – The local classifier that will be assigned to the given descriptor.


NodeClassifier

class Classifier.NodeClassifier(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True, policy: str | Type[Policy] = 'siblings')

Bases: Classifier, ABC

Abstract class for classifiers that have their local classifiers linked to nodes.

__init__(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True, policy: str | Type[Policy] = 'siblings')

Initialize a classifier.

Extends the superclass by adding a data policy used in fitting.

Parameters:
  • n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized.

  • verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.

  • local_classifier (BaseEstimator instance) – The local classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.

  • hierarchy (nx.DiGraph, default=None) – Label hierarchy used in prediction and fitting. If None, it will be inferred during training.

  • unique_taxonomy (bool, default=True) – Set to True if the elements in the hierarchy have unique names; duplicate names can otherwise cause unexpected behaviour. For example, a->b->c and d->b->e could have different meanings for b, so in that case unique_taxonomy should be set to False.

  • policy (Policy, default="siblings") – Rules for defining positive and negative training samples.
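A minimal sketch of what a "siblings" policy might select for one node. It assumes the definition commonly used in the hierarchical classification literature: positives are samples labelled with the node or one of its descendants, negatives are samples labelled with a sibling or a sibling's descendants. Whether this library's policy matches that definition exactly is an assumption, and all names below are hypothetical.

```python
def descendants(children, node):
    """Return the node together with everything below it."""
    found = {node}
    stack = [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def siblings_split(children, parent_of, node, sample_labels):
    """Return (positive, negative) sample indices for one node."""
    positive_labels = descendants(children, node)
    negative_labels = set()
    for sibling in children[parent_of[node]]:
        if sibling != node:
            negative_labels |= descendants(children, sibling)
    positive = [i for i, lab in enumerate(sample_labels) if lab in positive_labels]
    negative = [i for i, lab in enumerate(sample_labels) if lab in negative_labels]
    return positive, negative


children = {"root": ["animal", "plant"], "animal": ["dog", "cat"], "plant": ["tree"]}
parent_of = {"animal": "root", "plant": "root", "dog": "animal", "cat": "animal", "tree": "plant"}
sample_labels = ["dog", "cat", "tree", "dog"]  # most specific label per sample

print(siblings_split(children, parent_of, "dog", sample_labels))     # ([0, 3], [1])
print(siblings_split(children, parent_of, "animal", sample_labels))  # ([0, 1, 3], [2])
```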

fit(X: ndarray, Y: ndarray, placeholder_label: int | str | str_ | None = None, replace_classifiers: bool = True) → None

Fit all classifiers.

Extends the superclass method with label flattening and policy initialization. Must itself be overridden by subclasses.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The training input samples.

  • Y (np.array of shape (n_samples, n_levels)) – The hierarchical labels.

  • placeholder_label (int or str, default=None) – Label that corresponds to “no label available for this data point”. Defaults will be used if not passed.

  • replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.

get_classifier(descriptor: int | str | str_) → Any

Return the local classifier associated with the given label.

Parameters:

descriptor (int or str) – The label for which the local classifier should be returned.

Returns:

classifier – The local classifier that is assigned to the given label.

Return type:

Any

predict(X: ndarray, threshold: float = 0)

Predict classes for the given data.

Hierarchical labels are returned. If threshold is specified, prediction can end early.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The input samples.

  • threshold (float, default=0) – Minimum confidence score to continue prediction for children nodes.

Returns:

y – The predicted classes.

Return type:

np.array of shape (n_samples, n_levels)

set_classifier(descriptor: int | str | str_, classifier: Any) → None

Set the local classifier for a given label.

Parameters:
  • descriptor (int or str) – The label for which the local classifier should be set.

  • classifier (Any) – The local classifier that will be assigned to the given label.


DuplicateFilter

class Classifier.DuplicateFilter

Bases: object

A filter that removes duplicate messages from logging.

__init__()

Initialize filter.

filter(record: Any) → bool

Filter messages.

Parameters:

record (logged message) – Text to be filtered.

Returns:

rv – True if the message was not found in the filter, False otherwise.

Return type:

bool
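The behaviour described above can be sketched with the standard logging module. This is an illustration only: SeenMessageFilter and ListHandler are hypothetical names, and the library's own filter may track messages differently.

```python
import logging


class SeenMessageFilter(logging.Filter):
    """Let each distinct message through once and drop later repeats."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def filter(self, record):
        message = record.getMessage()
        is_new = message not in self.seen
        self.seen.add(message)
        return is_new  # True keeps the record, False suppresses it


class ListHandler(logging.Handler):
    """Collect emitted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("duplicate_filter_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)
logger.addFilter(SeenMessageFilter())

logger.info("fitting node a")
logger.info("fitting node a")  # repeated, suppressed by the filter
logger.info("fitting node b")

print(handler.messages)  # ['fitting node a', 'fitting node b']
```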


ConstantClassifier

class Classifier.ConstantClassifier(class_to_predict: int, num_classes: int)

A classifier that, during prediction, returns a probability of 1 for a specified label for every sample.

__init__(class_to_predict: int, num_classes: int) → None

Initialize the classifier.

Parameters:
  • class_to_predict (int) – The index of the label that should be predicted with a probability of one. Needs to be within num_classes.

  • num_classes (int) – The number of labels that should be predicted by the classifier.

predict_proba(X: ndarray) → ndarray

Predict X with previously set parameters.

Parameters:

X (np.ndarray of shape (n_samples, ...)) – Data that should be predicted. Only the number of samples matters.

Returns:

output – For every sample in X, 1 for the previously set label and 0 for all others.

Return type:

np.ndarray
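A dependency-free sketch of this behaviour. The class ConstantProbabilities is hypothetical, and plain nested lists stand in for the np.ndarray that the documented method returns.

```python
class ConstantProbabilities:
    """Hypothetical stand-in that always predicts the same class."""

    def __init__(self, class_to_predict, num_classes):
        if not 0 <= class_to_predict < num_classes:
            raise ValueError("class_to_predict must be within num_classes")
        self.class_to_predict = class_to_predict
        self.num_classes = num_classes

    def predict_proba(self, X):
        """Return one one-hot row per sample; feature values are ignored."""
        row = [0.0] * self.num_classes
        row[self.class_to_predict] = 1.0
        return [list(row) for _ in range(len(X))]


clf = ConstantProbabilities(class_to_predict=1, num_classes=3)
print(clf.predict_proba([[0.2, 0.7], [5.0, 1.0]]))
# [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```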


LocalClassifierPerLevel

class LocalClassifierPerLevel.LocalClassifierPerLevel(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True)

Bases: Classifier

Assign local classifiers for each class hierarchy level.

A local classifier per level is a local hierarchical classifier that fits one local multi-class classifier for each level of the hierarchy. In case of a DAG, nodes are assigned their highest possible level, with the root being the highest level.
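One possible reading of the level assignment for a DAG is sketched below: a node that is reachable along several paths receives the level closest to the root. This interpretation, the function assign_levels and the numbering of the root as level 0 are assumptions made for illustration.

```python
from collections import deque


def assign_levels(children, root):
    """Assign each node the level closest to the root at which it is reachable.

    The root gets level 0. Breadth-first search reaches every node first
    via its shortest path, so a node with several parents keeps the
    smallest depth.
    """
    levels = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in levels:
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


# "c" is reachable both directly from "a" and through "b".
children = {"root": ["a"], "a": ["b", "c"], "b": ["c"]}
print(assign_levels(children, "root"))
# {'root': 0, 'a': 1, 'b': 2, 'c': 2}
```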

__init__(n_jobs: int = 1, verbose: int = 0, local_classifier: Any = BaseEstimator(), hierarchy: DiGraph | None = None, unique_taxonomy: bool = True)

Initialize a classifier.

Parameters:
  • n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized.

  • verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.

  • local_classifier (BaseEstimator instance) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.

  • hierarchy (nx.DiGraph, default=None) – Label hierarchy used in prediction and fitting. If None, it will be inferred during training.

  • unique_taxonomy (bool, default=True) – Set to True if the elements in the hierarchy have unique names; duplicate names can otherwise cause unexpected behaviour. For example, a->b->c and d->b->e could have different meanings for b, so in that case unique_taxonomy should be set to False.

fit(X: ndarray, Y: ndarray, placeholder_label: int | str | str_ | None = None, replace_classifiers: bool = True) → None

Fit the local classifiers.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The training input samples.

  • Y (np.array of shape (n_samples, n_levels)) – The hierarchical labels.

  • placeholder_label (int or str, default=None) – Label that corresponds to “no label available for this data point”. Defaults will be used if not passed.

  • replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.

get_classifier(descriptor: int) → Any

Return the local classifier associated with the given hierarchy level.

Raise IndexError if the level is invalid.

Parameters:

descriptor (int or str) – The descriptor for which the local classifier should be returned.

Returns:

classifier – The local classifier that is assigned to the given descriptor.

Return type:

Any

predict(X: ndarray, threshold: float = 0)

Predict classes for the given data.

Hierarchical labels are returned. If threshold is specified, prediction can end early.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The input samples.

  • threshold (float, default=0) – Minimum confidence score to continue prediction for children nodes.

Returns:

y – The predicted classes.

Return type:

np.array of shape (n_samples, n_levels)

predict_proba(X: ndarray, algorithm: str | Type[DefaultProbability] = 'default') → List

Compute prediction probabilities.

Parameters:
  • X (np.array of shape (n_samples, n_features)) – The input samples.

  • algorithm (str or Probability) – The algorithm to use for calculating probabilities.

Returns:

probabilities – Prediction probabilities for all classes in each hierarchical level.

Return type:

np.ndarray of shape (n_levels, n_samples)

set_classifier(descriptor: int, classifier: Any) → None

Set the local classifier for a given hierarchy level.

Parameters:
  • descriptor (int or str) – The descriptor for which the local classifier should be set.

  • classifier (Any) – The local classifier that will be assigned to the given hierarchy level.


LocalClassifierPerNode

class LocalClassifierPerNode.LocalClassifierPerNode(local_classifier: BaseEstimator | None = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: str | None = None, replace_classifiers: bool = True, n_jobs: int = 1)

Bases: BaseEstimator

Assign local classifiers to each node of the graph, except the root node.

A local classifier per node is a local hierarchical classifier that fits one local binary classifier for each node of the class hierarchy, except for the root node.

__init__(local_classifier: BaseEstimator | None = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: str | None = None, replace_classifiers: bool = True, n_jobs: int = 1)

Initialize a local classifier per node.

Parameters:
  • local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.

  • binary_policy (str, default="siblings") – Rules for defining positive and negative training examples.

  • verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.

  • edge_list (str, default=None) – Path to which the built hierarchy is written.

  • replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.

  • n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized.

fit(X, y)

Fit a local classifier per node.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.

  • y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.

Returns:

self – Fitted estimator.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)

Predict classes for the given data.

Hierarchical labels are returned.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.

Returns:

y – The predicted classes.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_outputs)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
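The <component>__<parameter> naming can be illustrated by splitting a name at its first double underscore. The helper split_param is hypothetical, and the parameter names passed to it (such as local_classifier__C) are only examples of what a nested name could look like.

```python
def split_param(name):
    """Split '<component>__<parameter>' at the first double underscore."""
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name  # plain parameter of the estimator itself
    return component, parameter


print(split_param("verbose"))              # (None, 'verbose')
print(split_param("local_classifier__C"))  # ('local_classifier', 'C')
print(split_param("local_classifier__base__max_iter"))
# ('local_classifier', 'base__max_iter')
```

The remainder after the first split is handed on to the named component, which is how deeper nesting can be resolved one level at a time.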


LocalClassifierPerParentNode

class LocalClassifierPerParentNode.LocalClassifierPerParentNode(local_classifier: BaseEstimator | None = None, verbose: int = 0, edge_list: str | None = None, replace_classifiers: bool = True, n_jobs: int = 1)

Bases: BaseEstimator

Assign local classifiers to each parent node of the graph.

A local classifier per parent node is a local hierarchical classifier that fits one multi-class classifier for each parent node of the class hierarchy.
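How training targets could be grouped so that each parent node gets its own multi-class problem is sketched below. The function training_sets_per_parent and the artificial root "<ROOT>" are hypothetical; the library's internal data handling is not shown in this reference.

```python
def training_sets_per_parent(Y, root="<ROOT>"):
    """Group (sample index, child label) pairs by parent node.

    Each row of Y is a label path from the top level down. Every parent
    on the path gets the sample, labelled with the child that follows it.
    """
    targets = {}
    for index, path in enumerate(Y):
        parent = root
        for label in path:
            targets.setdefault(parent, []).append((index, label))
            parent = label
    return targets


Y = [["animal", "dog"], ["animal", "cat"], ["plant", "tree"]]
for parent, pairs in training_sets_per_parent(Y).items():
    print(parent, pairs)
# <ROOT> [(0, 'animal'), (1, 'animal'), (2, 'plant')]
# animal [(0, 'dog'), (1, 'cat')]
# plant [(2, 'tree')]
```

In this toy data the parent "plant" sees only one class ("tree"), which is the situation in which replace_classifiers=True substitutes a constant classifier.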

__init__(local_classifier: BaseEstimator | None = None, verbose: int = 0, edge_list: str | None = None, replace_classifiers: bool = True, n_jobs: int = 1)

Initialize a local classifier per parent node.

Parameters:
  • local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.

  • verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.

  • edge_list (str, default=None) – Path to which the built hierarchy is written.

  • replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.

  • n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized.

fit(X, y)

Fit a local classifier per parent node.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.

  • y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.

Returns:

self – Fitted estimator.

Return type:

object

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

predict(X)

Predict classes for the given data.

Hierarchical labels are returned.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.

Returns:

y – The predicted classes.

Return type:

ndarray of shape (n_samples,) or (n_samples, n_outputs)

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance