Hierarchical Classifiers
LocalClassifierPerLevel
- class LocalClassifierPerLevel.LocalClassifierPerLevel(local_classifier: BaseEstimator = None, verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each level of the hierarchy, except the root node.
A local classifier per level is a local hierarchical classifier that fits one local multi-class classifier for each level of the class hierarchy, except for the root node.
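As a rough illustration of what "one multi-class classifier per level" means for the training data, the sketch below splits the hierarchical targets y (shape (n_samples, n_levels)) into one flat target list per level. The helper split_targets_by_level is hypothetical and not part of the hiclass API. It only shows which labels each level's classifier would see, not how hiclass implements the split.

```python
from typing import List, Sequence


def split_targets_by_level(y: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return one flat label list per hierarchy level (hypothetical helper).

    y has shape (n_samples, n_levels); entry i of the result holds the labels
    a level-i multi-class classifier would be trained on.
    """
    n_levels = len(y[0])
    return [[row[level] for row in y] for level in range(n_levels)]


y = [["Animal", "Dog"], ["Animal", "Cat"], ["Plant", "Tree"]]
for level, targets in enumerate(split_targets_by_level(y)):
    # Each level gets its own multi-class problem over that level's labels.
    print(level, targets)
```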
Examples
>>> from hiclass import LocalClassifierPerLevel
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcpl = LocalClassifierPerLevel()
>>> lcpl.fit(X, y)
>>> lcpl.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: BaseEstimator = None, verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Initialize a local classifier per level.
- Parameters:
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to which the built hierarchy is written.
replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed, it is used; otherwise, Joblib is used.
calibration_method ({"ivap", "cvap", "platt", "isotonic", "beta"}, str, default=None) – If set, use the desired method to calibrate probabilities returned by predict_proba().
return_all_probabilities (bool, default=False) – If True, return probabilities for all levels. Otherwise, return only probabilities for the last level.
probability_combiner ({"geometric", "arithmetic", "multiply", None}, str, default="multiply") –
Specify the rule for combining probabilities over multiple levels:
geometric: Each level's probabilities are the geometric mean of that level's probabilities and those of its predecessors;
arithmetic: Each level's probabilities are the arithmetic mean of that level's probabilities and those of its predecessors;
multiply: Each level's probabilities are calculated by multiplying that level's probabilities with those of its predecessors.
None: No aggregation.
tmp_dir (str, default=None) – Temporary directory in which trained local classifiers are persisted. If the job needs to be restarted, local classifiers already found in the temporary directory are not trained again.
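To make the probability_combiner rules concrete, the sketch below applies each rule to the probabilities predicted along a single root-to-leaf path, one value per level. This is a simplified per-path model written for illustration (the function name combine_path is hypothetical); the library applies the rule to full probability matrices, and its exact handling may differ.

```python
import math
from typing import List, Optional, Sequence


def combine_path(probs: Sequence[float], combiner: Optional[str]) -> List[float]:
    """Combine per-level probabilities along one path (hypothetical helper).

    probs[k] is the probability predicted at level k for the chosen node;
    entry k of the result combines levels 0..k according to `combiner`.
    """
    combined = []
    for k in range(len(probs)):
        prefix = probs[: k + 1]  # this level and all of its predecessors
        if combiner == "multiply":
            combined.append(math.prod(prefix))
        elif combiner == "arithmetic":
            combined.append(sum(prefix) / len(prefix))
        elif combiner == "geometric":
            combined.append(math.prod(prefix) ** (1 / len(prefix)))
        elif combiner is None:
            combined.append(probs[k])  # no aggregation
        else:
            raise ValueError(f"unknown combiner: {combiner!r}")
    return combined


path_probs = [0.9, 0.8, 0.5]
for rule in ("multiply", "arithmetic", "geometric", None):
    print(rule, [round(p, 4) for p in combine_path(path_probs, rule)])
```

With "multiply", deeper levels can only lose probability mass, whereas the two mean-based rules let a confident deep level partially offset a less confident ancestor.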
- calibrate(X, y)
Fit a local calibrator per node.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The calibration input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
- Returns:
self – Calibrated estimator.
- Return type:
object
- fit(X, y, sample_weight=None)
Fit a local classifier per level.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self – Fitted estimator.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
y – The predicted classes.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- predict_proba(X)
Predict class probabilities for the given data.
If return_all_probabilities=True, the probabilities for each level are returned; otherwise, only the probabilities for the lowest level are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
T – The predicted probabilities of the lowest level or of all levels.
- Return type:
ndarray of shape (n_samples,n_classes) or List[ndarray(n_samples,n_classes)]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LocalClassifierPerLevel
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
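The four request values can be summarized as a small decision rule. The sketch below is a hypothetical, simplified model of how a meta-estimator would decide whether to forward sample_weight to fit under each request value; it is not scikit-learn's routing implementation, and the function name route_sample_weight is invented for illustration.

```python
from typing import Any, Dict, Optional, Union

Request = Optional[Union[bool, str]]


def route_sample_weight(request: Request, meta_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return the keyword arguments a meta-estimator would forward to fit.

    `request` mirrors the value given to set_fit_request(sample_weight=...);
    `meta_kwargs` holds the metadata the user passed to the meta-estimator.
    """
    if request is True:
        # Requested: forward the metadata if the user provided it, else skip it.
        if "sample_weight" in meta_kwargs:
            return {"sample_weight": meta_kwargs["sample_weight"]}
        return {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it anyway is an error.
        if "sample_weight" in meta_kwargs:
            raise ValueError("sample_weight was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user passes the metadata under the alias name instead.
        if request in meta_kwargs:
            return {"sample_weight": meta_kwargs[request]}
        return {}
    raise TypeError(f"invalid request value: {request!r}")


weights = [1.0, 2.0]
print(route_sample_weight(True, {"sample_weight": weights}))  # forwarded
print(route_sample_weight(False, {"sample_weight": weights}))  # dropped
print(route_sample_weight("clf_weights", {"clf_weights": weights}))  # aliased
```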
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
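The <component>__<parameter> convention can be illustrated with a toy model of nested parameter updates. The sketch below uses plain dictionaries instead of estimators, and the helper set_nested_params is hypothetical. With a real estimator, the analogous call would be set_params(local_classifier__C=0.5), assuming local_classifier was set to an estimator that has a C parameter, such as LogisticRegression.

```python
from typing import Any, Dict


def set_nested_params(params: Dict[str, Any], **updates: Any) -> Dict[str, Any]:
    """Apply updates such as a__b=1 to a nested dict of parameters (toy model)."""
    for key, value in updates.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # "<component>__<parameter>": recurse into the named component.
            set_nested_params(params[component], **{sub_key: value})
        else:
            # Plain "<parameter>": set it on this object directly.
            params[key] = value
    return params


params = {"n_jobs": 1, "local_classifier": {"C": 1.0, "max_iter": 100}}
set_nested_params(params, n_jobs=4, local_classifier__C=0.5)
print(params)
```

Splitting only on the first "__" and recursing is what lets deeper paths such as a__b__c reach nested components.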
LocalClassifierPerNode
- class LocalClassifierPerNode.LocalClassifierPerNode(local_classifier: BaseEstimator = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each node of the graph, except the root node.
A local classifier per node is a local hierarchical classifier that fits one local binary classifier for each node of the class hierarchy, except for the root node.
Examples
>>> from hiclass import LocalClassifierPerNode
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcpn = LocalClassifierPerNode()
>>> lcpn.fit(X, y)
>>> lcpn.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: BaseEstimator = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Initialize a local classifier per node.
- Parameters:
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
binary_policy ({"exclusive", "less_exclusive", "exclusive_siblings", "inclusive", "less_inclusive", "siblings"}, str, default="siblings") –
Specify the rule for defining positive and negative training examples, using one of the following options:
exclusive: Positive examples belong only to the class being considered. All classes are negative examples, except for the selected class;
less_exclusive: Positive examples belong only to the class being considered. All classes are negative examples, except for the selected class and its descendants;
exclusive_siblings: Positive examples belong only to the class being considered. All sibling classes are negative examples;
inclusive: Positive examples belong only to the class being considered and its descendants. All classes are negative examples, except for the selected class, its descendants and ancestors;
less_inclusive: Positive examples belong only to the class being considered and its descendants. All classes are negative examples, except for the selected class and its descendants;
siblings: Positive examples belong only to the class being considered and its descendants. All siblings and their descendant classes are negative examples.
See Training Policies for more information about the different policies.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to which the built hierarchy is written.
replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed, it is used; otherwise, Joblib is used.
calibration_method ({"ivap", "cvap", "platt", "isotonic", "beta"}, str, default=None) – If set, use the desired method to calibrate probabilities returned by predict_proba().
return_all_probabilities (bool, default=False) – If True, return probabilities for all levels. Otherwise, return only probabilities for the last level.
probability_combiner ({"geometric", "arithmetic", "multiply", None}, str, default="multiply") –
Specify the rule for combining probabilities over multiple levels:
geometric: Each level's probabilities are the geometric mean of that level's probabilities and those of its predecessors;
arithmetic: Each level's probabilities are the arithmetic mean of that level's probabilities and those of its predecessors;
multiply: Each level's probabilities are calculated by multiplying that level's probabilities with those of its predecessors.
None: No aggregation.
tmp_dir (str, default=None) – Temporary directory in which trained local classifiers are persisted. If the job needs to be restarted, local classifiers already found in the temporary directory are not trained again.
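The binary_policy descriptions above can be read as rules for selecting positive and negative examples for each node's binary classifier. The sketch below encodes one reading of those six definitions, representing every node and every sample's most specific label as a tuple path from the root. The function and helper names are hypothetical, samples that are neither positive nor negative are simply left out, and the hiclass policies may differ in details (see Training Policies).

```python
from typing import List, Set, Tuple

Path = Tuple[str, ...]


def _in_subtree(path: Path, node: Path) -> bool:
    # True if the sample's label path passes through `node` (node or descendant).
    return path[: len(node)] == node


def _is_ancestor(path: Path, node: Path) -> bool:
    # True if the sample's most specific label is a proper ancestor of `node`.
    return len(path) < len(node) and node[: len(path)] == path


def _in_sibling_subtree(path: Path, node: Path) -> bool:
    # True if the sample's path passes through a sibling of `node`.
    k = len(node)
    return len(path) >= k and path[: k - 1] == node[:-1] and path[k - 1] != node[-1]


def split_examples(paths: List[Path], node: Path, policy: str) -> Tuple[Set[int], Set[int]]:
    """Return (positive indices, negative indices) for one node's classifier."""
    pos: Set[int] = set()
    neg: Set[int] = set()
    for i, path in enumerate(paths):
        is_self = path == node
        subtree = _in_subtree(path, node)
        if policy == "exclusive":
            positive, negative = is_self, not is_self
        elif policy == "less_exclusive":
            positive, negative = is_self, not subtree
        elif policy == "exclusive_siblings":
            positive = is_self
            negative = len(path) == len(node) and _in_sibling_subtree(path, node)
        elif policy == "inclusive":
            positive = subtree
            negative = not subtree and not _is_ancestor(path, node)
        elif policy == "less_inclusive":
            positive, negative = subtree, not subtree
        elif policy == "siblings":
            positive, negative = subtree, _in_sibling_subtree(path, node)
        else:
            raise ValueError(f"unknown policy: {policy}")
        if positive:
            pos.add(i)
        elif negative:
            neg.add(i)
    return pos, neg


paths = [("1", "1.1"), ("1", "1.2"), ("2", "2.1"), ("1",)]
print(split_examples(paths, ("1", "1.1"), "siblings"))  # ({0}, {1})
print(split_examples(paths, ("1", "1.1"), "inclusive"))  # ({0}, {1, 2})
```

The sample labelled only ("1",) is an ancestor of ("1", "1.1"), so "inclusive" leaves it out while "less_inclusive" would count it as a negative.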
- calibrate(X, y)
Fit a local calibrator per node.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The calibration input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
- Returns:
self – Calibrated estimator.
- Return type:
object
- fit(X, y, sample_weight=None)
Fit a local classifier per node.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self – Fitted estimator.
- Return type:
object
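The tmp_dir constructor option makes fit resumable: local classifiers already persisted in the directory are not trained again. The sketch below shows that skip-if-present pattern with pickle. The file-naming scheme, the fit_with_checkpoints function and the train_node callback are all hypothetical and do not reflect how hiclass names or stores its files.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable, Dict, Iterable


def fit_with_checkpoints(
    nodes: Iterable[str], train_node: Callable[[str], Any], tmp_dir: str
) -> Dict[str, Any]:
    """Train one model per node, reusing any model already pickled in tmp_dir."""
    models = {}
    for node in nodes:
        # Hash the node name so arbitrary labels map to safe file names.
        filename = hashlib.sha256(node.encode("utf-8")).hexdigest() + ".pkl"
        path = os.path.join(tmp_dir, filename)
        if os.path.exists(path):
            with open(path, "rb") as f:
                models[node] = pickle.load(f)  # skip training: reuse checkpoint
            continue
        models[node] = train_node(node)
        with open(path, "wb") as f:
            pickle.dump(models[node], f)
    return models


calls = []


def train_node(node: str) -> Dict[str, str]:
    calls.append(node)  # record which nodes were actually trained
    return {"node": node}


nodes = ["1", "1.1", "2", "2.1"]
with tempfile.TemporaryDirectory() as tmp:
    first = fit_with_checkpoints(nodes, train_node, tmp)
    trained_first = len(calls)
    second = fit_with_checkpoints(nodes, train_node, tmp)  # simulated restart
    trained_second = len(calls) - trained_first

print(trained_first, trained_second)  # 4 0
```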
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
y – The predicted classes.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_outputs)
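A common way to turn per-node binary scores into a consistent hierarchical label is greedy top-down inference: start at the root and repeatedly move to the child with the highest score. The sketch below assumes that strategy and a dictionary of precomputed scores keyed by tuple paths (the function top_down_predict is hypothetical). It illustrates the idea and is not necessarily the exact procedure LocalClassifierPerNode.predict follows.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]


def top_down_predict(node_scores: Dict[Path, float]) -> Path:
    """Greedily descend from the root, picking the best-scoring child each step."""
    path: Path = ()
    while True:
        children = [
            node
            for node in node_scores
            if len(node) == len(path) + 1 and node[: len(path)] == path
        ]
        if not children:
            return path  # no children left: this is the predicted leaf
        path = max(children, key=node_scores.__getitem__)


scores = {
    ("1",): 0.7,
    ("2",): 0.3,
    ("1", "1.1"): 0.2,
    ("1", "1.2"): 0.6,
    ("2", "2.1"): 0.9,
}
print(top_down_predict(scores))  # ('1', '1.2')
```

The high score of ("2", "2.1") is never considered because its parent lost at the first level; that is the usual trade-off of greedy top-down prediction, which always yields a label path consistent with the hierarchy.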
- predict_proba(X)
Predict class probabilities for the given data.
If return_all_probabilities=True, the probabilities for each level are returned; otherwise, only the probabilities for the lowest level are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
T – The predicted probabilities of the lowest level or of all levels.
- Return type:
ndarray of shape (n_samples,n_classes) or List[ndarray(n_samples,n_classes)]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LocalClassifierPerNode
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
LocalClassifierPerParentNode
- class LocalClassifierPerParentNode.LocalClassifierPerParentNode(local_classifier: BaseEstimator = None, verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each parent node of the graph.
A local classifier per parent node is a local hierarchical classifier that fits one multi-class classifier for each parent node of the class hierarchy.
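The per-parent-node decomposition can be sketched by grouping samples by each parent node they pass through and recording which child they continue to. The helper below is hypothetical and uses tuple paths as node names, with the empty tuple standing for the root. It only shows which (sample, child label) pairs each parent's multi-class classifier would be trained on.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


def training_sets_per_parent(
    y: Sequence[Sequence[str]],
) -> Dict[Path, Tuple[List[int], List[str]]]:
    """Map each parent node to (sample indices, child labels) for its classifier."""
    sets: Dict[Path, Tuple[List[int], List[str]]] = defaultdict(lambda: ([], []))
    for i, row in enumerate(y):
        for level, label in enumerate(row):
            parent = tuple(row[:level])  # () is the root
            indices, targets = sets[parent]
            indices.append(i)
            targets.append(label)
    return dict(sets)


y = [["1", "1.1"], ["1", "1.2"], ["2", "2.1"]]
for parent, (indices, targets) in training_sets_per_parent(y).items():
    print(parent, indices, targets)
```

Here the parent ('2',) sees only one class, '2.1', which is the situation the replace_classifiers option addresses.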
Examples
>>> from hiclass import LocalClassifierPerParentNode
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcppn = LocalClassifierPerParentNode()
>>> lcppn.fit(X, y)
>>> lcppn.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: BaseEstimator = None, verbose: int = 0, edge_list: str = None, replace_classifiers: bool = True, n_jobs: int = 1, calibration_method: str = None, return_all_probabilities: bool = False, probability_combiner: str = 'multiply', tmp_dir: str = None)
Initialize a local classifier per parent node.
- Parameters:
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to which the built hierarchy is written.
replace_classifiers (bool, default=True) – If True, a local classifier is replaced with a constant classifier when it is trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed, it is used; otherwise, Joblib is used.
calibration_method ({"ivap", "cvap", "platt", "isotonic", "beta"}, str, default=None) – If set, use the desired method to calibrate probabilities returned by predict_proba().
return_all_probabilities (bool, default=False) – If True, return probabilities for all levels. Otherwise, return only probabilities for the last level.
probability_combiner ({"geometric", "arithmetic", "multiply", None}, str, default="multiply") –
Specify the rule for combining probabilities over multiple levels:
geometric: Each level's probabilities are the geometric mean of that level's probabilities and those of its predecessors;
arithmetic: Each level's probabilities are the arithmetic mean of that level's probabilities and those of its predecessors;
multiply: Each level's probabilities are calculated by multiplying that level's probabilities with those of its predecessors.
None: No aggregation.
tmp_dir (str, default=None) – Temporary directory in which trained local classifiers are persisted. If the job needs to be restarted, local classifiers already found in the temporary directory are not trained again.
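When replace_classifiers=True, a local classifier whose training data contains only one unique class is replaced by a constant classifier. This matters because estimators such as LogisticRegression refuse to fit on a single class. A minimal sketch of such a constant classifier follows; the names ConstantClassifierSketch and needs_constant_classifier, and the behavior of always predicting the single seen class with probability 1, are assumptions for illustration rather than the hiclass implementation.

```python
from typing import List, Sequence


class ConstantClassifierSketch:
    """Predicts the only class seen during fit (hypothetical illustration)."""

    def fit(self, X: Sequence, y: Sequence[str]) -> "ConstantClassifierSketch":
        classes = sorted(set(y))
        if len(classes) != 1:
            raise ValueError("expected exactly one unique class")
        self.classes_ = classes
        return self

    def predict(self, X: Sequence) -> List[str]:
        return [self.classes_[0]] * len(X)

    def predict_proba(self, X: Sequence) -> List[List[float]]:
        # One column, since only one class exists; its probability is always 1.
        return [[1.0] for _ in X]


def needs_constant_classifier(y: Sequence[str]) -> bool:
    """True when a local training set has a single unique class."""
    return len(set(y)) == 1


y_local = ["2.1", "2.1"]
if needs_constant_classifier(y_local):
    clf = ConstantClassifierSketch().fit([[1, 2], [3, 4]], y_local)
    print(clf.predict([[5, 6]]), clf.predict_proba([[5, 6]]))
```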
- calibrate(X, y)
Fit a local calibrator per node.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The calibration input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
- Returns:
self – Calibrated estimator.
- Return type:
object
- fit(X, y, sample_weight=None)
Fit a local classifier per parent node.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self – Fitted estimator.
- Return type:
object
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
y – The predicted classes.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- predict_proba(X)
Predict class probabilities for the given data.
If return_all_probabilities=True, the probabilities for each level are returned; otherwise, only the probabilities for the lowest level are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
T – The predicted probabilities of the lowest level or of all levels.
- Return type:
ndarray of shape (n_samples,n_classes) or List[ndarray(n_samples,n_classes)]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LocalClassifierPerParentNode
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
Flat Classifier
- class FlatClassifier.FlatClassifier(local_classifier: BaseEstimator = LogisticRegression())
A flat classifier utility that accepts as input a hierarchy and flattens it internally.
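Flattening a hierarchy amounts to turning each row of hierarchical labels into one composite label, training an ordinary multi-class classifier on those composite labels, and splitting predictions back into levels. The sketch below shows that round trip; the separator string and both helper names are hypothetical choices, and the separator hiclass uses internally may differ.

```python
from typing import List, Sequence

SEPARATOR = "::"  # hypothetical; must not occur inside any label


def flatten_labels(y: Sequence[Sequence[str]]) -> List[str]:
    """Join each hierarchical label path into one flat class label."""
    return [SEPARATOR.join(row) for row in y]


def unflatten_labels(flat: Sequence[str]) -> List[List[str]]:
    """Split flat class labels back into hierarchical label paths."""
    return [label.split(SEPARATOR) for label in flat]


y = [["1", "1.1"], ["2", "2.1"]]
flat = flatten_labels(y)
print(flat)  # ['1::1.1', '2::2.1']
print(unflatten_labels(flat) == y)  # True
```

Because every full path becomes its own class, a single classifier sees all leaf combinations at once, which is the baseline the local classifiers above are usually compared against.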
Examples
>>> from hiclass import FlatClassifier
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> flat = FlatClassifier()
>>> flat.fit(X, y)
>>> flat.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: BaseEstimator = LogisticRegression())
Initialize a flat classifier.
- Parameters:
local_classifier (BaseEstimator, default=LogisticRegression) – The scikit-learn model used for the flat classification. Needs to have fit, predict and clone methods.
- fit(X, y, sample_weight=None)
Fit a flat classifier.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns:
self – Fitted estimator.
- Return type:
object
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters:
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns:
y – The predicted classes.
- Return type:
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → FlatClassifier
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object