Hierarchical Classifiers
LocalClassifierPerLevel
- class LocalClassifierPerLevel.LocalClassifierPerLevel(local_classifier: Optional[BaseEstimator] = None, verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each level of the hierarchy, except the root node.
A local classifier per level is a local hierarchical classifier that fits one local multi-class classifier for each level of the class hierarchy, except for the root node.
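As a rough illustration of the per-level decomposition described above (not hiclass's internal code), the sketch below splits a hierarchical label matrix into one multi-class target vector per level. The helper name `targets_per_level` is hypothetical.

```python
from typing import List, Sequence


def targets_per_level(y: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return one target vector per hierarchy level (one per column of y)."""
    n_levels = len(y[0])
    # Column i holds the labels that the classifier for level i would learn.
    return [[row[level] for row in y] for level in range(n_levels)]


y = [["1", "1.1"], ["2", "2.1"], ["1", "1.2"]]
levels = targets_per_level(y)
for depth, targets in enumerate(levels):
    print(f"level {depth}: classes={sorted(set(targets))}")
```

Each inner list is the training target of a single multi-class classifier, so a hierarchy with two levels yields two local classifiers.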
Examples
>>> from hiclass import LocalClassifierPerLevel
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcpl = LocalClassifierPerLevel()
>>> lcpl.fit(X, y)
>>> lcpl.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: Optional[BaseEstimator] = None, verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Initialize a local classifier per level.
- Parameters
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to write the hierarchy built.
replace_classifiers (bool, default=True) – Turns on (True) the replacement of a local classifier with a constant classifier when trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed it is used, otherwise it defaults to Joblib.
bert (bool, default=False) – If True, skip scikit-learn’s checks and sample_weight passing for BERT.
tmp_dir (str, default=None) – Temporary directory to persist local classifiers that are trained. If the job needs to be restarted, it will skip the pre-trained local classifier found in the temporary directory.
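To illustrate the situation that `replace_classifiers=True` addresses, here is a minimal sketch of a constant classifier standing in for a local classifier whose training labels contain a single unique class. `ConstantClassifier` and `has_single_class` are hypothetical names used for illustration only; they are not hiclass's implementation.

```python
from typing import Any, List, Sequence


class ConstantClassifier:
    """Hypothetical stand-in that always predicts the one class seen in fit."""

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> "ConstantClassifier":
        self.constant_ = y[0]
        return self

    def predict(self, X: Sequence[Any]) -> List[Any]:
        return [self.constant_ for _ in X]


def has_single_class(y: Sequence[Any]) -> bool:
    """True when the local training labels contain only one unique class."""
    return len(set(y)) == 1


X = [[1, 2], [3, 4]]
y = ["1.1", "1.1"]  # only one unique class reaches this local classifier

if has_single_class(y):
    clf = ConstantClassifier().fit(X, y)
    print(clf.predict(X))
```

Many scikit-learn estimators refuse to fit when only one class is present, which is why a constant predictor is a reasonable substitute in that case.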
- fit(X, y, sample_weight=None)
Fit a local classifier per level.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns
self – Fitted estimator.
- Return type
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns
routing – A MetadataRequest encapsulating routing information.
- Return type
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns
y – The predicted classes.
- Return type
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- set_fit_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') → LocalClassifierPerLevel
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns
self – The updated object.
- Return type
object
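The four request options listed for set_fit_request can be modelled with a small pure-Python function. This is a simplified sketch of the decision rules only, not scikit-learn's routing code: `resolve_sample_weight` is a hypothetical helper, and the `ValueError` stands in for whatever error the real meta-estimator raises.

```python
from typing import Any, Dict, Optional, Union

Request = Union[bool, None, str]


def resolve_sample_weight(request: Request, metadata: Dict[str, Any]) -> Optional[Any]:
    """Return the value to forward to fit as sample_weight, or None to forward nothing."""
    if request is True:
        # Requested: forward it when the caller supplied it, otherwise ignore.
        return metadata.get("sample_weight")
    if request is False:
        # Not requested: never forwarded, even when supplied.
        return None
    if request is None:
        # Not requested, and supplying it anyway is treated as an error.
        if "sample_weight" in metadata:
            raise ValueError("sample_weight was provided but not requested")
        return None
    # A string is an alias: the metadata is looked up under that name instead.
    return metadata.get(request)


weights = [0.5, 2.0]
print(resolve_sample_weight(True, {"sample_weight": weights}))
print(resolve_sample_weight(False, {"sample_weight": weights}))
print(resolve_sample_weight("w", {"w": weights}))
```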
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
LocalClassifierPerNode
- class LocalClassifierPerNode.LocalClassifierPerNode(local_classifier: Optional[BaseEstimator] = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each node of the graph, except the root node.
A local classifier per node is a local hierarchical classifier that fits one local binary classifier for each node of the class hierarchy, except for the root node.
Examples
>>> from hiclass import LocalClassifierPerNode
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcpn = LocalClassifierPerNode()
>>> lcpn.fit(X, y)
>>> lcpn.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: Optional[BaseEstimator] = None, binary_policy: str = 'siblings', verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Initialize a local classifier per node.
- Parameters
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
binary_policy ({"exclusive", "less_exclusive", "exclusive_siblings", "inclusive", "less_inclusive", "siblings"}, str, default="siblings") –
Specify the rule for defining positive and negative training examples, using one of the following options:
exclusive: Positive examples belong only to the class being considered. All classes are negative examples, except for the selected class;
less_exclusive: Positive examples belong only to the class being considered. All classes are negative examples, except for the selected class and its descendants;
exclusive_siblings: Positive examples belong only to the class being considered. All sibling classes are negative examples;
inclusive: Positive examples belong only to the class being considered and its descendants. All classes are negative examples, except for the selected class, its descendants and ancestors;
less_inclusive: Positive examples belong only to the class being considered and its descendants. All classes are negative examples, except for the selected class and its descendants;
siblings: Positive examples belong only to the class being considered and its descendants. All siblings and their descendant classes are negative examples.
See Training Policies for more information about the different policies.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to write the hierarchy built.
replace_classifiers (bool, default=True) – Turns on (True) the replacement of a local classifier with a constant classifier when trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed it is used, otherwise it defaults to Joblib.
bert (bool, default=False) – If True, skip scikit-learn’s checks and sample_weight passing for BERT.
tmp_dir (str, default=None) – Temporary directory to persist local classifiers that are trained. If the job needs to be restarted, it will skip the pre-trained local classifier found in the temporary directory.
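The default binary_policy, "siblings", can be made concrete with a small sketch that derives the positive and negative classes for one node. The toy `HIERARCHY` and the helpers `descendants`, `parent_of` and `siblings_policy` are hypothetical and written only from the rule stated in the parameter description; they are not hiclass's internal code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy: parent -> children.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["1", "2"],
    "1": ["1.1", "1.2"],
    "2": ["2.1"],
}


def descendants(node: str) -> Set[str]:
    """All nodes strictly below `node`."""
    found: Set[str] = set()
    for child in HIERARCHY.get(node, []):
        found.add(child)
        found |= descendants(child)
    return found


def parent_of(node: str) -> str:
    """Parent of `node` in the toy hierarchy."""
    return next(p for p, children in HIERARCHY.items() if node in children)


def siblings_policy(node: str) -> Tuple[Set[str], Set[str]]:
    """Positive and negative classes for `node` under the "siblings" rule."""
    # Positives: the node itself and everything below it.
    positive = {node} | descendants(node)
    # Negatives: each sibling and everything below that sibling.
    negative: Set[str] = set()
    for sibling in HIERARCHY[parent_of(node)]:
        if sibling != node:
            negative |= {sibling} | descendants(sibling)
    return positive, negative


positive, negative = siblings_policy("1")
print(sorted(positive))  # ['1', '1.1', '1.2']
print(sorted(negative))  # ['2', '2.1']
```

Samples labelled with a positive class train the node's binary classifier as 1, samples labelled with a negative class as 0, and all other samples are left out for that node.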
- fit(X, y, sample_weight=None)
Fit a local classifier per node.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns
self – Fitted estimator.
- Return type
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns
routing – A MetadataRequest encapsulating routing information.
- Return type
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns
y – The predicted classes.
- Return type
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- set_fit_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') → LocalClassifierPerNode
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns
self – The updated object.
- Return type
object
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
LocalClassifierPerParentNode
- class LocalClassifierPerParentNode.LocalClassifierPerParentNode(local_classifier: Optional[BaseEstimator] = None, verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Bases: BaseEstimator, HierarchicalClassifier
Assign local classifiers to each parent node of the graph.
A local classifier per parent node is a local hierarchical classifier that fits one multi-class classifier for each parent node of the class hierarchy.
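To show which classification tasks the per-parent-node decomposition produces, the following sketch groups child classes under each parent found in the label paths. It is an illustration under assumptions, not hiclass's code: `children_per_parent` is a hypothetical helper and the artificial `"root"` label is a placeholder name for the top of the hierarchy.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def children_per_parent(
    y: Sequence[Sequence[str]], root: str = "root"
) -> Dict[str, Set[str]]:
    """Map each parent node to the child classes its classifier would separate."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for path in y:
        parent = root
        # Walk the path from the top, linking each label to the one above it.
        for label in path:
            children[parent].add(label)
            parent = label
    return dict(children)


y = [["1", "1.1"], ["1", "1.2"], ["2", "2.1"]]
tasks = children_per_parent(y)
for parent, classes in tasks.items():
    print(parent, "->", sorted(classes))
```

Every key is a parent node that would receive one multi-class classifier, and the associated set is the label space of that classifier. Leaf nodes never appear as keys because they have no children to choose between.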
Examples
>>> from hiclass import LocalClassifierPerParentNode
>>> y = [['1', '1.1'], ['2', '2.1']]
>>> X = [[1, 2], [3, 4]]
>>> lcppn = LocalClassifierPerParentNode()
>>> lcppn.fit(X, y)
>>> lcppn.predict(X)
array([['1', '1.1'], ['2', '2.1']])
- __init__(local_classifier: Optional[BaseEstimator] = None, verbose: int = 0, edge_list: Optional[str] = None, replace_classifiers: bool = True, n_jobs: int = 1, bert: bool = False, tmp_dir: Optional[str] = None)
Initialize a local classifier per parent node.
- Parameters
local_classifier (BaseEstimator, default=LogisticRegression) – The local_classifier used to create the collection of local classifiers. Needs to have fit, predict and clone methods.
verbose (int, default=0) – Controls the verbosity when fitting and predicting. See https://verboselogs.readthedocs.io/en/latest/readme.html#overview-of-logging-levels for more information.
edge_list (str, default=None) – Path to write the hierarchy built.
replace_classifiers (bool, default=True) – Turns on (True) the replacement of a local classifier with a constant classifier when trained on only a single unique class.
n_jobs (int, default=1) – The number of jobs to run in parallel. Only fit is parallelized. If Ray is installed it is used, otherwise it defaults to Joblib.
bert (bool, default=False) – If True, skip scikit-learn’s checks and sample_weight passing for BERT.
tmp_dir (str, default=None) – Temporary directory to persist local classifiers that are trained. If the job needs to be restarted, it will skip the pre-trained local classifier found in the temporary directory.
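The restart behaviour described for tmp_dir follows a common persist-then-skip pattern, sketched below with the standard library. This is an assumption-laden illustration: `fit_or_load`, the `.pkl` file naming and the use of pickle are all hypothetical and may differ from what hiclass actually writes to the directory.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def fit_or_load(name: str, train: Callable[[], Any], tmp_dir: str) -> Any:
    """Load a persisted model for `name` if present, otherwise train and persist it."""
    path = Path(tmp_dir) / f"{name}.pkl"  # hypothetical naming scheme
    if path.exists():
        # A previous run already trained this local classifier: skip training.
        with path.open("rb") as handle:
            return pickle.load(handle)
    model = train()
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return model


calls: List[str] = []


def train_stub() -> dict:
    """Stand-in for an expensive local classifier fit."""
    calls.append("trained")
    return {"classes": ["1.1", "1.2"]}


with tempfile.TemporaryDirectory() as tmp_dir:
    first = fit_or_load("node_1", train_stub, tmp_dir)
    # Simulated restart: the second call finds the file and does not retrain.
    second = fit_or_load("node_1", train_stub, tmp_dir)

print(len(calls))
```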
- fit(X, y, sample_weight=None)
Fit a local classifier per parent node.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csc_matrix.
y (array-like of shape (n_samples, n_levels)) – The target values, i.e., hierarchical class labels for classification.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- Returns
self – Fitted estimator.
- Return type
object
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns
routing – A MetadataRequest encapsulating routing information.
- Return type
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- predict(X)
Predict classes for the given data.
Hierarchical labels are returned.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The input samples. Internally, its dtype will be converted to dtype=np.float32. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
- Returns
y – The predicted classes.
- Return type
ndarray of shape (n_samples,) or (n_samples, n_outputs)
- set_fit_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') → LocalClassifierPerParentNode
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns
self – The updated object.
- Return type
object
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance