Data Utilities
Binary Policies
ExclusivePolicy
- class BinaryPolicy.ExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: BinaryPolicy
Implement the exclusive policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes all examples except the positive ones.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This only includes examples for the given node.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
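The two masks can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the toy array `y` and the helper `exclusive_masks` are hypothetical, and the sketch assumes that each row of `y` lists one sample's labels from the top level down to its most specific class, so that "examples for the given node" means rows whose last label equals the node.

```python
import numpy as np

# Hypothetical toy labels: one row per sample, ordered from the top
# level of the hierarchy down to the most specific class.
y = np.array([
    ["Animal", "Mammal"],
    ["Animal", "Reptile"],
    ["Animal", "Mammal"],
    ["Plant", "Fern"],
])


def exclusive_masks(y, node):
    """Return (positive, negative) boolean masks under the assumptions above."""
    # Positive: samples whose most specific label is exactly `node`.
    positive = y[:, -1] == node
    # Negative: every remaining sample.
    negative = np.logical_not(positive)
    return positive, negative


positive, negative = exclusive_masks(y, "Mammal")
print(positive.tolist())  # [True, False, True, False]
print(negative.tolist())  # [False, True, False, True]
```

Each mask has one entry per sample, so it can be used directly to index `X` and `sample_weight`.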
LessExclusivePolicy
- class BinaryPolicy.LessExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: ExclusivePolicy
Implement the less exclusive policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes all examples except the examples for the current node and its children.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This only includes examples for the given node.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
InclusivePolicy
- class BinaryPolicy.InclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: BinaryPolicy
Implement the inclusive policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes all examples, except the examples for the given node, its descendants and successors.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This includes examples for the given node and its descendants.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
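A rough sketch of how inclusive masks could be derived is shown here. It is an illustration under stated assumptions, not the library's code: a plain dict of children (`children`) stands in for the `nx.DiGraph`, the helpers `descendants` and `inclusive_masks` are hypothetical, and each row of `y` is assumed to hold the full label path of one sample.

```python
import numpy as np

# Hypothetical hierarchy; a dict of children stands in for the nx.DiGraph.
children = {
    "Animal": ["Mammal", "Reptile"],
    "Mammal": ["Cat", "Dog"],
    "Reptile": ["Snake"],
}

# Hypothetical labels: one full path per sample.
y = np.array([
    ["Animal", "Mammal", "Cat"],
    ["Animal", "Mammal", "Dog"],
    ["Animal", "Reptile", "Snake"],
    ["Plant", "Fern", "Bracken"],
])


def descendants(children, node):
    """Collect every node reachable from `node` by following child edges."""
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def inclusive_masks(children, y, node):
    """Return (positive, negative) masks for the node and its descendants."""
    family = descendants(children, node) | {node}
    # Positive: any label on the sample's path is the node or a descendant.
    positive = np.isin(y, sorted(family)).any(axis=1)
    # Negative: samples outside the node's subtree.
    negative = np.logical_not(positive)
    return positive, negative


positive, negative = inclusive_masks(children, y, "Mammal")
print(positive.tolist())  # [True, True, False, False]
print(negative.tolist())  # [False, False, True, True]
```

Compared with the exclusive policy, samples labelled with a child class such as "Cat" now count as positives for "Mammal".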
LessInclusivePolicy
- class BinaryPolicy.LessInclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: InclusivePolicy
Implement the less inclusive policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes all examples except the examples for the current node and its children.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This includes examples for the given node and its descendants.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
SiblingsPolicy
- class BinaryPolicy.SiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: InclusivePolicy
Implement the siblings policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes all examples for nodes that have the same ancestors as the given node, as well as their descendants.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This includes examples for the given node and its descendants.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
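The siblings-based selection of negatives can be sketched as follows. This is only an illustration, not the library's implementation: it reads "nodes that have the same ancestors" as nodes sharing a parent with the given node, uses a hypothetical dict of children (`children`) in place of the `nx.DiGraph`, and the helpers `descendants` and `siblings_negative_mask` are invented for the example.

```python
import numpy as np

# Hypothetical hierarchy; a dict of children stands in for the nx.DiGraph.
children = {
    "Animal": ["Mammal", "Reptile"],
    "Mammal": ["Cat", "Dog"],
    "Reptile": ["Snake"],
}

# Hypothetical labels: one full path per sample.
y = np.array([
    ["Animal", "Mammal", "Cat"],
    ["Animal", "Mammal", "Dog"],
    ["Animal", "Reptile", "Snake"],
    ["Plant", "Fern", "Bracken"],
])


def descendants(children, node):
    """Collect every node reachable from `node` by following child edges."""
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def siblings_negative_mask(children, y, node):
    """Mark samples that belong to a sibling of `node` or to a sibling's descendant."""
    parents = [parent for parent, kids in children.items() if node in kids]
    siblings = {kid for parent in parents for kid in children[parent] if kid != node}
    negative_classes = set(siblings)
    for sibling in siblings:
        negative_classes |= descendants(children, sibling)
    if not negative_classes:
        # No siblings: no sample qualifies as a negative example.
        return np.zeros(len(y), dtype=bool)
    return np.isin(y, sorted(negative_classes)).any(axis=1)


negative = siblings_negative_mask(children, y, "Mammal")
print(negative.tolist())  # [False, False, True, False]
```

In this sketch the "Plant" sample is neither positive nor negative for "Mammal", which is the practical difference from a policy that treats every non-positive sample as negative.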
ExclusiveSiblingsPolicy
- class BinaryPolicy.ExclusiveSiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Bases: ExclusivePolicy
Implement the exclusive siblings policy of the referenced paper.
- __init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)
Initialize a BinaryPolicy with the required data.
- Parameters
digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features which will be used for fitting a model.
y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
- get_binary_examples(node) → tuple
Gather all positive and negative examples for a given node.
- Parameters
node – Node for which the positive and negative examples should be searched.
- Returns
X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
- negative_examples(node) → ndarray
Gather all negative examples corresponding to the given node.
This includes examples for all nodes that have the same parent as the given node.
- Parameters
node – Node for which the negative examples should be searched.
- Returns
negative_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
- positive_examples(node) → ndarray
Gather all positive examples corresponding to the given node.
This only includes examples for the given node.
- Parameters
node – Node for which the positive examples should be searched.
- Returns
positive_examples – A mask for which examples are included (True) and which are not.
- Return type
np.ndarray
Hierarchical Metrics
Precision
- metrics.precision(y_true: ndarray, y_pred: ndarray, average: str = 'micro')
Compute hierarchical precision score.
- Parameters
y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.
average ({"micro", "macro"}, str, default="micro") –
This parameter determines the type of averaging performed during the computation:
micro: The precision is computed by summing over all individual instances, \(\displaystyle{hP = \frac{\sum_{i=1}^{n}| \alpha_i \cap \beta_i |}{\sum_{i=1}^{n}| \alpha_i |}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.
macro: The precision is computed for each instance and then averaged, \(\displaystyle{hP = \frac{\sum_{i=1}^{n}hP_{i}}{n}}\), where \(\displaystyle{hP_{i} = \frac{| \alpha_i \cap \beta_i |}{| \alpha_i |}}\) is the precision of test example \(i\), \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, and \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors.
- Returns
precision – The proportion of positive identifications that were actually correct.
- Return type
float
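The difference between the two averaging modes can be made concrete with a small pure-Python sketch. It is an illustration of the formulas, not the library's implementation: the function name `hierarchical_precision` and the toy labels are hypothetical, and the sketch assumes that each row already contains a sample's classes together with their ancestors, with an empty string marking a level that was not predicted.

```python
def label_set(row):
    """Turn one row of labels into a set, ignoring empty placeholders."""
    return {label for label in row if label != ""}


def hierarchical_precision(y_true, y_pred, average="micro"):
    """Sketch of hP; assumes every prediction contains at least one label."""
    overlaps, predicted_sizes = [], []
    for true_row, pred_row in zip(y_true, y_pred):
        alpha = label_set(pred_row)  # predicted classes and their ancestors
        beta = label_set(true_row)   # true classes and their ancestors
        overlaps.append(len(alpha & beta))
        predicted_sizes.append(len(alpha))
    if average == "micro":
        # Sum numerators and denominators over all samples, then divide.
        return sum(overlaps) / sum(predicted_sizes)
    if average == "macro":
        # Compute hP_i per sample, then take the mean.
        per_sample = [o / p for o, p in zip(overlaps, predicted_sizes)]
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"Unknown average: {average!r}")


# Hypothetical labels; the first prediction stops at the second level.
y_true = [["Animal", "Mammal", "Cat"], ["Animal", "Reptile", "Snake"]]
y_pred = [["Animal", "Mammal", ""], ["Animal", "Reptile", "Lizard"]]

print(round(hierarchical_precision(y_true, y_pred, "micro"), 4))  # 0.8
print(round(hierarchical_precision(y_true, y_pred, "macro"), 4))  # 0.8333
```

The micro result is (2 + 2) / (2 + 3), while the macro result is the mean of 2/2 and 2/3, so the two modes differ whenever the predicted sets have different sizes.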
Recall
- metrics.recall(y_true: ndarray, y_pred: ndarray, average: str = 'micro')
Compute hierarchical recall score.
- Parameters
y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.
average ({"micro", "macro"}, str, default="micro") –
This parameter determines the type of averaging performed during the computation:
micro: The recall is computed by summing over all individual instances, \(\displaystyle{hR = \frac{\sum_{i=1}^{n}|\alpha_i \cap \beta_i|}{\sum_{i=1}^{n}|\beta_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.
macro: The recall is computed for each instance and then averaged, \(\displaystyle{hR = \frac{\sum_{i=1}^{n}hR_{i}}{n}}\), where \(\displaystyle{hR_{i} = \frac{|\alpha_i \cap \beta_i|}{|\beta_i|}}\) is the recall of test example \(i\), \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, and \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors.
- Returns
recall – The proportion of actual positives that were identified correctly.
- Return type
float
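As a worked illustration of both formulas, consider two hypothetical test examples (the class names are invented for this sketch). The first prediction stops one level above the true class, and the second picks the wrong most specific class:

```latex
\begin{align*}
\alpha_1 &= \{\text{Animal}, \text{Mammal}\}, &
\beta_1 &= \{\text{Animal}, \text{Mammal}, \text{Cat}\}, \\
\alpha_2 &= \{\text{Animal}, \text{Reptile}, \text{Lizard}\}, &
\beta_2 &= \{\text{Animal}, \text{Reptile}, \text{Snake}\}, \\
hR_{\text{micro}} &= \frac{|\alpha_1 \cap \beta_1| + |\alpha_2 \cap \beta_2|}{|\beta_1| + |\beta_2|}
  = \frac{2 + 2}{3 + 3} = \frac{2}{3}, \\
hR_{\text{macro}} &= \frac{1}{2}\left(\frac{2}{3} + \frac{2}{3}\right) = \frac{2}{3}.
\end{align*}
```

Both modes agree here because every true set \(\beta_i\) has the same size; they diverge when the true label paths have different lengths.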
F-score
- metrics.f1(y_true: ndarray, y_pred: ndarray, average: str = 'micro')
Compute hierarchical f-score.
- Parameters
y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.
average ({"micro", "macro"}, str, default="micro") –
This parameter determines the type of averaging performed during the computation:
micro: The f-score is computed from the micro-averaged precision and recall, \(\displaystyle{hF = \frac{2 \times hP \times hR}{hP + hR}}\), where \(hP\) is the hierarchical precision and \(hR\) is the hierarchical recall, each summed over all individual instances.
macro: The f-score is computed for each instance and then averaged, \(\displaystyle{hF = \frac{\sum_{i=1}^{n}hF_{i}}{n}}\), where \(\displaystyle{hF_{i} = \frac{2 \times hP_{i} \times hR_{i}}{hP_{i} + hR_{i}}}\) is the f-score of test example \(i\), computed from its precision \(hP_{i}\) and recall \(hR_{i}\). These in turn depend on \(\alpha_i\), the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, and \(\beta_i\), the set containing the true most specific classes of test example \(i\) and all their ancestors.
- Returns
f1 – Harmonic mean of the precision and recall.
- Return type
float
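A compact sketch of how the f-score combines precision and recall in each averaging mode follows. It illustrates the formulas only and is not the library's code: `hierarchical_f1`, `harmonic_mean` and the toy labels are hypothetical, and rows are assumed to already include ancestor classes, with an empty string for a level that was not predicted.

```python
def label_set(row):
    """Turn one row of labels into a set, ignoring empty placeholders."""
    return {label for label in row if label != ""}


def harmonic_mean(precision, recall):
    """Combine precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hierarchical_f1(y_true, y_pred, average="micro"):
    """Sketch of hF; assumes non-empty true and predicted label sets."""
    stats = []
    for true_row, pred_row in zip(y_true, y_pred):
        alpha, beta = label_set(pred_row), label_set(true_row)
        stats.append((len(alpha & beta), len(alpha), len(beta)))
    if average == "micro":
        # Aggregate counts first, then combine the micro hP and hR.
        overlap = sum(o for o, _, _ in stats)
        hp = overlap / sum(a for _, a, _ in stats)
        hr = overlap / sum(b for _, _, b in stats)
        return harmonic_mean(hp, hr)
    if average == "macro":
        # Combine hP_i and hR_i per sample, then average the hF_i values.
        per_sample = [harmonic_mean(o / a, o / b) for o, a, b in stats]
        return sum(per_sample) / len(per_sample)
    raise ValueError(f"Unknown average: {average!r}")


# Hypothetical labels; the first prediction stops at the second level.
y_true = [["Animal", "Mammal", "Cat"], ["Animal", "Reptile", "Snake"]]
y_pred = [["Animal", "Mammal", ""], ["Animal", "Reptile", "Lizard"]]

print(round(hierarchical_f1(y_true, y_pred, "micro"), 4))  # 0.7273
print(round(hierarchical_f1(y_true, y_pred, "macro"), 4))  # 0.7333
```

With these labels the micro f-score is 8/11 (from hP = 4/5 and hR = 2/3) and the macro f-score is 11/15 (the mean of 4/5 and 2/3).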
Datasets
Platypus diseases dataset
- datasets.load_platypus(test_size=0.3, random_state=42)
Load platypus diseases dataset.
- Parameters
test_size (float, default=0.3) – The proportion of the dataset to include in the test split.
random_state (int or None, default=42) – Controls the randomness of the dataset. Pass an int for reproducible output across multiple function calls.
- Returns
List containing train-test split of inputs.
- Return type
list
- Raises
RuntimeError – If the dataset cannot be accessed or processed.
Examples
>>> from hiclass.datasets import load_platypus
>>> X_train, X_test, Y_train, Y_test = load_platypus()
>>> X_train[:3]
     fever  diarrhea  stomach pain  skin rash  cough  sniffles  short breath  headache  size
220   37.8         0             3          5      1         1             0         2  27.6
539   37.2         0             6          1      1         1             0         3  28.4
326   39.9         0             2          5      1         1             1         2  30.7
>>> X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(572, 9) (246, 9) (572,) (246,)
Hierarchical text classification dataset
- datasets.load_hierarchical_text_classification(test_size=0.3, random_state=42)
Load hierarchical text classification dataset.
- Parameters
test_size (float, default=0.3) – The proportion of the dataset to include in the test split.
random_state (int or None, default=42) – Controls the randomness of the dataset. Pass an int for reproducible output across multiple function calls.
- Returns
List containing train-test split of inputs.
- Return type
list
- Raises
RuntimeError – If the dataset cannot be accessed or processed.
Examples
>>> from hiclass.datasets import load_hierarchical_text_classification
>>> X_train, X_test, Y_train, Y_test = load_hierarchical_text_classification()
>>> X_train[:3]
38015                                Nature's Way Selenium
2281         Music In Motion Developmental Mobile W Remote
36629    Twinings Ceylon Orange Pekoe Tea, Tea Bags, 20...
Name: Title, dtype: object
>>> X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(28000,) (12000,) (28000, 3) (12000, 3)
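The shapes reported for both datasets are consistent with a split in which the number of test rows is `test_size` times the total, rounded up. The sketch below reproduces that arithmetic. It assumes the rounding convention of scikit-learn's `train_test_split`, which the text above does not confirm, and the helper `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test), rounding the test share up (assumed convention)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# Totals taken from the shapes shown in the examples: 572 + 246 and 28000 + 12000.
print(split_sizes(818))    # (572, 246)
print(split_sizes(40000))  # (28000, 12000)
```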