Data Utilities

Binary Policies


class BinaryPolicy.ExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.
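
As a rough illustration of what the returned subset looks like, the sketch below assembles X and y from a positive and a negative mask. It uses plain Python lists rather than np.ndarray, and the helper name binary_subset, the sample values, and the 1/0 encoding of the binary labels are assumptions made for this example, not the library's implementation.

```python
def binary_subset(X, positive_mask, negative_mask):
    """Return (X_subset, y_subset) with positives first, then negatives.

    Hypothetical helper: binary labels are encoded as 1 (positive) and
    0 (negative), which is an assumption of this sketch.
    """
    positives = [x for x, keep in zip(X, positive_mask) if keep]
    negatives = [x for x, keep in zip(X, negative_mask) if keep]
    X_subset = positives + negatives
    y_subset = [1] * len(positives) + [0] * len(negatives)
    return X_subset, y_subset


# Four samples with two features each (made-up values).
X = [[0.1, 1.0], [0.4, 0.8], [0.9, 0.2], [0.7, 0.3]]
positive_mask = [True, False, False, True]
negative_mask = [False, True, False, False]  # the third sample is in neither mask

X_subset, y_subset = binary_subset(X, positive_mask, negative_mask)
print(X_subset)  # [[0.1, 1.0], [0.7, 0.3], [0.4, 0.8]]
print(y_subset)  # [1, 1, 0]
```

Samples that appear in neither mask are simply dropped from the binary training set for that node.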

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the positive ones.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray
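
The two masks of the exclusive policy can be sketched as follows. This is a simplified illustration, not the library's code: plain Python lists stand in for np.ndarray masks, and labels holds only the most specific class of each sample, which is an assumption that simplifies the 2D y array.

```python
# Most specific class of each sample (hypothetical data).
labels = ["Animal", "Cat", "Dog", "Cat", "Plant"]


def positive_mask(node):
    """True only for samples whose most specific class is `node`."""
    return [label == node for label in labels]


def negative_mask(node):
    """Exclusive policy: every sample that is not positive is negative."""
    return [not is_positive for is_positive in positive_mask(node)]


print(positive_mask("Cat"))  # [False, True, False, True, False]
print(negative_mask("Cat"))  # [True, False, True, False, True]
```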



class BinaryPolicy.LessExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the less exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the examples for the current node and its children.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray
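
A minimal sketch of how the less exclusive masks differ from the exclusive ones: samples of the node's children end up in neither mask. The children mapping, the labels list (most specific class per sample), and the function names are hypothetical stand-ins for the nx.DiGraph and the 2D y array.

```python
# Hypothetical hierarchy as a parent -> children mapping.
children = {"Animal": ["Cat", "Dog"], "Plant": ["Tree"]}

# Most specific class of each sample (hypothetical data).
labels = ["Animal", "Cat", "Dog", "Plant", "Tree"]


def positive_mask(node):
    """True only for samples whose most specific class is `node`."""
    return [label == node for label in labels]


def negative_mask(node):
    """True for samples that belong neither to `node` nor to its children."""
    excluded = {node, *children.get(node, [])}
    return [label not in excluded for label in labels]


print(positive_mask("Animal"))  # [True, False, False, False, False]
print(negative_mask("Animal"))  # [False, False, False, True, True]
```

The Cat and Dog samples are False in both masks, so they are not used when training the binary classifier for Animal.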



class BinaryPolicy.InclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node, its descendants, and its ancestors.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray
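
The positive mask of the inclusive policy can be illustrated with a small recursive descendant lookup. This is only a sketch: the children mapping replaces the nx.DiGraph, labels holds the most specific class of each sample, and all names and data are hypothetical.

```python
# Hypothetical three-level hierarchy as a parent -> children mapping.
children = {"Animal": ["Mammal"], "Mammal": ["Cat", "Dog"], "Plant": ["Tree"]}

# Most specific class of each sample (hypothetical data).
labels = ["Animal", "Mammal", "Cat", "Dog", "Tree"]


def descendants(node):
    """Collect every node reachable below `node`, depth first."""
    found = []
    for child in children.get(node, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def positive_mask(node):
    """True for samples labelled with `node` or any of its descendants."""
    included = {node, *descendants(node)}
    return [label in included for label in labels]


print(descendants("Animal"))    # ['Mammal', 'Cat', 'Dog']
print(positive_mask("Animal"))  # [True, True, True, True, False]
print(positive_mask("Mammal"))  # [False, True, True, True, False]
```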



class BinaryPolicy.LessInclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the less inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node and its descendants.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray



class BinaryPolicy.SiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples for nodes that have the same ancestors as the given node, as well as their descendants.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray
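
The negative mask of the siblings policy can be sketched as below. The sketch reads "nodes that have the same ancestors" as "nodes that share a parent with the given node", which is an interpretation rather than something the text states outright. The children mapping, the labels list (most specific class per sample), and the helper names are hypothetical.

```python
# Hypothetical hierarchy as a parent -> children mapping.
children = {
    "root": ["Animal", "Plant"],
    "Animal": ["Cat", "Dog"],
    "Plant": ["Tree"],
}

# Most specific class of each sample (hypothetical data).
labels = ["Cat", "Dog", "Animal", "Tree", "Plant"]


def descendants(node):
    """Collect every node reachable below `node`, depth first."""
    found = []
    for child in children.get(node, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def siblings(node):
    """Nodes that share a parent with `node`, excluding `node` itself."""
    return [
        child
        for kids in children.values()
        if node in kids
        for child in kids
        if child != node
    ]


def negative_mask(node):
    """True for samples of the siblings of `node` and of their descendants."""
    negatives = set()
    for sibling in siblings(node):
        negatives.add(sibling)
        negatives.update(descendants(sibling))
    return [label in negatives for label in labels]


print(siblings("Animal"))       # ['Plant']
print(negative_mask("Animal"))  # [False, False, False, True, True]
print(negative_mask("Cat"))     # [False, True, False, False, False]
```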



class BinaryPolicy.ExclusiveSiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the exclusive siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

  • digraph (nx.DiGraph) – DiGraph which is used for inferring node relationships.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes examples for all nodes that have the same parent as the given node.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – Boolean mask that is True for included examples and False otherwise.

Return type

np.ndarray


Hierarchical Metrics


metrics.precision(y_true: ndarray, y_pred: ndarray, average: str = 'micro')

Compute hierarchical precision score.

Parameters

  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

  • average ({"micro", "macro"}, str, default="micro") –

    This parameter determines the type of averaging performed during the computation:

    • micro: The precision is computed by summing over all individual instances, \(\displaystyle{hP = \frac{\sum_{i=1}^{n}| \alpha_i \cap \beta_i |}{\sum_{i=1}^{n}| \alpha_i |}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

    • macro: The precision is computed for each instance and then averaged, \(\displaystyle{hP = \frac{\sum_{i=1}^{n}hP_{i}}{n}}\), where \(\displaystyle{hP_{i} = \frac{| \alpha_i \cap \beta_i |}{| \alpha_i |}}\) is the precision of test example \(i\), \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, and \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors.


Returns

precision – What proportion of positive identifications was actually correct?

Return type

float
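
To make the difference between the two averaging modes concrete, the sketch below computes both directly from the sets \(\alpha_i\) and \(\beta_i\). It is not the library's implementation: the function name hierarchical_precision is hypothetical, the sets are passed in ready-made rather than derived from y_true and y_pred, the per-example precision is assumed to be \(|\alpha_i \cap \beta_i| / |\alpha_i|\), and every \(\alpha_i\) is assumed to be non-empty.

```python
def hierarchical_precision(alphas, betas, average="micro"):
    """Hierarchical precision from per-example class sets.

    alphas: predicted classes plus their ancestors, one set per example.
    betas:  true classes plus their ancestors, one set per example.
    """
    if average == "micro":
        # Sum numerators and denominators over all examples, then divide once.
        overlap = sum(len(a & b) for a, b in zip(alphas, betas))
        predicted = sum(len(a) for a in alphas)
        return overlap / predicted
    if average == "macro":
        # Compute one precision per example, then take the plain mean.
        per_example = [len(a & b) / len(a) for a, b in zip(alphas, betas)]
        return sum(per_example) / len(per_example)
    raise ValueError(f"Unknown average: {average!r}")


# Made-up predictions and ground truth for two test examples.
alphas = [{"Animal", "Cat"}, {"Plant", "Tree", "Oak"}]
betas = [{"Animal", "Dog"}, {"Plant", "Tree", "Oak"}]

print(hierarchical_precision(alphas, betas, "micro"))  # 0.8
print(hierarchical_precision(alphas, betas, "macro"))  # 0.75
```

Micro averaging weights each example by the size of its predicted set, so the fully correct three-class prediction counts for more than the half-correct two-class one.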



metrics.recall(y_true: ndarray, y_pred: ndarray, average: str = 'micro')

Compute hierarchical recall score.

Parameters

  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

  • average ({"micro", "macro"}, str, default="micro") –

    This parameter determines the type of averaging performed during the computation:

    • micro: The recall is computed by summing over all individual instances, \(\displaystyle{hR = \frac{\sum_{i=1}^{n}|\alpha_i \cap \beta_i|}{\sum_{i=1}^{n}|\beta_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

    • macro: The recall is computed for each instance and then averaged, \(\displaystyle{hR = \frac{\sum_{i=1}^{n}hR_{i}}{n}}\), where \(\displaystyle{hR_{i} = \frac{|\alpha_i \cap \beta_i|}{|\beta_i|}}\) is the recall of test example \(i\), \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, and \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors.


Returns

recall – What proportion of actual positives was identified correctly?

Return type

float
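
As a small worked example with made-up sets, take two test examples with \(\alpha_1 = \{\text{Animal}, \text{Cat}\}\), \(\beta_1 = \{\text{Animal}, \text{Dog}\}\), \(\alpha_2 = \{\text{Plant}, \text{Tree}\}\) and \(\beta_2 = \{\text{Plant}, \text{Tree}, \text{Oak}\}\), with the per-example recall taken as \(hR_i = |\alpha_i \cap \beta_i| / |\beta_i|\). The intersections have sizes 1 and 2, and the true sets have sizes 2 and 3, so the two averages come out slightly different:

\(\displaystyle{hR_{micro} = \frac{1 + 2}{2 + 3} = \frac{3}{5} = 0.6}\)

\(\displaystyle{hR_{macro} = \frac{1}{2}\left(\frac{1}{2} + \frac{2}{3}\right) = \frac{7}{12} \approx 0.583}\)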



metrics.f1(y_true: ndarray, y_pred: ndarray, average: str = 'micro')

Compute hierarchical f-score.

Parameters

  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

  • average ({"micro", "macro"}, str, default="micro") –

    This parameter determines the type of averaging performed during the computation:

    • micro: The f-score is computed from the micro-averaged precision and recall, \(\displaystyle{hF = \frac{2 \times hP \times hR}{hP + hR}}\), where \(hP\) is the hierarchical precision and \(hR\) is the hierarchical recall.

    • macro: The f-score is computed for each instance and then averaged, \(\displaystyle{hF = \frac{\sum_{i=1}^{n}hF_{i}}{n}}\), where \(\displaystyle{hF_{i} = \frac{2 \times hP_{i} \times hR_{i}}{hP_{i} + hR_{i}}}\) is the f-score of test example \(i\), computed from its precision \(hP_{i}\) and recall \(hR_{i}\).


Returns

f1 – Harmonic mean of the hierarchical precision and recall.

Return type

float
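
The micro-averaged formula can be checked by hand with a short sketch that works on the class sets directly. The function name hierarchical_f1_micro and the example sets are hypothetical, and returning 0.0 when precision and recall are both zero is a choice made for this sketch, not documented behaviour.

```python
def hierarchical_f1_micro(alphas, betas):
    """Micro-averaged hierarchical f-score from per-example class sets.

    alphas: predicted classes plus their ancestors, one set per example.
    betas:  true classes plus their ancestors, one set per example.
    """
    overlap = sum(len(a & b) for a, b in zip(alphas, betas))
    hp = overlap / sum(len(a) for a in alphas)  # hierarchical precision
    hr = overlap / sum(len(b) for b in betas)   # hierarchical recall
    if hp + hr == 0:
        # Assumption of this sketch: no overlap at all gives a score of 0.
        return 0.0
    return 2 * hp * hr / (hp + hr)


# Made-up predictions and ground truth for two test examples.
alphas = [{"Animal", "Cat"}, {"Plant"}]
betas = [{"Animal", "Dog"}, {"Plant", "Tree"}]

# hP = 2/3 and hR = 2/4, so hF = 4/7.
print(round(hierarchical_f1_micro(alphas, betas), 4))  # 0.5714
```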



Platypus diseases dataset

datasets.load_platypus(test_size=0.3, random_state=42)

Load platypus diseases dataset.

Parameters

  • test_size (float, default=0.3) – The proportion of the dataset to include in the test split.

  • random_state (int or None, default=42) – Controls the randomness of the dataset. Pass an int for reproducible output across multiple function calls.

Returns

List containing train-test split of inputs.

Return type

list

Raises

RuntimeError – If the dataset cannot be accessed or processed.

Examples

>>> from hiclass.datasets import load_platypus
>>> X_train, X_test, Y_train, Y_test = load_platypus()
>>> X_train[:3]
     fever  diarrhea  stomach pain  skin rash  cough  sniffles  short breath  headache  size
220   37.8         0             3          5      1         1             0         2  27.6
539   37.2         0             6          1      1         1             0         3  28.4
326   39.9         0             2          5      1         1             1         2  30.7
>>> print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(572, 9) (246, 9) (572,) (246,)

Hierarchical text classification dataset

datasets.load_hierarchical_text_classification(test_size=0.3, random_state=42)

Load hierarchical text classification dataset.

Parameters

  • test_size (float, default=0.3) – The proportion of the dataset to include in the test split.

  • random_state (int or None, default=42) – Controls the randomness of the dataset. Pass an int for reproducible output across multiple function calls.

Returns

List containing train-test split of inputs.

Return type

list

Raises

RuntimeError – If the dataset cannot be accessed or processed.

Examples

>>> from hiclass.datasets import load_hierarchical_text_classification
>>> X_train, X_test, Y_train, Y_test = load_hierarchical_text_classification()
>>> X_train[:3]
38015                                Nature's Way Selenium
2281         Music In Motion Developmental Mobile W Remote
36629    Twinings Ceylon Orange Pekoe Tea, Tea Bags, 20...
Name: Title, dtype: object
>>> print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
(28000,) (12000,) (28000, 3) (12000, 3)