Data Utilities

Binary Policies

ExclusivePolicy

class BinaryPolicy.ExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the positive ones.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
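
As a rough illustration of the two masks described above, the following sketch uses plain Python lists rather than NumPy arrays. It assumes each sample is reduced to a single most specific label, which simplifies the 2D y expected by the class. The function names and the toy labels are hypothetical and not part of the library.

```python
# Hypothetical data: one most specific label per sample (a simplification).
labels = ["dog", "cat", "mammal", "eagle", "tree", "animal"]


def exclusive_positive(labels, node):
    """Mask that is True only for samples labelled exactly with `node`."""
    return [label == node for label in labels]


def exclusive_negative(labels, node):
    """Mask that is True for every sample that is not a positive example."""
    return [not is_positive for is_positive in exclusive_positive(labels, node)]


positive = exclusive_positive(labels, "mammal")
negative = exclusive_negative(labels, "mammal")
print(positive)  # [False, False, True, False, False, False]
print(negative)  # [True, True, False, True, True, True]
```

Under this reading the two masks are exact complements, so every sample is used when training the classifier for the node.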


LessExclusivePolicy

class BinaryPolicy.LessExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the less exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node and its children.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
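
The negative mask of this policy can be sketched as follows, taking the description above literally (only the node and its direct children are set aside). The CHILDREN mapping, the labels, and the function name are hypothetical, and each sample is assumed to carry one most specific label.

```python
# Hypothetical toy hierarchy: parent -> direct children.
CHILDREN = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["eagle"],
    "plant": ["tree"],
}
labels = ["dog", "cat", "mammal", "eagle", "tree", "animal"]


def less_exclusive_negative(labels, node):
    """True for samples labelled with neither `node` nor one of its children."""
    excluded = {node, *CHILDREN.get(node, [])}
    return [label not in excluded for label in labels]


negative = less_exclusive_negative(labels, "mammal")
print(negative)  # [False, False, False, True, True, True]
```

In this sketch the samples labelled dog or cat are neither positive nor negative for mammal, so they would not take part in training that node's classifier.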


InclusivePolicy

class BinaryPolicy.InclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node, its descendants, and its ancestors.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
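
A minimal sketch of both masks, assuming the negatives leave out the node, its descendants, and its ancestors. The toy tree, the labels, and all helper names are hypothetical, and each sample is assumed to carry one most specific label instead of a full 2D row.

```python
# Hypothetical toy tree: parent -> direct children.
CHILDREN = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["eagle"],
    "plant": ["tree"],
}
# Reverse lookup: child -> parent.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
labels = ["dog", "cat", "mammal", "eagle", "tree", "animal"]


def descendants(node):
    """All nodes below `node` in the toy tree."""
    found, stack = set(), list(CHILDREN.get(node, []))
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(CHILDREN.get(current, []))
    return found


def ancestors(node):
    """All nodes above `node` in the toy tree."""
    found = set()
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def inclusive_positive(labels, node):
    """True for samples labelled with `node` or one of its descendants."""
    wanted = descendants(node) | {node}
    return [label in wanted for label in labels]


def inclusive_negative(labels, node):
    """True for samples outside `node`, its descendants, and its ancestors."""
    excluded = descendants(node) | ancestors(node) | {node}
    return [label not in excluded for label in labels]


positive = inclusive_positive(labels, "mammal")
negative = inclusive_negative(labels, "mammal")
print(positive)  # [True, True, True, False, False, False]
print(negative)  # [False, False, False, True, True, False]
```

The last sample, labelled animal, is an ancestor of mammal and therefore ends up in neither mask in this sketch.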


LessInclusivePolicy

class BinaryPolicy.LessInclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the less inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples, except the examples for the given node and its descendants.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
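
Because the positives and the excluded negatives both cover the node and its descendants, the two masks described above should be exact complements. The sketch below checks this on a hypothetical toy tree; the helper names and labels are illustrative only, and each sample is assumed to carry one most specific label.

```python
# Hypothetical toy tree: parent -> direct children.
CHILDREN = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["eagle"],
    "plant": ["tree"],
}
labels = ["dog", "cat", "mammal", "eagle", "tree", "animal"]


def subtree(node):
    """`node` together with all of its descendants."""
    nodes = {node}
    for child in CHILDREN.get(node, []):
        nodes |= subtree(child)
    return nodes


def less_inclusive_positive(labels, node):
    """True for samples labelled with `node` or one of its descendants."""
    return [label in subtree(node) for label in labels]


def less_inclusive_negative(labels, node):
    """True for every sample outside the subtree rooted at `node`."""
    return [label not in subtree(node) for label in labels]


positive = less_inclusive_positive(labels, "mammal")
negative = less_inclusive_negative(labels, "mammal")
print(positive)  # [True, True, True, False, False, False]
print(negative)  # [False, False, False, True, True, True]
```

Here the sample labelled animal counts as a negative example for mammal, since ancestors are not set aside.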


SiblingsPolicy

class BinaryPolicy.SiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples for nodes that have the same ancestors as the given node, as well as their descendants.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
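
One way to picture the negative mask is sketched below. It assumes a tree in which every node has a single parent, so "nodes with the same ancestors" is read as "nodes with the same parent". The toy tree, the labels, and the helper names are hypothetical, and each sample is assumed to carry one most specific label.

```python
# Hypothetical toy tree: parent -> direct children.
CHILDREN = {
    "root": ["animal", "plant"],
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["eagle"],
    "plant": ["tree"],
}
# Reverse lookup: child -> parent.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
labels = ["dog", "cat", "mammal", "eagle", "tree", "animal"]


def subtree(node):
    """`node` together with all of its descendants."""
    nodes = {node}
    for child in CHILDREN.get(node, []):
        nodes |= subtree(child)
    return nodes


def siblings(node):
    """Other children of the parent of `node` (empty for the root)."""
    parent = PARENT.get(node)
    return [child for child in CHILDREN.get(parent, []) if child != node]


def siblings_negative(labels, node):
    """True for samples labelled with a sibling of `node` or its descendants."""
    wanted = set()
    for sibling in siblings(node):
        wanted |= subtree(sibling)
    return [label in wanted for label in labels]


negative = siblings_negative(labels, "mammal")
print(negative)  # [False, False, False, True, False, False]
```

Only the eagle sample, which sits under the sibling bird, is used as a negative example for mammal in this sketch.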


ExclusiveSiblingsPolicy

class BinaryPolicy.ExclusiveSiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the exclusive siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features which will be used for fitting a model.

  • y (np.ndarray) – Labels which will be assigned to the different samples. Must be a 2D array.

  • sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes examples for all nodes that have the same parent as the given node.

Parameters

node – Node for which the negative examples should be searched.

Returns

negative_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters

node – Node for which the positive examples should be searched.

Returns

positive_examples – A mask for which examples are included (True) and which are not.

Return type

np.ndarray
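
The two masks described above can be sketched on a toy tree as follows. The sketch assumes the node itself is not counted among "nodes that have the same parent", and that each sample carries one most specific label. The tree, labels, and function names are hypothetical.

```python
# Hypothetical toy tree: parent -> direct children.
CHILDREN = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": ["eagle"],
}
# Reverse lookup: child -> parent.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
labels = ["dog", "cat", "dog", "eagle", "cat", "mammal"]


def exclusive_siblings_positive(labels, node):
    """True only for samples labelled exactly with `node`."""
    return [label == node for label in labels]


def exclusive_siblings_negative(labels, node):
    """True for samples labelled exactly with a sibling of `node`."""
    parent = PARENT.get(node)
    same_parent = {child for child in CHILDREN.get(parent, []) if child != node}
    return [label in same_parent for label in labels]


positive = exclusive_siblings_positive(labels, "dog")
negative = exclusive_siblings_negative(labels, "dog")
print(positive)  # [True, False, True, False, False, False]
print(negative)  # [False, True, False, False, True, False]
```

Samples labelled eagle or mammal fall in neither mask, so the binary problem for dog is restricted to dog versus cat in this sketch.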


Hierarchical Metrics

Precision

metrics.precision(y_true: ndarray, y_pred: ndarray)

Compute precision score for hierarchical classification.

\(\displaystyle{hP = \frac{\sum_i|\alpha_i \cap \beta_i|}{\sum_i|\alpha_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

Parameters
  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

precision – Hierarchical precision: the proportion of predicted classes (including ancestors) that are correct.

Return type

float
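
The formula can be traced by hand with a small pure-Python sketch. It is not the library's implementation: the function name is hypothetical, rows are plain lists, and it assumes that an empty string marks a level with no prediction and should be ignored.

```python
def hierarchical_precision(y_true, y_pred):
    """Sum of |alpha_i & beta_i| divided by the sum of |alpha_i|."""
    overlap = 0
    predicted = 0
    for true_row, pred_row in zip(y_true, y_pred):
        beta = set(true_row) - {""}   # true classes with their ancestors
        alpha = set(pred_row) - {""}  # predicted classes with their ancestors
        overlap += len(alpha & beta)
        predicted += len(alpha)
    return overlap / predicted


# Hypothetical labels: one row per sample, one column per hierarchy level.
y_true = [["animal", "mammal", "dog"], ["animal", "bird", "eagle"]]
y_pred = [["animal", "mammal", ""], ["animal", "mammal", "cat"]]

hp = hierarchical_precision(y_true, y_pred)
print(hp)  # 0.6
```

The first sample contributes an overlap of 2 out of 2 predicted classes and the second 1 out of 3, giving 3 / 5.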


Recall

metrics.recall(y_true: ndarray, y_pred: ndarray)

Compute recall score for hierarchical classification.

\(\displaystyle{hR = \frac{\sum_i|\alpha_i \cap \beta_i|}{\sum_i|\beta_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

Parameters
  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

recall – Hierarchical recall: the proportion of true classes (including ancestors) that were predicted.

Return type

float
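
A small pure-Python sketch of the recall formula follows. It is not the library's implementation: the function name is hypothetical, rows are plain lists, and it assumes that an empty string marks a level with no prediction and should be ignored.

```python
def hierarchical_recall(y_true, y_pred):
    """Sum of |alpha_i & beta_i| divided by the sum of |beta_i|."""
    overlap = 0
    actual = 0
    for true_row, pred_row in zip(y_true, y_pred):
        beta = set(true_row) - {""}   # true classes with their ancestors
        alpha = set(pred_row) - {""}  # predicted classes with their ancestors
        overlap += len(alpha & beta)
        actual += len(beta)
    return overlap / actual


# Hypothetical labels: one row per sample, one column per hierarchy level.
y_true = [["animal", "mammal", "dog"], ["animal", "bird", "eagle"]]
y_pred = [["animal", "mammal", ""], ["animal", "mammal", "cat"]]

hr = hierarchical_recall(y_true, y_pred)
print(hr)  # 0.5
```

Three of the six true classes (animal and mammal for the first sample, animal for the second) are recovered, giving 3 / 6.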


F-score

metrics.f1(y_true: ndarray, y_pred: ndarray)

Compute the F1 score for hierarchical classification.

\(\displaystyle{hF = \frac{2 \times hP \times hR}{hP + hR}}\), where \(hP\) is the hierarchical precision and \(hR\) is the hierarchical recall.

Parameters
  • y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

f1 – Harmonic mean of the hierarchical precision and recall.

Return type

float
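
To make the arithmetic concrete, the sketch below evaluates the formula with exact fractions. The function name and the input values are hypothetical, and returning 0 when both inputs are 0 is an assumption made here to avoid a division by zero.

```python
from fractions import Fraction


def hierarchical_f1(hp, hr):
    """Harmonic mean of hierarchical precision `hp` and recall `hr`."""
    if hp + hr == 0:
        return 0  # assumption: define hF as 0 when both inputs are 0
    return 2 * hp * hr / (hp + hr)


# Hypothetical values: an overlap of 3 with 5 predicted and 6 true classes.
hp = Fraction(3, 5)
hr = Fraction(1, 2)

hf = hierarchical_f1(hp, hr)
print(hf)  # 6/11
```

With an overlap of 3, 5 predicted classes, and 6 true classes, the same value follows directly from 2 · 3 / (5 + 6), since the harmonic mean of overlap/predicted and overlap/true simplifies to 2 · overlap / (predicted + true).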