Data Utilities

Binary Policies

ExclusivePolicy

class BinaryPolicy.ExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: BinaryPolicy

Implement the exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.
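As a rough illustration of how the two masks might be combined, the sketch below keeps the rows flagged by either mask and gives them binary labels. The 1/0 encoding, the positives-first ordering, and the helper name `binary_subset` are assumptions made for this example, not details confirmed by this reference; plain lists stand in for the NumPy arrays.

```python
def binary_subset(X, positive_mask, negative_mask):
    """Return the rows selected by either mask, with 1/0 labels (assumed encoding)."""
    X_pos = [row for row, keep in zip(X, positive_mask) if keep]
    X_neg = [row for row, keep in zip(X, negative_mask) if keep]
    # Positives first, then negatives; rows flagged by neither mask are dropped.
    X_subset = X_pos + X_neg
    y_subset = [1] * len(X_pos) + [0] * len(X_neg)
    return X_subset, y_subset


# Four samples with two features each (made-up values).
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
positive = [True, False, False, False]
negative = [False, True, False, True]

X_subset, y_subset = binary_subset(X, positive, negative)
print(X_subset)  # [[0.1, 0.2], [0.3, 0.4], [0.7, 0.8]]
print(y_subset)  # [1, 0, 0]
```

The third sample is selected by neither mask, so it does not appear in the subset.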

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the positive ones.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray
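The exclusive policy can be pictured with a minimal sketch in plain Python. It assumes each sample carries a single most specific label (a simplification of the 2D `y` described above); the label names and the functions `positive_mask` and `negative_mask` are hypothetical and are not part of the library.

```python
# Most specific label of each sample (hypothetical toy data).
labels = ["A", "A1", "A2", "B", "B1"]


def positive_mask(node):
    """True only for samples labelled exactly with `node`."""
    return [label == node for label in labels]


def negative_mask(node):
    """True for every sample that is not a positive example."""
    return [not is_positive for is_positive in positive_mask(node)]


print(positive_mask("A"))  # [True, False, False, False, False]
print(negative_mask("A"))  # [False, True, True, True, True]
```

Under this reading, samples labelled with the children `A1` and `A2` count as negatives for `A`.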


LessExclusivePolicy

class BinaryPolicy.LessExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: ExclusivePolicy

Implement the less exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the examples for the current node and its children.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray
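A small sketch of the less exclusive selection, under the same simplifying assumption of one most specific label per sample. The child-to-parent dictionary `PARENT` and all function names are hypothetical stand-ins for the `DiGraph` and the class methods.

```python
# Hypothetical toy hierarchy, stored as child -> parent.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
# Most specific label of each sample.
labels = ["A", "A1", "A2", "B", "B1"]


def children(node):
    """Direct children of `node` in the toy hierarchy."""
    return {child for child, parent in PARENT.items() if parent == node}


def positive_mask(node):
    """True only for samples labelled exactly with `node`."""
    return [label == node for label in labels]


def negative_mask(node):
    """True for samples outside `node` and its children."""
    excluded = children(node) | {node}
    return [label not in excluded for label in labels]


print(positive_mask("A"))  # [True, False, False, False, False]
print(negative_mask("A"))  # [False, False, False, True, True]
```

Compared with the exclusive policy, the samples of `A1` and `A2` are no longer negatives for `A`; they are simply left out.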


InclusivePolicy

class BinaryPolicy.InclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: BinaryPolicy

Implement the inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples, except the examples for the given node, its descendants and successors.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray


LessInclusivePolicy

class BinaryPolicy.LessInclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: InclusivePolicy

Implement the less inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples, except the examples for the given node and its descendants.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray
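To make the role of descendants concrete, here is a hedged sketch of the less inclusive selection. It assumes one most specific label per sample and a hypothetical child-to-parent dictionary `PARENT` in place of the `DiGraph`; none of the names below come from the library.

```python
# Hypothetical toy hierarchy, stored as child -> parent.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "A1a": "A1", "B1": "B"}
# Most specific label of each sample.
labels = ["A", "A1", "A1a", "A2", "B", "B1"]


def descendants(node):
    """Collect every node below `node` in the toy hierarchy."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT.items():
            if parent == current:
                found.add(child)
                frontier.append(child)
    return found


def positive_mask(node):
    """True for samples of `node` or any of its descendants."""
    included = descendants(node) | {node}
    return [label in included for label in labels]


def negative_mask(node):
    """True for every sample outside `node` and its descendants."""
    excluded = descendants(node) | {node}
    return [label not in excluded for label in labels]


print(positive_mask("A1"))  # [False, True, True, False, False, False]
print(negative_mask("A1"))  # [True, False, False, True, True, True]
```

In this sketch every sample is either positive or negative, and the sample labelled with the parent `A` is a negative for `A1`.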


SiblingsPolicy

class BinaryPolicy.SiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: InclusivePolicy

Implement the siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples for nodes that have the same ancestors as the given node, as well as their descendants.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray
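The siblings selection might look like the sketch below. It reads "nodes that have the same ancestors" as nodes sharing the same parent, which is an assumption, and it again uses a hypothetical child-to-parent dictionary and one most specific label per sample instead of the library's `DiGraph` and 2D `y`.

```python
# Hypothetical toy hierarchy, stored as child -> parent.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "A1a": "A1", "B1": "B"}
# Most specific label of each sample.
labels = ["A", "A1", "A1a", "A2", "B", "B1"]


def descendants(node):
    """Collect every node below `node` in the toy hierarchy."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT.items():
            if parent == current:
                found.add(child)
                frontier.append(child)
    return found


def siblings(node):
    """Nodes that share a parent with `node`, excluding `node` itself."""
    return {n for n, p in PARENT.items() if p == PARENT[node] and n != node}


def positive_mask(node):
    """True for samples of `node` or any of its descendants."""
    included = descendants(node) | {node}
    return [label in included for label in labels]


def negative_mask(node):
    """True for samples of the siblings of `node` and their descendants."""
    included = set()
    for sibling in siblings(node):
        included |= descendants(sibling) | {sibling}
    return [label in included for label in labels]


print(positive_mask("A1"))  # [False, True, True, False, False, False]
print(negative_mask("A1"))  # [False, False, False, True, False, False]
```

Samples from other branches (`B`, `B1`) and from the parent `A` are neither positive nor negative for `A1` here.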


ExclusiveSiblingsPolicy

class BinaryPolicy.ExclusiveSiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray)

Bases: ExclusivePolicy

Implement the exclusive siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray)

Initialize a BinaryPolicy with the required data.

Parameters:
  • digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.

  • X (np.ndarray) – Features used to fit a model.

  • y (np.ndarray) – Labels assigned to the samples. Must be a 2D array.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters:

node – Node for which the positive and negative examples should be searched.

Returns:

  • X (np.ndarray) – The subset with positive and negative features.

  • y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes examples for all nodes that have the same parent as the given node.

Parameters:

node – Node for which the negative examples should be searched.

Returns:

negative_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters:

node – Node for which the positive examples should be searched.

Returns:

positive_examples – A mask for which examples are included (True) and which are not.

Return type:

np.ndarray
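For comparison, a sketch of the exclusive siblings selection, where descendants play no role. It assumes the given node itself is not counted among the nodes sharing its parent; the dictionary `PARENT`, the labels, and the function names are hypothetical.

```python
# Hypothetical toy hierarchy, stored as child -> parent.
PARENT = {"A": "root", "B": "root", "A1": "A", "A2": "A", "A1a": "A1", "B1": "B"}
# Most specific label of each sample.
labels = ["A", "A1", "A1a", "A2", "B", "B1"]


def siblings(node):
    """Nodes that share a parent with `node`, excluding `node` itself."""
    return {n for n, p in PARENT.items() if p == PARENT[node] and n != node}


def positive_mask(node):
    """True only for samples labelled exactly with `node`."""
    return [label == node for label in labels]


def negative_mask(node):
    """True only for samples labelled exactly with a sibling of `node`."""
    return [label in siblings(node) for label in labels]


print(positive_mask("A1"))  # [False, True, False, False, False, False]
print(negative_mask("A1"))  # [False, False, False, True, False, False]
```

The sample labelled `A1a`, a descendant of `A1`, ends up in neither mask under this policy.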


Hierarchical Metrics

Helper functions to compute hierarchical evaluation metrics.

metrics.f1(y_true: ndarray, y_pred: ndarray)

Compute the F1 score for hierarchical classification.

hF = 2 * hP * hR / (hP + hR), where hP is the hierarchical precision and hR is the hierarchical recall.

Parameters:
  • y_true (np.ndarray of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.ndarray of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns:

f1 – Harmonic mean of the hierarchical precision and recall.

Return type:

float
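The formula for hF can be checked with a few lines of arithmetic. The function name `hierarchical_f1` is hypothetical, and returning 0.0 when both inputs are zero is an assumption of this sketch rather than documented behaviour.

```python
def hierarchical_f1(hp, hr):
    """Combine hierarchical precision and recall as hF = 2 * hP * hR / (hP + hR)."""
    if hp + hr == 0:
        # Assumed convention to avoid dividing by zero.
        return 0.0
    return 2 * hp * hr / (hp + hr)


# hP = 1.0 and hR = 0.8 give hF = 1.6 / 1.8.
print(round(hierarchical_f1(1.0, 0.8), 4))  # 0.8889
```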

metrics.precision(y_true: ndarray, y_pred: ndarray)

Compute the precision score for hierarchical classification.

hP = sum(len(S intersection T)) / sum(len(S)), with both sums taken over all test examples. S is the set consisting of the most specific class(es) predicted for a test example and all their ancestors; T is the set consisting of the true most specific class(es) for that example and all their ancestors.

Parameters:
  • y_true (np.ndarray of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.ndarray of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns:

precision – Proportion of the predicted classes (including ancestors) that are correct.

Return type:

float
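A worked example of hP using Python sets. It assumes each row lists a class together with its ancestors, so the row itself can serve as S or T; ragged lists are used instead of a rectangular array for brevity, and `hierarchical_precision` is a hypothetical name, not the library function.

```python
def hierarchical_precision(y_true, y_pred):
    """hP = sum(len(S & T)) / sum(len(S)) over all examples."""
    overlap = 0
    predicted = 0
    for true_path, pred_path in zip(y_true, y_pred):
        S = set(pred_path)  # predicted class and its ancestors
        T = set(true_path)  # true class and its ancestors
        overlap += len(S & T)
        predicted += len(S)
    return overlap / predicted


y_true = [["A", "A1", "A1a"], ["B", "B1"]]
y_pred = [["A", "A1"], ["B", "B1"]]
print(hierarchical_precision(y_true, y_pred))  # 1.0
```

The first prediction stops one level short of the true class, yet everything it does predict is correct, so the precision stays at 1.0.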

metrics.recall(y_true: ndarray, y_pred: ndarray)

Compute the recall score for hierarchical classification.

hR = sum(len(S intersection T)) / sum(len(T)), with both sums taken over all test examples. S is the set consisting of the most specific class(es) predicted for a test example and all their ancestors; T is the set consisting of the true most specific class(es) for that example and all their ancestors.

Parameters:
  • y_true (np.ndarray of shape (n_samples, n_levels)) – Ground truth (correct) labels.

  • y_pred (np.ndarray of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns:

recall – Proportion of the true classes (including ancestors) that were predicted.

Return type:

float
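The recall formula differs from precision only in its denominator, as this sketch with Python sets shows. It assumes each row lists a class together with its ancestors and uses ragged lists for brevity; `hierarchical_recall` is a hypothetical name chosen for the example.

```python
def hierarchical_recall(y_true, y_pred):
    """hR = sum(len(S & T)) / sum(len(T)) over all examples."""
    overlap = 0
    actual = 0
    for true_path, pred_path in zip(y_true, y_pred):
        S = set(pred_path)  # predicted class and its ancestors
        T = set(true_path)  # true class and its ancestors
        overlap += len(S & T)
        actual += len(T)
    return overlap / actual


y_true = [["A", "A1"], ["B", "B1"]]
y_pred = [["A", "A1", "A1a"], ["B", "B2"]]
print(hierarchical_recall(y_true, y_pred))  # 0.75
```

The overly specific first prediction does not hurt recall, while the wrong leaf in the second example recovers only one of its two true classes.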