Data Utilities

Binary Policies

ExclusivePolicy

class BinaryPolicy.ExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.
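
As a rough illustration of how a positive and a negative mask can be combined into one binary training set, the sketch below uses plain Python lists in place of NumPy arrays. The helper name `binary_examples` and the 1/0 label encoding are assumptions made for this example and are not taken from the library.

```python
def binary_examples(X, positive_mask, negative_mask):
    """Return the rows selected by either mask and a matching 1/0 label list."""
    X_subset, y_subset = [], []
    for row, is_pos, is_neg in zip(X, positive_mask, negative_mask):
        if is_pos:
            X_subset.append(row)
            y_subset.append(1)  # assumed encoding: 1 marks a positive example
        elif is_neg:
            X_subset.append(row)
            y_subset.append(0)  # assumed encoding: 0 marks a negative example
    return X_subset, y_subset


# Four samples with two features each (made-up values).
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
positive_mask = [True, False, False, False]
negative_mask = [False, True, False, True]  # the third sample is in neither set

X_bin, y_bin = binary_examples(X, positive_mask, negative_mask)
print(X_bin)  # [[0.1, 0.2], [0.3, 0.4], [0.7, 0.8]]
print(y_bin)  # [1, 0, 0]
```

Samples that fall in neither mask are dropped, which is why the subset can be smaller than the original data.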

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except the positive ones.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
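
The two masks of the exclusive policy can be pictured with a small stand-alone sketch. It assumes each sample carries a single most specific label and uses Python lists instead of NumPy arrays. The labels and the function names `positive_mask` and `negative_mask` are hypothetical.

```python
# Most specific label of each sample (assumption: one label per sample).
labels = ["animal", "dog", "cat", "plant", "tree"]


def positive_mask(node, labels):
    """True only for samples labeled exactly with `node`."""
    return [label == node for label in labels]


def negative_mask(node, labels):
    """True for every sample that is not a positive example."""
    return [not is_pos for is_pos in positive_mask(node, labels)]


pos = positive_mask("animal", labels)
neg = negative_mask("animal", labels)
print(pos)  # [True, False, False, False, False]
print(neg)  # [False, True, True, True, True]
```

Under this reading, even the samples labeled with children of "animal" count as negatives.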

LessExclusivePolicy

class BinaryPolicy.LessExclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the less exclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node and its children.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
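
A minimal sketch of the less exclusive selection follows, assuming one most specific label per sample and a hypothetical hierarchy stored as a parent-to-children dictionary. It follows the description above literally and excludes only the node and its direct children from the negatives.

```python
# Hypothetical hierarchy: parent -> list of direct children.
CHILDREN = {
    "root": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
}

# Most specific label of each sample (assumption: one label per sample).
labels = ["animal", "dog", "cat", "plant", "tree"]


def positive_mask(node, labels):
    """True only for samples labeled exactly with `node`."""
    return [label == node for label in labels]


def negative_mask(node, labels):
    """True for samples outside `node` and its direct children."""
    excluded = {node, *CHILDREN.get(node, [])}
    return [label not in excluded for label in labels]


pos = positive_mask("animal", labels)
neg = negative_mask("animal", labels)
print(pos)  # [True, False, False, False, False]
print(neg)  # [False, False, False, True, True]
```

The samples labeled "dog" and "cat" end up in neither set, which is the difference from the exclusive policy.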

InclusivePolicy

class BinaryPolicy.InclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: BinaryPolicy

Implement the inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples except those for the given node, its descendants and its ancestors.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
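
To show what "the given node and its descendants" means for the positive mask, here is a small sketch over a hypothetical three-level hierarchy. It assumes one most specific label per sample. The names `descendants` and `positive_mask` are illustrative only.

```python
# Hypothetical hierarchy: parent -> list of direct children.
CHILDREN = {
    "root": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "dog": ["poodle"],
    "plant": ["tree"],
}


def descendants(node):
    """Collect every node below `node` (children, grandchildren, ...)."""
    found = []
    stack = list(CHILDREN.get(node, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(CHILDREN.get(current, []))
    return found


def positive_mask(node, labels):
    """True for samples labeled with `node` or any of its descendants."""
    included = {node, *descendants(node)}
    return [label in included for label in labels]


# Most specific label of each sample (assumption: one label per sample).
labels = ["animal", "dog", "poodle", "plant", "tree"]
pos = positive_mask("animal", labels)
print(pos)  # [True, True, True, False, False]
```

The "poodle" sample is positive for "animal" even though it sits two levels below it.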

LessInclusivePolicy

class BinaryPolicy.LessInclusivePolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the less inclusive policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples, except the examples for the given node and its descendants.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
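
Under the less inclusive policy as described here, the negative mask is simply the complement of the positive mask. The sketch below illustrates this on a hypothetical hierarchy, assuming one most specific label per sample. The helper `subtree` is not part of the library.

```python
# Hypothetical hierarchy: parent -> list of direct children.
CHILDREN = {
    "root": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
}


def subtree(node):
    """Return `node` together with all of its descendants."""
    nodes = {node}
    for child in CHILDREN.get(node, []):
        nodes |= subtree(child)
    return nodes


# Most specific label of each sample (assumption: one label per sample).
labels = ["animal", "dog", "cat", "plant", "tree"]
in_subtree = subtree("animal")

pos = [label in in_subtree for label in labels]
neg = [label not in in_subtree for label in labels]
print(pos)  # [True, True, True, False, False]
print(neg)  # [False, False, False, True, True]
```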

SiblingsPolicy

class BinaryPolicy.SiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: InclusivePolicy

Implement the siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes all examples for the siblings of the given node (the nodes that share its parent), as well as their descendants.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This includes examples for the given node and its descendants.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
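
The siblings selection can be sketched as follows. The sketch assumes that the nodes meant above are the siblings of the given node, that is, the other nodes sharing its parent, and that each sample has one most specific label. The hierarchy and helper names are hypothetical.

```python
# Hypothetical hierarchy: child -> parent.
PARENT = {
    "animal": "root",
    "plant": "root",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def subtree(node):
    """Return `node` together with all of its descendants."""
    nodes = {node}
    for child, parent in PARENT.items():
        if parent == node:
            nodes |= subtree(child)
    return nodes


def siblings(node):
    """Return the other nodes that share the parent of `node`."""
    parent = PARENT.get(node)
    return [n for n, p in PARENT.items() if p == parent and n != node]


# Most specific label of each sample (assumption: one label per sample).
labels = ["animal", "dog", "cat", "plant", "tree"]
node = "dog"

positives = subtree(node)
negatives = set()
for sibling in siblings(node):
    negatives |= subtree(sibling)

pos = [label in positives for label in labels]
neg = [label in negatives for label in labels]
print(pos)  # [False, True, False, False, False]
print(neg)  # [False, False, True, False, False]
```

Samples from other branches ("plant", "tree") and from the parent ("animal") are used in neither set.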

ExclusiveSiblingsPolicy

class BinaryPolicy.ExclusiveSiblingsPolicy(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Bases: ExclusivePolicy

Implement the exclusive siblings policy of the referenced paper.

__init__(digraph: DiGraph, X: ndarray, y: ndarray, sample_weight=None)

Initialize a BinaryPolicy with the required data.

Parameters

digraph (nx.DiGraph) – Directed graph used to infer the relationships between nodes.
X (np.ndarray) – Features used to fit the model.
y (np.ndarray) – Labels assigned to the individual samples. Must be a 2D array.
sample_weight (array-like of shape (n_samples,), default=None) – Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

get_binary_examples(node) → tuple

Gather all positive and negative examples for a given node.

Parameters

node – Node for which the positive and negative examples should be searched.

Returns

X (np.ndarray) – The subset with positive and negative features.
y (np.ndarray) – The subset with positive and negative labels.

negative_examples(node) → ndarray

Gather all negative examples corresponding to the given node.

This includes examples for the siblings of the given node, i.e. the other nodes that have the same parent.

Parameters: node – Node for which the negative examples should be searched.
Returns: negative_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray

positive_examples(node) → ndarray

Gather all positive examples corresponding to the given node.

This only includes examples for the given node.

Parameters: node – Node for which the positive examples should be searched.
Returns: positive_examples – A mask for which examples are included (True) and which are not.
Return type: np.ndarray
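
A small sketch of the exclusive siblings selection is given below. It assumes that the given node is not counted among its own negatives and that each sample has one most specific label. The hierarchy and the helper `siblings` are hypothetical.

```python
# Hypothetical hierarchy: child -> parent.
PARENT = {
    "animal": "root",
    "plant": "root",
    "dog": "animal",
    "cat": "animal",
    "kitten": "cat",
}


def siblings(node):
    """Return the other nodes that share the parent of `node`."""
    parent = PARENT.get(node)
    return [n for n, p in PARENT.items() if p == parent and n != node]


# Most specific label of each sample (assumption: one label per sample).
labels = ["dog", "cat", "kitten", "animal", "plant"]
node = "dog"

pos = [label == node for label in labels]
neg = [label in siblings(node) for label in labels]
print(pos)  # [True, False, False, False, False]
print(neg)  # [False, True, False, False, False]
```

The "kitten" sample is not a negative here because only the siblings themselves are used, not their descendants.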

Hierarchical Metrics

Precision

metrics.precision(y_true: ndarray, y_pred: ndarray)

Compute precision score for hierarchical classification.

\(\displaystyle{hP = \frac{\sum_i|\alpha_i \cap \beta_i|}{\sum_i|\alpha_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

Parameters

y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

precision – Proportion of predicted classes (including ancestors) that are also true classes.

Return type

float
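
The formula can be traced by hand with a short sketch. It assumes that every row of `y_true` and `y_pred` already lists the full path from the top level to the most specific class, so that the row itself serves as the set of classes and ancestors, and that empty strings are padding. The function names are illustrative and this is not the library's implementation.

```python
def label_set(row):
    """Turn one row of labels into a set, dropping empty padding entries."""
    return {label for label in row if label != ""}


def hierarchical_precision(y_true, y_pred):
    """Sum of |alpha_i & beta_i| divided by the sum of |alpha_i|."""
    overlap = 0
    predicted = 0
    for true_row, pred_row in zip(y_true, y_pred):
        alpha = label_set(pred_row)  # predicted classes with ancestors
        beta = label_set(true_row)   # true classes with ancestors
        overlap += len(alpha & beta)
        predicted += len(alpha)
    return overlap / predicted


y_true = [["animal", "dog"], ["plant", "tree"]]
y_pred = [["animal", "cat"], ["plant", "tree"]]

hp = hierarchical_precision(y_true, y_pred)
print(hp)  # 0.75
```

The first prediction gets partial credit for the correct ancestor "animal", so three of the four predicted classes are correct.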

Recall

metrics.recall(y_true: ndarray, y_pred: ndarray)

Compute recall score for hierarchical classification.

\(\displaystyle{hR = \frac{\sum_i|\alpha_i \cap \beta_i|}{\sum_i|\beta_i|}}\), where \(\alpha_i\) is the set consisting of the most specific classes predicted for test example \(i\) and all their ancestor classes, while \(\beta_i\) is the set containing the true most specific classes of test example \(i\) and all their ancestors, with summations computed over all test examples.

Parameters

y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

recall – Proportion of true classes (including ancestors) that were also predicted.

Return type

float
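
As a worked illustration of the recall formula, the sketch below assumes that each row lists the full path of classes and that an empty string marks a level for which nothing was predicted. Both are assumptions of this example, as are the function names.

```python
def label_set(row):
    """Turn one row of labels into a set, dropping empty padding entries."""
    return {label for label in row if label != ""}


def hierarchical_recall(y_true, y_pred):
    """Sum of |alpha_i & beta_i| divided by the sum of |beta_i|."""
    overlap = 0
    actual = 0
    for true_row, pred_row in zip(y_true, y_pred):
        alpha = label_set(pred_row)  # predicted classes with ancestors
        beta = label_set(true_row)   # true classes with ancestors
        overlap += len(alpha & beta)
        actual += len(beta)
    return overlap / actual


y_true = [["animal", "dog"], ["plant", "tree"]]
y_pred = [["animal", ""], ["plant", "tree"]]  # first prediction stops early

hr = hierarchical_recall(y_true, y_pred)
print(hr)  # 0.75
```

The first prediction stops at "animal", so one of the four true classes ("dog") is missed.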

F-score

metrics.f1(y_true: ndarray, y_pred: ndarray)

Compute the F1 score for hierarchical classification.

\(\displaystyle{hF = \frac{2 \times hP \times hR}{hP + hR}}\), where \(hP\) is the hierarchical precision and \(hR\) is the hierarchical recall.

Parameters

y_true (np.array of shape (n_samples, n_levels)) – Ground truth (correct) labels.
y_pred (np.array of shape (n_samples, n_levels)) – Predicted labels, as returned by a classifier.

Returns

f1 – Harmonic mean of the hierarchical precision and recall.

Return type

float
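
The following sketch combines the three quantities on a toy example. It assumes that each row lists the full path of classes, that empty strings are padding, and that the score is taken as 0.0 when precision and recall are both zero. These are choices made for the example rather than documented behaviour.

```python
def label_set(row):
    """Turn one row of labels into a set, dropping empty padding entries."""
    return {label for label in row if label != ""}


def hierarchical_scores(y_true, y_pred):
    """Return (hP, hR, hF) computed over all samples."""
    overlap = predicted = actual = 0
    for true_row, pred_row in zip(y_true, y_pred):
        alpha = label_set(pred_row)  # predicted classes with ancestors
        beta = label_set(true_row)   # true classes with ancestors
        overlap += len(alpha & beta)
        predicted += len(alpha)
        actual += len(beta)
    hp = overlap / predicted
    hr = overlap / actual
    # Assumed convention: return 0.0 when both scores are zero.
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


y_true = [["animal", "dog"], ["plant", "tree"]]
y_pred = [["animal", ""], ["plant", "tree"]]

hp, hr, hf = hierarchical_scores(y_true, y_pred)
print(hp, hr)  # 1.0 0.75
print(round(hf, 3))  # 0.857
```

Here every predicted class is correct (hP = 1.0) but one true class is missed (hR = 0.75), and the harmonic mean lies between the two, closer to the lower value.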