Data Utilities

Policies

ExclusivePolicy


LessExclusivePolicy


InclusivePolicy


LessInclusivePolicy


SiblingsPolicy


ExclusiveSiblingsPolicy


Utility functions

Helper functions for data manipulation and graph creation.

data.convert_categorical_labels_to_numeric(data: ndarray, placeholder_label: str = '') → Tuple[ndarray, dict]

Take a string array and convert the values to enumerated integers.

Parameters:
  • data (np.array) – Data to be processed.

  • placeholder_label (str) – Label that marks the absence of a class.

Returns:

  • conversion (np.array) – The converted integer array.

  • mapping (dict) – Mapping from the original value to the new enumeration.
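The conversion described above can be illustrated with a minimal sketch. It is not the library's implementation: nested Python lists stand in for the ndarray so that the example has no dependencies, integers are assigned in order of first appearance, and the placeholder is assumed to map to -1. The name convert_labels_sketch is hypothetical, and the real function may number the labels and treat the placeholder differently.

```python
def convert_labels_sketch(data, placeholder_label=""):
    """Replace each distinct string with an integer (illustrative only)."""
    mapping = {}      # original value -> assigned integer
    conversion = []   # converted rows, same shape as the input
    for row in data:
        converted_row = []
        for value in row:
            if value == placeholder_label:
                # Assumption: the placeholder is encoded as -1 and is
                # left out of the mapping.
                converted_row.append(-1)
                continue
            if value not in mapping:
                mapping[value] = len(mapping)
            converted_row.append(mapping[value])
        conversion.append(converted_row)
    return conversion, mapping


labels = [["animal", "dog"], ["animal", "cat"], ["plant", ""]]
conversion, mapping = convert_labels_sketch(labels)
```

With these assumptions, every occurrence of "animal" receives the same integer and the empty string in the last row is encoded as -1.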

data.find_max_depth(graph: DiGraph, root: int | str | str_) → int

Find the maximum depth of a DAG given a node as root.

Parameters:
  • graph (nx.DiGraph) – Graph for which the maximum depth should be found.

  • root (int or str) – Node from which the distance should be measured.

Returns:

max_depth – The number of nodes on the longest path starting from the root node. A graph that contains only the root returns 1.

Return type:

int
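The node-counting convention can be made concrete with a small sketch. It is an illustration under stated assumptions, not the library's code: a plain dict that maps each parent to a list of children stands in for the nx.DiGraph, and find_max_depth_sketch is a hypothetical name.

```python
def find_max_depth_sketch(children, root):
    """Count the nodes on the longest path that starts at root in a DAG."""
    cache = {}  # node -> longest path length (in nodes) starting there

    def depth(node):
        if node not in cache:
            # A leaf contributes 1; otherwise add the deepest child.
            below = (depth(child) for child in children.get(node, []))
            cache[node] = 1 + max(below, default=0)
        return cache[node]

    return depth(root)


tree = {"root": ["a", "b"], "a": ["c"]}
deepest = find_max_depth_sketch(tree, "root")
single = find_max_depth_sketch({"root": []}, "root")
```

Here the longest path is root, a, c, so the result is 3, and a graph holding only the root yields 1, matching the convention stated above.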

data.find_root(graph: DiGraph) → int | str | str_

Take a graph and return one of its root nodes.

Parameters:

graph (nx.DiGraph) – Graph for which the root should be found.

Returns:

root – The label of the root node.

Return type:

int or str
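A root is a node without incoming edges. The sketch below shows one way to locate such a node, assuming a dict of parent to children in place of the nx.DiGraph. The name find_root_sketch is hypothetical, and raising ValueError when no root exists is an assumption about error handling, not documented behavior.

```python
def find_root_sketch(children):
    """Return the first node that never appears as a child."""
    has_parent = {child for kids in children.values() for child in kids}
    for node in children:
        if node not in has_parent:
            return node
    # Assumption: a graph without any root is reported as an error.
    raise ValueError("graph has no root node")


tree = {"animal": ["dog", "cat"], "dog": ["puppy"], "cat": [], "puppy": []}
root = find_root_sketch(tree)
```

If the graph has several roots, this sketch returns whichever one was inserted first, which is consistent with "one of its root nodes" but not necessarily the same choice the library makes.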

data.flatten_labels(labels: ndarray, placeholder_label: int = -1) → ndarray

Flattens hierarchical labels to only the most specific label per entry.

Expects the most specific label to be on the rightmost side of the label array for every entry.

Parameters:
  • labels (np.array) – 2d label matrix, formatted as (row, column).

  • placeholder_label (int or str) – Value that marks an undefined label, used to support hierarchies of uneven depth.

Returns:

new_labels – 1d array of the most specific label for each row.

Return type:

np.array
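One plausible reading of this behavior is to take, for each row, the rightmost entry that is not the placeholder. The sketch below follows that reading with nested lists in place of the 2d array. It is an assumption about the semantics, not the library's implementation, and flatten_labels_sketch is a hypothetical name.

```python
def flatten_labels_sketch(labels, placeholder_label=-1):
    """Keep the rightmost non-placeholder label of every row."""
    flat = []
    for row in labels:
        # Fall back to the placeholder if the whole row is undefined.
        most_specific = placeholder_label
        for value in reversed(row):
            if value != placeholder_label:
                most_specific = value
                break
        flat.append(most_specific)
    return flat


hierarchy = [[0, 1, 2], [0, 3, -1], [4, -1, -1]]
flat = flatten_labels_sketch(hierarchy)
```

Rows that stop early in the hierarchy are padded with -1 on the right, so the scan from the right skips the padding and returns the deepest defined label.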

data.graph_from_edge_pairs(file: str, delimiter: str = ',', skip_header: int = 1) → DiGraph

Create a DAG from a file containing (parent, child) pairs.

Parameters:
  • file (str) – File containing the edge pairs.

  • delimiter (str, default=',') – The delimiter of the file.

  • skip_header (int, default=1) – The number of header rows that should be skipped.

Returns:

graph – The graph with all corresponding edges and their nodes.

Return type:

nx.DiGraph
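The expected file layout is one (parent, child) pair per line after the header rows. The sketch below parses such a file with the standard csv module and collects the edges in a dict of parent to children, which stands in for the nx.DiGraph. The name edge_pairs_sketch and the sample file contents are hypothetical, and the real function may parse the file differently.

```python
import csv
import os
import tempfile


def edge_pairs_sketch(file, delimiter=",", skip_header=1):
    """Read (parent, child) rows into a parent -> children dict."""
    children = {}
    with open(file, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        for index, row in enumerate(reader):
            if index < skip_header or not row:
                continue  # skip header rows and blank lines
            parent, child = row[0].strip(), row[1].strip()
            children.setdefault(parent, []).append(child)
            children.setdefault(child, [])  # register leaves as nodes too
    return children


# Write a small example file, parse it, then clean up.
with tempfile.NamedTemporaryFile(
    mode="w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("parent,child\nanimal,dog\nanimal,cat\ndog,puppy\n")
    path = tmp.name
graph = edge_pairs_sketch(path)
os.remove(path)
```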

data.graph_from_hierarchical_labels(data: ndarray, placeholder: str | int | None = None) → DiGraph

Construct a DAG from hierarchical labels.

In the case that multiple root nodes are found, a new root node is inserted and all previous root nodes are connected to it.

Parameters:
  • data (np.array) – Hierarchical labels, formatted as (row, col). The columns should be ordered from least specific to most specific class. If some entries are invalid (i.e. there are rows with fewer labels than the number of columns), they should be marked by a placeholder.

  • placeholder (str or int, default=None) – Value for non-existent nodes in the data. Must match the data type of data.

Returns:

graph – The graph with all corresponding edges and their nodes.

Return type:

DiGraph
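The construction, including the inserted root, can be sketched as follows. This is an illustration under assumptions rather than the library's code: nested lists replace the ndarray, a dict of parent to children replaces the DiGraph, each row is read up to its first placeholder, and the label of the inserted root ("root" here) is a hypothetical choice.

```python
from itertools import takewhile


def graph_from_labels_sketch(data, placeholder=None, new_root="root"):
    """Link consecutive labels of each row; join multiple roots."""
    children = {}
    for row in data:
        # Read the row up to the first placeholder entry.
        path = list(takewhile(lambda value: value != placeholder, row))
        for node in path:
            children.setdefault(node, [])
        for parent, child in zip(path, path[1:]):
            if child not in children[parent]:
                children[parent].append(child)
    has_parent = {child for kids in children.values() for child in kids}
    roots = [node for node in children if node not in has_parent]
    if len(roots) > 1:
        # Assumption: the new root uses a label absent from the data.
        children[new_root] = roots
    return children


rows = [["animal", "dog"], ["animal", "cat"], ["plant", None]]
graph = graph_from_labels_sketch(rows)
```

In this example "animal" and "plant" are both roots, so a new node is added above them and the result has a single root.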

data.is_numeric_label(array: ndarray) → bool

Determine whether an array has a numerical label format.

Supported formats are booleans, unsigned integers, signed integers and floats.

Parameters:

array (np.array) – The array to check.

Returns:

result – True if the array has a supported format, False otherwise.

Return type:

bool
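The four supported formats correspond to the NumPy dtype kind codes "b" (boolean), "u" (unsigned integer), "i" (signed integer) and "f" (float). A check along those lines might look like the sketch below. Whether the library tests the dtype in exactly this way is an assumption, and is_numeric_label_sketch is a hypothetical name.

```python
import numpy as np


def is_numeric_label_sketch(array):
    """True for boolean, unsigned, signed and float dtypes."""
    # dtype.kind is a one-character code describing the general type.
    return array.dtype.kind in "buif"


numeric = is_numeric_label_sketch(np.array([0, 1, 2]))
textual = is_numeric_label_sketch(np.array(["cat", "dog"]))
```

String arrays have kind "U" and object arrays have kind "O", so both are rejected by this check.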

data.minimal_graph_depth(graph: DiGraph) → int

Calculate the minimal depth at which all nodes can be reached.

Parameters:

graph (nx.DiGraph) – Graph to be analyzed.

Returns:

depth – The minimal depth.

Return type:

int

data.minimal_per_node_depth(graph: DiGraph) → dict

Calculate, for every node, the minimal depth needed to reach it.

Parameters:

graph (nx.DiGraph) – Graph to be analyzed.

Returns:

node_depth – A mapping of node to depth.

Return type:

dict
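The two depth functions can be read as a breadth-first search from the root nodes: the per-node depth is the length of the shortest path to a node, and the graph depth is the largest of those values. The sketch below follows that interpretation with a dict of parent to children in place of the nx.DiGraph. Both the interpretation and the convention that a root has depth 0 are assumptions (the library may count nodes instead of edges), and the function names are hypothetical.

```python
from collections import deque


def minimal_per_node_depth_sketch(children):
    """Shortest distance, in edges, from any root to every node."""
    has_parent = {child for kids in children.values() for child in kids}
    roots = [node for node in children if node not in has_parent]
    depth = {root: 0 for root in roots}  # assumption: roots sit at depth 0
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # first visit is the shortest path
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def minimal_graph_depth_sketch(children):
    """Smallest depth at which every node has been reached."""
    return max(minimal_per_node_depth_sketch(children).values(), default=0)


# "c" is reachable directly from the root and via a -> b -> c.
dag = {"root": ["a", "c"], "a": ["b"], "b": ["c"], "c": []}
node_depth = minimal_per_node_depth_sketch(dag)
graph_depth = minimal_graph_depth_sketch(dag)
```

In this DAG the node "c" gets depth 1 through the direct edge, even though a longer path to it exists, so the minimal graph depth is 2 rather than 3.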