Data Utilities
Policies
ExclusivePolicy
LessExclusivePolicy
InclusivePolicy
LessInclusivePolicy
SiblingsPolicy
ExclusiveSiblingsPolicy
Utility functions
Helper functions for data manipulation and graph creation.
- data.convert_categorical_labels_to_numeric(data: ndarray, placeholder_label: str = '') → Tuple[ndarray, dict]
Take a string array and convert the values to enumerated integers.
- Parameters:
data (np.array) – Data to be processed.
placeholder_label (str, default='') – Label representing the absence of a class.
- Returns:
conversion (np.array) – The converted integer array.
mapping (dict) – Mapping from the original value to the new enumeration.
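To illustrate the enumeration, here is a minimal NumPy sketch; it is not the library's implementation. How placeholder entries are encoded is not documented, so the sketch assumes they become -1 (the default placeholder of flatten_labels) and are left out of the mapping.

```python
import numpy as np


def convert_categorical_labels_to_numeric(data, placeholder_label=""):
    """Sketch: enumerate the distinct string labels in `data` as integers.

    Assumption: cells equal to `placeholder_label` are encoded as -1 and
    excluded from the mapping; the library may handle them differently.
    """
    data = np.asarray(data)
    present = data != placeholder_label
    # np.unique sorts the real labels; `inverse` gives each cell's class index.
    classes, inverse = np.unique(data[present], return_inverse=True)
    conversion = np.full(data.shape, -1, dtype=int)
    conversion[present] = inverse.ravel()
    mapping = {value: index for index, value in enumerate(classes.tolist())}
    return conversion, mapping
```

For labels = np.array([["animal", "cat"], ["animal", "dog"], ["plant", ""]]), this sketch returns the matrix [[0, 1], [0, 2], [3, -1]], with classes enumerated in sorted order.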
- data.find_max_depth(graph: DiGraph, root: int | str | str_) → int
Find the maximum depth of a DAG given a node as root.
- Parameters:
graph (nx.DiGraph) – Graph whose depth should be measured.
root (int or str) – Node from which the distance should be measured.
- Returns:
max_depth – The number of nodes on the longest path starting from the root node. A graph that consists only of the root returns 1.
- Return type:
int
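The depth definition above (nodes on the longest path, so a lone root gives 1) can be sketched with NetworkX and a memoized recursion. This assumes the graph is acyclic and its node labels are hashable; it is an illustration, not the library's code.

```python
from functools import lru_cache

import networkx as nx


def find_max_depth(graph: nx.DiGraph, root) -> int:
    """Sketch: count the nodes on the longest path that starts at `root`."""

    @lru_cache(maxsize=None)
    def depth(node) -> int:
        # A node without children ends the path and contributes one node.
        return 1 + max((depth(child) for child in graph.successors(node)), default=0)

    return depth(root)
```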
- data.find_root(graph: DiGraph) → int | str | str_
Take a graph and return one of its root nodes, i.e. a node without incoming edges.
- Parameters:
graph (nx.DiGraph) – Graph for which the root should be found.
- Returns:
root – The label of the root node.
- Return type:
int or str
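In a DAG, a root is a node with no incoming edges. A minimal sketch under that definition follows. Which root the library returns when there are several, and how it handles an empty graph, are not documented, so the choices here (the first root in node order, and a ValueError) are assumptions.

```python
import networkx as nx


def find_root(graph: nx.DiGraph):
    """Sketch: return the first node (in node order) that has no parent."""
    for node, in_degree in graph.in_degree():
        if in_degree == 0:
            return node
    # Assumption: an empty graph (or one where every node has a parent) is an error.
    raise ValueError("graph has no root node")
```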
- data.flatten_labels(labels: ndarray, placeholder_label: int = -1) → ndarray
Flatten hierarchical labels to only the most specific label per entry.
Expects the most specific label of every entry to be the rightmost label in its row.
- Parameters:
labels (np.array) – 2D label matrix, indexed as (row, column).
placeholder_label (int or str) – Value marking an undefined label; used to support hierarchies of uneven depth.
- Returns:
new_labels – 1D array containing the most specific label of each row.
- Return type:
np.array
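The rightmost-label rule can be vectorized with NumPy, as in the sketch below. It assumes placeholders only pad the right-hand side of shorter rows and that a row made up entirely of placeholders should stay a placeholder; neither behavior is confirmed by the documentation.

```python
import numpy as np


def flatten_labels(labels, placeholder_label=-1):
    """Sketch: keep the rightmost non-placeholder label of each row."""
    labels = np.asarray(labels)
    present = labels != placeholder_label
    # argmax on the column-reversed mask finds the last real label per row.
    last = labels.shape[1] - 1 - np.argmax(present[:, ::-1], axis=1)
    picked = labels[np.arange(labels.shape[0]), last]
    # Rows without any real label keep the placeholder.
    return np.where(present.any(axis=1), picked, placeholder_label)
```

For [[1, 2, 3], [1, 4, -1], [5, -1, -1]] this sketch returns [3, 4, 5].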
- data.graph_from_edge_pairs(file: str, delimiter: str = ',', skip_header: int = 1) → DiGraph
Create a DAG from a file containing (parent, child) pairs.
- Parameters:
file (str) – File containing the edge pairs.
delimiter (str, default=',') – The delimiter of the file.
skip_header (int, default=1) – The number of header rows to skip.
- Returns:
graph – The graph with all corresponding edges and their nodes.
- Return type:
nx.DiGraph
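Here is a minimal sketch of reading (parent, child) pairs with the standard csv module and NetworkX. The parameter names follow the signature above (skip_header also mirrors numpy.genfromtxt), but the parsing details are assumptions: every value is kept as a stripped string, only the first two columns are used, and blank lines are ignored.

```python
import csv

import networkx as nx


def graph_from_edge_pairs(file: str, delimiter: str = ",", skip_header: int = 1) -> nx.DiGraph:
    """Sketch: build a DiGraph from a delimited file of (parent, child) rows."""
    graph = nx.DiGraph()
    with open(file, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        for row_number, row in enumerate(reader):
            # Skip the header rows and any blank lines.
            if row_number < skip_header or not row:
                continue
            parent, child = row[0].strip(), row[1].strip()
            graph.add_edge(parent, child)
    return graph
```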
- data.graph_from_hierarchical_labels(data: ndarray, placeholder: str | int | None = None) → DiGraph
Construct a DAG from hierarchical labels.
In the case that multiple root nodes are found, a new root node is inserted and all previous root nodes are connected to it.
- Parameters:
data (np.array) – Hierarchical labels, formatted as (row, col). The columns should be ordered from least specific to most specific class. If a row has fewer labels than there are columns (i.e. the hierarchy has uneven depth), the unused cells should be filled with a placeholder.
placeholder (str or int, default=None) – Value for non-existent nodes in the data. Must match the data type of data.
- Returns:
graph – The graph with all corresponding edges and their nodes.
- Return type:
DiGraph
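The construction can be sketched by linking each label in a row to the next, more specific one. This is an illustration, not the library's code. It assumes each label names a unique node across the whole hierarchy, and the name of the inserted root, NEW_ROOT = "root", is hypothetical because the documentation does not give it.

```python
import networkx as nx
import numpy as np

# Hypothetical name for the extra root; it could collide with a real label.
NEW_ROOT = "root"


def graph_from_hierarchical_labels(data, placeholder=None) -> nx.DiGraph:
    """Sketch: build a DAG whose edges go from each label to its child label."""
    graph = nx.DiGraph()
    for row in np.asarray(data):
        # Drop placeholder cells so shorter rows contribute only real labels.
        path = [label for label in row if label != placeholder]
        graph.add_nodes_from(path)
        # Each label becomes the parent of the next, more specific one.
        graph.add_edges_from(zip(path[:-1], path[1:]))
    roots = [node for node, degree in graph.in_degree() if degree == 0]
    if len(roots) > 1:
        # Several top-level labels: attach them all to one new root.
        graph.add_edges_from((NEW_ROOT, node) for node in roots)
    return graph
```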
- data.is_numeric_label(array: ndarray) → bool
Determine whether an array has a numerical label format.
Supported formats are booleans, unsigned integers, signed integers and floats.
- Parameters:
array (np.array) – The array to check.
- Returns:
result – True if the array has a supported format, False otherwise.
- Return type:
bool
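The four supported formats correspond to NumPy dtype kind codes, so a one-line sketch is possible. Whether the library performs the check through dtype.kind is an assumption.

```python
import numpy as np


def is_numeric_label(array) -> bool:
    """Sketch: accept boolean, unsigned int, signed int and float dtypes."""
    # dtype.kind codes: 'b' bool, 'u' unsigned int, 'i' signed int, 'f' float.
    return np.asarray(array).dtype.kind in {"b", "u", "i", "f"}
```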
- data.minimal_graph_depth(graph: DiGraph) → int
Calculate the minimal depth needed to reach every node.
- Parameters:
graph (nx.DiGraph) – Graph to be analyzed.
- Returns:
depth – The minimal depth.
- Return type:
int
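One way to read "minimal depth" is a breadth-first expansion from the roots that counts levels until no unreached node remains. The sketch below assumes that reading, and it assumes depth counts nodes (a lone root has depth 1) to match find_max_depth; the library may count edges instead.

```python
import networkx as nx


def minimal_graph_depth(graph: nx.DiGraph) -> int:
    """Sketch: number of BFS levels from the roots needed to reach every node."""
    frontier = {node for node, degree in graph.in_degree() if degree == 0}
    seen = set(frontier)
    depth = 0
    while frontier:
        depth += 1
        # Next level: children that no shorter path has reached yet.
        frontier = {child for node in frontier for child in graph.successors(node)} - seen
        seen |= frontier
    return depth
```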
- data.minimal_per_node_depth(graph: DiGraph) → dict
Calculate, for every node, the minimal depth needed to reach it.
- Parameters:
graph (nx.DiGraph) – Graph to be analyzed.
- Returns:
node_depth – A mapping from each node to its minimal depth.
- Return type:
dict
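A per-node version can be sketched with NetworkX's multi-source shortest paths, starting from every root at once. The node-counting convention (roots have depth 1) is an assumption chosen to match find_max_depth; the library may count edges instead.

```python
import networkx as nx


def minimal_per_node_depth(graph: nx.DiGraph) -> dict:
    """Sketch: depth of each node along its shortest path from any root."""
    roots = [node for node, degree in graph.in_degree() if degree == 0]
    # Edges carry no "weight" attribute here, so every edge counts as 1.
    distances = nx.multi_source_dijkstra_path_length(graph, roots)
    # Convert edge counts to node counts so a root has depth 1.
    return {node: distance + 1 for node, distance in distances.items()}
```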