Building Pipelines

HiClass can be used in scikit-learn pipelines and fully supports sparse matrices as input. This example demonstrates both features.

[['Credit reporting' 'Reports']
 ['Loan' 'Student loan']]

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

from hiclass import LocalClassifierPerParentNode

# Define data
X_train = [
    "Struggling to repay loan",
    "Unable to get annual report",
]
X_test = [
    "Unable to get annual report",
    "Struggling to repay loan",
]
Y_train = [["Loan", "Student loan"], ["Credit reporting", "Reports"]]

# We will use logistic regression classifiers for every parent node
lr = LogisticRegression()

# Let's build a pipeline using CountVectorizer and TfidfTransformer
# to extract features as sparse matrices
pipeline = Pipeline(
    [
        ("count", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("lcppn", LocalClassifierPerParentNode(local_classifier=lr)),
    ]
)

# Now, let's train a local classifier per parent node
pipeline.fit(X_train, Y_train)

# Finally, let's predict using the pipeline
predictions = pipeline.predict(X_test)
print(predictions)
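
To make the "per parent node" idea concrete, the sketch below groups the training labels by parent, which is roughly the view each local classifier gets: one classifier per parent, choosing among that parent's children. It is a plain-Python illustration only and not HiClass's internal implementation; the helper `children_per_parent` and the `ROOT` placeholder are hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical placeholder for the top of the hierarchy (not a HiClass name).
ROOT = "<root>"


def children_per_parent(label_paths):
    """Map each parent node to the sorted child labels observed beneath it."""
    children = defaultdict(set)
    for path in label_paths:
        parent = ROOT
        for label in path:
            # Record the edge parent -> label, then descend one level.
            children[parent].add(label)
            parent = label
    return {parent: sorted(kids) for parent, kids in children.items()}


# Same hierarchical labels as in the example above.
Y_train = [["Loan", "Student loan"], ["Credit reporting", "Reports"]]

hierarchy = children_per_parent(Y_train)
for parent, kids in hierarchy.items():
    print(f"{parent} -> {kids}")
```

With the two training paths used here, the root has two children ("Credit reporting" and "Loan") and each of those has a single child, so three parent nodes appear in the mapping.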
