.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_parallel_training.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_plot_parallel_training.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_parallel_training.py:

=====================
Parallel Training
=====================

Larger datasets require more time for training.
By default, the models in HiClass are trained on a single core,
but each local classifier can be trained in parallel using the Ray library [1]_.
If Ray is not installed, HiClass falls back to Joblib for parallelism.
In this example, we demonstrate how to train a hierarchical classifier in parallel by
setting the parameter :literal:`n_jobs` to use all available cores.
Training is performed on a mock dataset from Kaggle [2]_.

.. [1] https://www.ray.io/
.. [2] https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification

.. GENERATED FROM PYTHON SOURCE LINES 18-51

The fitted pipeline is displayed as:

.. code-block:: none

    Pipeline(steps=[('count', CountVectorizer()), ('tfidf', TfidfTransformer()),
                    ('lcppn',
                     LocalClassifierPerParentNode(local_classifier=LogisticRegression(max_iter=1000),
                                                  n_jobs=2))])
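
Since the introduction notes that Ray is used when it is installed and Joblib otherwise, it can be useful to check ahead of training which of the two is likely to apply and how many workers to request. The following is a minimal sketch using only the standard library. The helper names ``expected_backend`` and ``resolve_n_jobs`` are hypothetical and are not part of HiClass; the sketch only tests whether the ``ray`` package is importable and does not inspect HiClass internals.

```python
from importlib.util import find_spec
from os import cpu_count


def expected_backend() -> str:
    """Guess the parallel backend (hypothetical helper, not a HiClass API).

    Returns "ray" if the ray package can be found on the import path,
    otherwise "joblib", mirroring the fallback described in the text.
    """
    return "ray" if find_spec("ray") is not None else "joblib"


def resolve_n_jobs(requested=None) -> int:
    """Clamp a requested worker count to the range [1, available cores].

    Hypothetical helper: os.cpu_count() may return None on some
    platforms, in which case a single core is assumed.
    """
    available = cpu_count() or 1
    if requested is None:
        return available
    return max(1, min(requested, available))


print(f"Expected backend: {expected_backend()}, n_jobs: {resolve_n_jobs()}")
```

The value returned by ``resolve_n_jobs()`` could then be passed as ``n_jobs`` when constructing ``LocalClassifierPerParentNode``, in place of calling ``cpu_count()`` directly.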


.. code-block:: default

    import sys
    from os import cpu_count

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    from hiclass import LocalClassifierPerParentNode
    from hiclass.datasets import load_hierarchical_text_classification

    # Load train and test splits
    X_train, X_test, Y_train, Y_test = load_hierarchical_text_classification()

    # We will use logistic regression classifiers for every parent node
    lr = LogisticRegression(max_iter=1000)

    pipeline = Pipeline(
        [
            ("count", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
            (
                "lcppn",
                LocalClassifierPerParentNode(local_classifier=lr, n_jobs=cpu_count()),
            ),
        ]
    )

    # Fixes bug AttributeError: '_LoggingTee' object has no attribute 'fileno'
    # This only happens when building the documentation
    # Hence, you don't actually need it for your code to work
    sys.stdout.fileno = lambda: False

    # Now, let's train the local classifier per parent node
    pipeline.fit(X_train, Y_train)

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minute 9.161 seconds)

.. _sphx_glr_download_auto_examples_plot_parallel_training.py:

.. only:: html

  .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_parallel_training.py <plot_parallel_training.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_parallel_training.ipynb <plot_parallel_training.ipynb>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_