
Intel® Modin* Get Started Sample

This get started sample code shows how to use distributed Pandas using the Intel® Distribution of Modin* package. It demonstrates how to use software products that can be found in the Intel® AI Analytics Toolkit (AI Kit).

Property             Description
Category             Get started sample
What you will learn  Basic Intel® Distribution of Modin* programming model for Intel processors
Time to complete     5-8 minutes

Purpose

Intel Distribution of Modin* uses Ray or Dask as its execution engine to parallelize Pandas operations across the available CPU cores, which can speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel Distribution of Modin* is designed as a drop-in replacement for Pandas, so existing Pandas code typically needs only its import statement changed.
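In practice, switching existing code to Modin usually comes down to the import statement: `import modin.pandas as pd` instead of `import pandas as pd`. Modin also reads the MODIN_ENGINE environment variable to choose between Ray and Dask. The sketch below assumes Modin and Ray are installed as described in Environment Setup; `load_dataframe_api` is a hypothetical helper of ours, not part of Modin, and it falls back to stock Pandas when Modin is missing so the same code runs either way.

```python
import importlib.util
import os


def load_dataframe_api(engine="ray"):
    """Return a pandas-compatible module and a label naming it.

    Hypothetical helper: prefers Modin and falls back to stock Pandas.
    """
    if importlib.util.find_spec("modin") is not None:
        # Modin reads MODIN_ENGINE ("ray" or "dask") to choose its execution
        # engine; set it before importing modin.pandas. setdefault keeps any
        # value already exported in the shell.
        os.environ.setdefault("MODIN_ENGINE", engine)
        import modin.pandas as pd  # drop-in replacement for `import pandas as pd`
        return pd, "modin"
    import pandas as pd  # stock Pandas: same API, no Modin parallelism
    return pd, "pandas"


pd, backend = load_dataframe_api()
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(backend, int(df["a"].sum()))
```

Everything after the import line is ordinary Pandas code, which is the point of the drop-in design.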

In this sample, you will run Intel Distribution of Modin*-accelerated Pandas functions and compare their run times with those of stock (standard) Pandas functions.
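To measure the gain on your own data, you can time the same operation with both libraries. The following is a minimal, standard-library-only timing sketch; `time_call` and `speedup` are hypothetical helper names, and taking the median of a few runs is our choice to dampen noise, not something the sample prescribes. Repeating the measurement also matters because the first Modin call can include engine start-up time.

```python
import time
from statistics import median


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def speedup(stock_seconds, modin_seconds):
    """Ratio of stock Pandas time to Modin time; > 1.0 means Modin was faster."""
    if modin_seconds <= 0:
        raise ValueError("modin_seconds must be positive")
    return stock_seconds / modin_seconds


# Hypothetical usage, assuming both libraries are installed and data.csv exists:
#   import pandas
#   import modin.pandas as mpd
#   t_stock = time_call(lambda: pandas.read_csv("data.csv"))
#   t_modin = time_call(lambda: mpd.read_csv("data.csv"))
#   print(f"Modin speedup: {speedup(t_stock, t_modin):.2f}x")
```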

Optimized for  Description
OS             64-bit Linux: Ubuntu 18.04 or higher
Hardware       Intel® Atom® processors; Intel® Core™ processor family; Intel® Xeon® processor family; Intel® Xeon® Scalable Performance processor family
Software       Intel® Distribution of Modin*, Intel® AI Analytics Toolkit

Key Implementation Details

This get started sample code is implemented for CPU using the Python language. The example assumes you have Pandas and Modin installed inside a conda environment.
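Because the sample assumes Pandas and Modin are installed, a quick check before running it can save a confusing import error. This sketch uses only `importlib.util.find_spec`; the helper name `check_environment` is hypothetical, and it checks only for Ray and Dask, the two engines named in this README.

```python
import importlib.util

REQUIRED = ("pandas", "modin")
ENGINES = ("ray", "dask")  # the two Modin engines this README names


def check_environment(required=REQUIRED, engines=ENGINES):
    """Return (found, missing, has_engine) for the given top-level packages.

    found maps each package name to True if it is importable.
    """
    found = {name: importlib.util.find_spec(name) is not None
             for name in (*required, *engines)}
    missing = [name for name in required if not found[name]]
    has_engine = any(found[name] for name in engines)
    return found, missing, has_engine


if __name__ == "__main__":
    found, missing, has_engine = check_environment()
    for name, ok in found.items():
        print(f"{name:8} {'found' if ok else 'NOT FOUND'}")
    if missing:
        print("Missing required packages:", ", ".join(missing))
    if not has_engine:
        print("Neither Ray nor Dask is installed.")
```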

Environment Setup

  1. Install Intel Distribution of Modin in a new conda environment.

    Note: Replace python=3.x with your Python version.

    conda create -n aikit-modin python=3.x -y
    conda activate aikit-modin
    conda install modin-all -c intel -y
  2. Install matplotlib.

    conda install -c intel matplotlib -y
  3. Install Jupyter Notebook.

    Skip this step if you are working on the Intel DevCloud.

    conda install jupyter nb_conda_kernels -y
  4. Create a new kernel for Jupyter Notebook based on your activated conda environment.

    conda install ipykernel -y
    python -m ipykernel install --user --name usr_modin

    This step is optional if you plan to open the notebook on your local server. On the Intel DevCloud, you select this usr_modin kernel when you open the notebook.
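To confirm that steps 1 and 4 took effect, you can check which conda environment is active and whether the usr_modin kernel spec exists. This sketch rests on assumptions: it relies on `conda activate` setting CONDA_DEFAULT_ENV, and it approximates the default Linux location that `ipykernel install --user` writes to (JUPYTER_DATA_DIR if set, otherwise XDG_DATA_HOME or ~/.local/share, followed by `jupyter`). `jupyter kernelspec list` remains the authoritative check, and all function names here are hypothetical.

```python
import os
from pathlib import Path


def active_conda_env():
    """Return the activated conda environment name, or None if none is active."""
    # `conda activate` sets CONDA_DEFAULT_ENV for the current shell.
    return os.environ.get("CONDA_DEFAULT_ENV")


def jupyter_user_data_dir():
    """Approximate Jupyter's per-user data directory on Linux (an assumption)."""
    if os.environ.get("JUPYTER_DATA_DIR"):
        return Path(os.environ["JUPYTER_DATA_DIR"])
    xdg = os.environ.get("XDG_DATA_HOME") or Path.home() / ".local" / "share"
    return Path(xdg) / "jupyter"


def user_kernel_installed(name="usr_modin", data_dir=None):
    """True if `ipykernel install --user --name <name>` left a kernel.json behind."""
    base = Path(data_dir) if data_dir else jupyter_user_data_dir()
    return (base / "kernels" / name / "kernel.json").is_file()


if __name__ == "__main__":
    print("Active conda env:", active_conda_env())
    print("usr_modin kernel installed:", user_kernel_installed())
```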

Run the Sample

You can run the Jupyter notebook with the sample code on your local server, or download the sample code from the notebook as a Python file and run it locally or on the Intel DevCloud. See the Intel® Distribution of Modin Getting Started Guide for more information.

Run the Sample in Jupyter Notebook

To open the Jupyter notebook on your local server:

  1. Activate the conda environment.

    conda activate aikit-modin
  2. Start the Jupyter notebook server.

    jupyter notebook
  3. Open the IntelModin_GettingStarted.ipynb file in the Notebook Dashboard.

  4. Run the cells in the Jupyter notebook sequentially by clicking the Run button.


Run the Sample in the Intel® DevCloud for oneAPI JupyterLab

  1. If you do not already have an account, request an Intel® DevCloud account at Create an Intel® DevCloud Account.

  2. Open https://devcloud.intel.com/oneapi/get_started/ in your browser and locate the Connect with Jupyter Lab* section (near the bottom).

  3. Click the Sign in to Connect button. (If you are already signed in, the button reads Launch JupyterLab*.)

  4. If the samples are not already present in your Intel® DevCloud account, download them.

    • From JupyterLab, select File > New > Terminal.
    • In the terminal, clone the samples from GitHub:
      git clone https://github.com/oneapi-src/oneAPI-samples.git
      
  5. Set up the environment in the terminal:

    • Source the oneAPI environment:
      source /opt/intel/oneapi/setvars.sh --force
      
    • Follow the steps in Environment Setup to create the conda environment and the usr_modin kernel.
  6. In the JupyterLab, navigate to the IntelModin_GettingStarted.ipynb file and open it.

  7. To change the kernel, click Kernel > Change kernel > usr_modin.

  8. Run the sample code and read the explanations in the notebook.
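If you are unsure where the notebook landed in the cloned repository, a short search locates it. This is a hypothetical helper (`find_notebook`); it assumes you cloned oneAPI-samples into the current directory and skips Jupyter's `.ipynb_checkpoints` copies.

```python
from pathlib import Path


def find_notebook(root=".", name="IntelModin_GettingStarted.ipynb"):
    """Return sorted paths to `name` anywhere under `root`, ignoring checkpoint copies."""
    return sorted(path for path in Path(root).rglob(name)
                  if ".ipynb_checkpoints" not in path.parts)


if __name__ == "__main__":
    # Assumes the repository was cloned into the current directory (step 4).
    for path in find_notebook("oneAPI-samples"):
        print(path)
```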

Run the Python Script Locally

  1. Convert IntelModin_GettingStarted.ipynb to a Python file in one of the following ways:

    • Open the notebook in Jupyter and download it as a Python file.

    • Run the following command to convert the notebook file to a Python script:

      jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
  2. Run the Python script.

    ipython IntelModin_GettingStarted.py
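If nbconvert is unavailable, a notebook's code cells can also be extracted directly, since an .ipynb file is JSON. This standard-library-only sketch assumes the nbformat v4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`); the function names are hypothetical. Unlike nbconvert, it comments out IPython magics and shell escapes (lines starting with % or !) instead of translating them, so its output can run under plain `python` but may behave differently from the nbconvert script.

```python
import json
import sys
from pathlib import Path

MAGIC_PREFIXES = ("%", "!")  # IPython line/cell magics and shell escapes


def cells_to_script(nb):
    """Join the code cells of a parsed notebook (nbformat v4 dict) into script text."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # v4 stores source as a string or list of lines
            source = "".join(source)
        lines = ["# " + line if line.lstrip().startswith(MAGIC_PREFIXES) else line
                 for line in source.splitlines()]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def notebook_to_script(ipynb_path, py_path=None):
    """Write the code cells to a .py file (default: beside the notebook); return its path."""
    ipynb_path = Path(ipynb_path)
    py_path = Path(py_path) if py_path else ipynb_path.with_suffix(".py")
    nb = json.loads(ipynb_path.read_text(encoding="utf-8"))
    py_path.write_text(cells_to_script(nb), encoding="utf-8")
    return py_path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(notebook_to_script(sys.argv[1]))
```

For example, `python convert.py IntelModin_GettingStarted.ipynb` (with the sketch saved as the hypothetical convert.py) would write IntelModin_GettingStarted.py next to the notebook.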

Run the Sample on the Intel® DevCloud in Batch Mode

To run the sample in batch mode, you need a script for batch processing.

  1. Convert IntelModin_GettingStarted.ipynb to a Python file.

    jupyter nbconvert --to python IntelModin_GettingStarted.ipynb
  2. Create a shell script file named run-modin-sample.sh that activates the conda environment and runs the sample:

    source activate aikit-modin
    ipython IntelModin_GettingStarted.py
  3. Submit a job that requests a compute node to run the sample code.

    qsub -l nodes=1:xeon:ppn=2 -d . -o output.txt run-modin-sample.sh

    The -o output.txt option redirects the output of the script to the output.txt file.

    Additional information about requesting a compute node in the Intel DevCloud:

    To run a script on the DevCloud, you must request a compute node by specifying node properties such as gpu, xeon, fpga_compile, and fpga_runtime. To list the available node properties, run the pbsnodes command.

    Provide this node information when you submit a job in batch mode with the qsub command, and adjust the command to fit the node you are using. This sample is implemented for CPU, so use the CPU (xeon) node, as in step 3 above. The commands for each node type are:

    Node                Command
    GPU                 qsub -l nodes=1:gpu:ppn=2 -d . run-modin-sample.sh
    CPU                 qsub -l nodes=1:xeon:ppn=2 -d . run-modin-sample.sh
    FPGA Compile Time   qsub -l nodes=1:fpga_compile:ppn=2 -d . run-modin-sample.sh
    FPGA Runtime        qsub -l nodes=1:fpga_runtime:ppn=2 -d . run-modin-sample.sh
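Steps 2 and 3 can also be scripted. The sketch below writes the batch script and builds the qsub argument list for a given node property. The function names and the `#!/bin/bash` line are our additions, `NODE_PROPERTIES` holds only the four properties named above, and every option is placed before the script name, following qsub's `qsub [options] [script]` usage.

```python
import shlex
from pathlib import Path

# Only the node properties named in this README; run `pbsnodes` for the full list.
NODE_PROPERTIES = {"gpu", "xeon", "fpga_compile", "fpga_runtime"}

# Contents of run-modin-sample.sh (step 2), plus a shebang line we added.
SCRIPT_TEMPLATE = """#!/bin/bash
source activate {env}
ipython {script}
"""


def write_batch_script(path="run-modin-sample.sh", env="aikit-modin",
                       script="IntelModin_GettingStarted.py"):
    """Write the batch script and mark it executable; return its Path."""
    path = Path(path)
    path.write_text(SCRIPT_TEMPLATE.format(env=env, script=script))
    path.chmod(0o755)
    return path


def qsub_command(script="run-modin-sample.sh", node="xeon", ppn=2,
                 output="output.txt"):
    """Return the qsub arguments for one node, with every option before the script."""
    if node not in NODE_PROPERTIES:
        raise ValueError(f"unknown node property: {node!r}")
    return ["qsub", "-l", f"nodes=1:{node}:ppn={ppn}", "-d", ".",
            "-o", output, script]


if __name__ == "__main__":
    # Print the command for this CPU sample instead of submitting it.
    print(shlex.join(qsub_command()))
```

Printing the command before submitting lets you compare it with the CPU row of the table above; the list form can also be passed directly to `subprocess.run`.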

Run the Sample in Visual Studio Code*

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations, and browse and download samples.

The basic steps to build and run a sample using VS Code include:

  1. Download a sample using the extension Code Sample Browser for Intel® oneAPI Toolkits.

  2. Configure the oneAPI environment with the extension Environment Configurator for Intel® oneAPI Toolkits.

  3. Open a Terminal in VS Code by clicking Terminal > New Terminal.

  4. Run the sample in the VS Code terminal using the instructions in Run the Python Script Locally.

On Linux, you can debug your GPU application with GDB for Intel® oneAPI toolkits using the Generate Launch Configurations extension.

To learn more about the extensions, see Using Visual Studio Code with Intel® oneAPI Toolkits.

After learning how to use the extensions for Intel oneAPI Toolkits, return to this readme for instructions on how to build and run a sample.

Expected Printed Output:

Expected cell output is shown in IntelModin_GettingStarted.ipynb.

Related Samples

Several sample programs are available for you to try, many of which can be compiled and run in a similar fashion. Experiment with running the various samples on different kinds of compute nodes or adjust their source code to experiment with different workloads.

License

Code samples are licensed under the MIT license. See License.txt for details.

Third-party program licenses are listed in third-party-programs.txt.