Multiple Changes to readmes #1510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged 1 commit on Apr 10, 2023
# `Intel® Modin* Vs. Pandas Performance` Sample

The `Intel® Modin* Vs. Pandas Performance` code illustrates how to use Modin* to replace the Pandas API. The sample compares the performance of Intel® Distribution of Modin* and the performance of Pandas for specific dataframe operations.

| Area | Description
|:--- |:---
| What you will learn | How to accelerate the Pandas API using Intel® Distribution of Modin*.
| Time to complete | Less than 10 minutes
| Category | Concepts and Functionality

## Purpose

Intel® Distribution of Modin* accelerates Pandas operations using Ray or Dask execution engine. The distribution provides compatibility and integration with the existing Pandas code. The sample code demonstrates how to perform some basic dataframe operations using Pandas and Intel® Distribution of Modin*. You will be able to compare the performance difference between the two methods.

You can run the sample locally or in Google Colaboratory (Colab).
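The drop-in nature of the replacement is the core idea: only the import changes. Below is a minimal sketch, assuming Modin is installed (it falls back to stock Pandas purely for illustration); the dataframe size and the `groupby` operation are illustrative, not the sample's exact workload:

```python
import time

try:
    import modin.pandas as pd  # drop-in replacement; assumes Modin is installed
except ImportError:
    import pandas as pd  # fall back to stock Pandas for illustration

import numpy as np

# Build a random integer dataframe, then time one typical operation.
data = np.random.randint(0, 100, size=(100_000, 4))
df = pd.DataFrame(data, columns=["a", "b", "c", "d"])

start = time.perf_counter()
result = df.groupby("a").mean()
print(f"groupby-mean wall time: {time.perf_counter() - start:.4f} s")
```

Swapping `import pandas as pd` for `import modin.pandas as pd` is the only code change the comparison relies on; Modin dispatches the same API calls to its Ray or Dask execution engine.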

## Prerequisites

| Optimized for | Description
|:--- |:---
| OS | Ubuntu* 20.04 (or newer)
| Hardware | Intel® Core™ Gen10 Processor <br> Intel® Xeon® Scalable Performance processors
| Software | Intel® AI Analytics Toolkit (AI Kit) <br> Intel® Distribution of Modin*

## Key Implementation Details

This code sample is implemented for CPU using the Python programming language. The sample requires the NumPy, Pandas, and Modin libraries and the `time` module in Python.

## Run the `Intel® Modin* Vs. Pandas Performance` Sample Locally

If you want to run the sample on a local system using a command-line interface (CLI), you must install the Intel® Distribution of Modin* in a new Conda* environment first.

### Install the Intel® Distribution of Modin*

1. Create a Conda environment.
```
conda create --name aikit-modin
```

...

```
pip install ipython
```
### Run the Sample

1. Change to the directory containing the `IntelModin_Vs_Pandas.ipynb` notebook file on your local system.

2. Run the sample notebook.
```
ipython IntelModin_Vs_Pandas.ipynb
```

## Run the `Intel® Modin* Vs. Pandas Performance` Sample in Google Colaboratory

1. Change to the directory containing the `IntelModin_Vs_Pandas.ipynb` notebook file on your local system.

2. Open the notebook file, and remove the prepended number sign (#) symbol from the following lines:
...
9. Select **Runtime** > **Run all**.

## Example Output

>**Note**: Your output might be different between runs on the notebook depending upon the random generation of the dataset. For the first run, Modin may take longer to execute than Pandas for certain operations since Modin performs some initialization in the first iteration.

```
CPU times: user 8.47 s, sys: 132 ms, total: 8.6 s
Wall time: 8.57 s
```

Example expected cell output is included in `IntelModin_Vs_Pandas.ipynb`.
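For reference, the `Wall time` figures above come from timing each cell; here is a stdlib-only sketch of that measurement pattern (the workload is a stand-in, not the notebook's dataframe operation):

```python
import time

def timed(label, operation):
    """Run `operation` once and report wall-clock time, mimicking %%time output."""
    start = time.perf_counter()
    result = operation()
    print(f"{label}: Wall time: {time.perf_counter() - start:.2f} s")
    return result

# Stand-in workload; in the notebook this would be a Pandas or Modin call.
total = timed("sum", lambda: sum(range(1_000_000)))
```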

## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program licenses are at [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
# `Intel® Python XGBoost* Getting Started` Sample

The `Intel® Python XGBoost* Getting Started` sample demonstrates how to set up and train an XGBoost model on datasets for prediction.

| Area | Description
| :--- | :---
| What you will learn | The basics of the XGBoost programming model for Intel CPUs
| Time to complete | 5 minutes
| Category | Getting Started

## Purpose
XGBoost* is a widely used gradient boosting library in the classical ML area. Designed for flexibility, performance, and portability, XGBoost* includes optimized distributed gradient boosting frameworks and implements machine learning algorithms underneath. Starting with XGBoost version 0.9, Intel has been upstreaming optimizations through the `hist` histogram tree-building method. Starting with XGBoost version 1.3.3, Intel has also begun upstreaming inference optimizations.

In this code sample, you will learn how to use Intel optimizations for XGBoost published as part of Intel® AI Analytics Toolkit. The sample also illustrates how to set up and train an XGBoost* model on datasets for prediction. It also demonstrates how to use software products that can be found in the [Intel® AI Analytics Toolkit](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).
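The training flow itself is short. Below is a minimal sketch, assuming XGBoost (and its NumPy dependency) is installed, for example via the AI Kit; the synthetic data, parameter values, and boosting-round count are illustrative only, and the `hist` tree method is where the CPU optimizations apply:

```python
import random

# Tiny synthetic regression problem (pure stdlib so the sketch is self-contained).
random.seed(0)
X = [[random.random() for _ in range(5)] for _ in range(200)]
y = [sum(row) for row in X]

try:
    import numpy as np
    import xgboost as xgb  # assumes XGBoost is installed, e.g. via the AI Kit

    dtrain = xgb.DMatrix(np.array(X), label=np.array(y))
    params = {"objective": "reg:squarederror", "tree_method": "hist"}
    model = xgb.train(params, dtrain, num_boost_round=20)
    preds = model.predict(dtrain)
    print("trained; first prediction:", float(preds[0]))
except ImportError:
    print("xgboost not installed; this sketch only built the synthetic data")
```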

## Prerequisites

| Optimized for | Description
| :--- | :---
| OS | Ubuntu* 20.04 (or newer)
| Hardware | Intel Atom® Processors <br> Intel® Core™ Processor Family <br> Intel® Xeon® Processor Family <br> Intel® Xeon® Scalable processor family
| Software | XGBoost* <br> Intel® AI Analytics Toolkit (AI Kit)

## Key Implementation Details

This Getting Started sample code is implemented for CPU using the Python language. The example assumes you have XGBoost installed inside a conda environment, similar to what is delivered with the installation of the Intel® Distribution for Python* as part of the [Intel® AI Analytics Toolkit](https://software.intel.com/en-us/oneapi/ai-kit).

XGBoost* is ready for use once you finish the Intel® AI Analytics Toolkit installation and have run the post installation script.

You can refer to the oneAPI [main page](https://software.intel.com/en-us/oneapi) for toolkit installation and the Toolkit [Getting Started Guide for Linux](https://software.intel.com/en-us/get-started-with-intel-oneapi-linux-get-started-with-the-intel-ai-analytics-toolkit) for post-installation steps and scripts.
## Set Environment Variables

When working with the command-line interface (CLI), you should configure the oneAPI toolkits using environment variables. Set up your CLI environment by sourcing the `setvars` script every time you open a new terminal window. This practice ensures that your compiler, libraries, and tools are ready for development.

> **Note**: If you have not already done so, set up your CLI
> environment by sourcing the `setvars` script in the root of your oneAPI installation.
>
> Linux*:
> - For system wide installations: `. /opt/intel/oneapi/setvars.sh`
> - For private installations: `. ~/intel/oneapi/setvars.sh`
> - For non-POSIX shells, like csh, use the following command: `bash -c 'source <install-dir>/setvars.sh ; exec csh'`
>
> For more information on configuring environment variables, see *[Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html)*.

### Using Visual Studio Code* (Optional)

You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations,
and browse and download samples.

The basic steps to build and run a sample using VS Code include:
- Download a sample using the extension **Code Sample Browser for Intel oneAPI Toolkits**.
- Configure the oneAPI environment with the extension **Environment Configurator for Intel oneAPI Toolkits**.
- Open a Terminal in VS Code (**Terminal>New Terminal**).
- Run the sample in the VS Code terminal using the instructions below.

To learn more about the extensions, see
[Using Visual Studio Code with Intel® oneAPI Toolkits](https://software.intel.com/content/www/us/en/develop/documentation/using-vs-code-with-intel-oneapi/top.html).

### Activate Conda with Root Access

If you activated another environment, you can return with the following command:
```
source activate base
```
### Activate Conda without Root Access (Optional)


By default, the Intel® AI Analytics Toolkit is installed in the `inteloneapi` folder, which requires root privileges to manage it. If you would like to bypass using root access to manage your conda environment, you can clone and activate your desired conda environment using the following commands:
```
conda create --name user_base --clone base
source activate user_base
```

## Run the `Intel® Python XGBoost* Getting Started` Sample

### Install Jupyter Notebook

1. Change to the sample directory.
2. Install Jupyter Notebook with an appropriate kernel.
```
conda install jupyter nb_conda_kernels
```
### Open Jupyter Notebook

>**Note**: The distributed execution in this sample cannot be launched from the notebook, but you can still view the notebook to follow the included write-up and description.

1. Change to the sample directory.
2. Launch Jupyter Notebook.
```
jupyter notebook
```
3. Locate and select the Notebook.
```
IntelPython_XGBoost_GettingStarted.ipynb
```
4. Click the **Run** button to move through the cells in sequence.

### Run the Python Script

1. Open the notebook in Jupyter Notebook, if it is not open already.

2. Select **File** > **Download as** > **Python (py)**.
3. Run the script.
```
python IntelPython_XGBoost_GettingStarted.py
```
The output files of the script will be saved in **models** and **result** directories.

#### Troubleshooting

If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the [Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html) for more information on using the utility.

## Example Output

>**Note**: Your numbers might be different.

```
RMSE: 11.113036205909719
[CODE_SAMPLE_COMPLETED_SUCCESFULLY]
```
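The `RMSE` line above is the ordinary root-mean-squared error between predictions and targets; for reference, a stdlib-only computation (these example inputs are made up):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

print(rmse([10.0, 12.0, 9.0], [11.0, 11.5, 9.5]))
```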

## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).