2023.1.1 AI Kit Release #1589

Merged: 28 commits, Apr 26, 2023
Changes from all commits (28 commits)

3a8529c
Update IntelPython_daal4py_DistributedKMeans readme (#1410)
jkinsky Mar 8, 2023
2c7e596
Adding new oneAPI sample PyTorch AMX BF16/INT8 Inference (#1401)
alexsin368 Mar 12, 2023
8fca86f
LanguageID sample updates: IPEX, BF16, script clean up (#1411)
alexsin368 Mar 12, 2023
d094492
Ai and analytics features and functionality intel python daal4py dist…
jkinsky Mar 13, 2023
8b606a6
Ai and analytics features and functionality intel python numpy numba …
jkinsky Mar 13, 2023
c457e68
Ai and analytics features and functionality intel python xg boost daa…
jkinsky Mar 24, 2023
4191749
Optimize PyTorch* Models using Intel® Extension for PyTorch* sample r…
jkinsky Mar 24, 2023
37c1ef4
Ai and analytics features and functionality intel tensor flow enablin…
jkinsky Mar 24, 2023
61bd0f7
Ai and analytics features and functionality intel tensor flow inferen…
jkinsky Mar 24, 2023
e6e1825
Intel® Python XGBoost Performance sample readme update (#1463)
jkinsky Mar 24, 2023
d4ed1d3
Intel® TensorFlow* Model Zoo Inference With FP32 Int8 readme update (…
jkinsky Mar 24, 2023
9cb2cc4
TensorFlow* Performance Analysis Using Model Zoo for Intel® Architect…
jkinsky Mar 24, 2023
9146b33
TensorFlow* Transformer with Advanced Matrix Extensions bfloat16 Mixe…
jkinsky Mar 24, 2023
15cf1a8
Intel Extension for Scikit-learn: SVC for Adult dataset readme update…
jkinsky Mar 24, 2023
368847e
Added cpuInstructionSets to sample.json for AI samples (#1481)
jimmytwei Mar 27, 2023
2861934
Intel® Neural Compressor TensorFlow* Getting Started Sample readme up…
jkinsky Mar 29, 2023
d885cd8
Language ID sample update: dataset download and fix typos (#1537)
alexsin368 Apr 10, 2023
986820c
Use HDK as backend of Modin (#1426)
huiyan2021 Apr 10, 2023
3b05b0f
Intel® AI Analytics Toolkit (AI Kit) Container Getting Started sample…
jkinsky Apr 10, 2023
6bd7a2c
improvements for Tf perf analysis (#1504)
louie-tsai Apr 10, 2023
e1a487c
Multiple Changes to readmes (#1510)
jkinsky Apr 10, 2023
09e8929
Interactive chat based on DialoGPT model using Intel® Extension for P…
krzeszew Apr 12, 2023
dfb03a0
[oneDNN] Add GPU instructions to benchDNN tutorial (#1551)
yehudaorel Apr 14, 2023
73d7ca2
Add new sample for TensorFlow AMX BF16 Inference (#1549)
YuningQiu Apr 14, 2023
be5fd49
new sample - INC Quantization with PyTorch (#1550)
devpramod-intel Apr 14, 2023
6ad2f7f
Update requirements.txt (#1558)
YuningQiu Apr 17, 2023
f1f2efb
Add .github/workflows/github-pages.yml to build oneAPI Samples app (#…
mvincerx Apr 26, 2023
9aa4bf1
Merge branch 'master' into 2023.1.1_AIKit
jimmytwei Apr 26, 2023

75 changes: 75 additions & 0 deletions .github/workflows/github-pages.yml
@@ -0,0 +1,75 @@
name: github-samples-app

on:
  push:
    branches:
      - master

  workflow_dispatch:

  # schedule:
  #   - cron: '55 13 * * *'

jobs:
  pages:
    name: Build GitHub Pages
    runs-on: ubuntu-latest

    steps:
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"

      - uses: actions/checkout@v3
        name: Check out app/dev # checks out app/dev in top-level dir
        with:
          ref: 'refs/heads/app/dev'

      - uses: actions/checkout@v3
        name: Check out master # checks out master in subdirectory
        with:
          ref: 'refs/heads/master'
          path: master

      - name: Build JSON DB
        run: |
          python3 -m pip install -r src/requirements.txt
          echo master
          python3 src/db.py master

      - name: Remove JSON pre-prod
        run: |
          rm -rf src/docs/sample_db_pre.json

      - name: Build Sphinx
        run: |
          python3 -m sphinx -W -b html src/docs/ src/docs/_build/
          echo $PWD
          echo ${{ github.ref }}

      - name: Add GPU-Occupancy-Calculator
        env:
          GPU_OCC_CALC: src/docs/_build/Tools/GPU-Occupancy-Calculator/
        run: |
          mkdir -p ${GPU_OCC_CALC}
          cp -v ${{ github.workspace }}/master/Tools/GPU-Occupancy-Calculator/index.html ${GPU_OCC_CALC}/index.html

      - name: Push docs
        if: ${{ github.ref == 'refs/heads/master' }} # only if this workflow is run from the master branch, push docs
        env:
          GITHUB_USER: ${{ github.actor }}
          GITHUB_TOKEN: ${{ github.token }}
          GITHUB_REPO: ${{ github.repository }}
        run: |
          cd src/docs/_build/
          touch .nojekyll
          git init
          git remote add origin "https://${GITHUB_USER}:${GITHUB_TOKEN}@github.com/${GITHUB_REPO}"
          git add -A
          git status
          git config --global user.name "GitHub Actions"
          git config --global user.email "[email protected]"
          git commit -sm "$(date)"
          git branch -M gh-pages
          git push -u origin -f gh-pages
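
For local iteration on the "Build Sphinx" step above, the same build can also be driven from the Sphinx Python API instead of the module CLI. The snippet below is only a sketch under the assumption that the app/dev branch layout (src/docs/ plus src/requirements.txt) is checked out locally and its requirements are installed; it is not part of the workflow.

```
# Hedged local sketch of the "Build Sphinx" step (not part of the workflow).
# Assumes src/docs/ from the app/dev branch is present and Sphinx is installed.
import sys
from sphinx.cmd.build import build_main

# Equivalent to: python3 -m sphinx -W -b html src/docs/ src/docs/_build/
exit_code = build_main(["-W", "-b", "html", "src/docs/", "src/docs/_build/"])
sys.exit(exit_code)
```
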
49 changes: 26 additions & 23 deletions AI-and-Analytics/End-to-end-Workloads/Census/README.md
@@ -11,22 +11,22 @@ The `Census` sample code illustrates how to use Intel® Distribution of Modin* f
## Purpose
This sample code demonstrates how to run the end-to-end census workload using the AI Toolkit without any external dependencies.

Intel® Distribution of Modin* uses Ray to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.
Intel® Distribution of Modin* uses HDK to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.
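
As a rough sketch of how these two pieces are enabled in user code (illustrative only; it assumes the modin-hdk and scikit-learn-intelex packages are installed, and the toy DataFrame is made up for this example):

```
# Illustrative sketch, not part of the sample: select the HDK storage format
# and patch scikit-learn. Assumes modin-hdk and scikit-learn-intelex are installed.
import modin.config as cfg
cfg.StorageFormat.put("hdk")        # route Modin DataFrame operations to HDK

from sklearnex import patch_sklearn
patch_sklearn()                     # later sklearn imports resolve to oneDAL-backed estimators

import modin.pandas as pd           # drop-in replacement for `import pandas as pd`
df = pd.DataFrame({"a": range(5)})  # regular pandas syntax, accelerated execution
print(df.describe())
```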

## Prerequisites

| Optimized for | Description
| :--- | :---
| OS | 64-bit Ubuntu* 18.04 or higher
| Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.7, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy <br> Ray
| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.8 or newer, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy

The Intel® Distribution of Modin* and Intel® Extension for Scikit-learn* libraries are available together in [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).


## Key Implementation Details

This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.7 (or newer), Intel Distribution of Modin*, Ray, Intel® Extension for Scikit-Learn, and NumPy.
This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.8 (or newer), Intel Distribution of Modin*, Intel® Extension for Scikit-Learn, and NumPy.

In this sample, you will use Intel® Distribution of Modin* to ingest and process U.S. census data from 1970 to 2010 in order to build a ridge regression-based model to find the relation between education and total income earned in the US.
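
In outline, the notebook follows the pattern sketched below; the file name and column names are placeholders for illustration, not the ones used in `census_modin.ipynb`:

```
# Outline only; the dataset path and column names are hypothetical placeholders.
import modin.pandas as pd
from sklearn.linear_model import Ridge            # patched by sklearnex beforehand
from sklearn.model_selection import train_test_split

df = pd.read_csv("census_data.csv")               # ETL runs on the Modin/HDK backend
X = df[["education_level"]].to_numpy()            # hypothetical feature column
y = df["total_income"].to_numpy()                 # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```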

@@ -74,23 +74,29 @@ To learn more about the extensions and how to configure the oneAPI environment,

### On Linux*

1. Install the Intel® Distribution of Modin* python environment.
1. Install the Intel® Distribution of Modin* Python environment (only Python 3.8 to 3.10 is supported).
```
conda create -y -n intel-aikit-modin intel-aikit-modin -c intel
conda create -n modin-hdk python=3.x -y
```
2. Activate the Conda environment.
```
conda activate intel-aikit-modin
conda activate modin-hdk
```
3. Install Jupyter Notebook.
3. Install modin-hdk, Intel® Extension for Scikit-learn*, and related libraries.
```
conda install jupyter nb_conda_kernels
conda install modin-hdk -c conda-forge -y
pip install scikit-learn-intelex
pip install matplotlib
```
4. Install OpenCensus.
4. Install Jupyter Notebook and ipykernel.
```
pip install opencensus
pip install jupyter ipykernel
```
5. Change to the sample directory, and open Jupyter Notebook.
5. Add the kernel to Jupyter Notebook (a quick environment check is sketched after these steps).
```
python -m ipykernel install --user --name modin-hdk
```
6. Change to the sample directory, and open Jupyter Notebook.
```
jupyter notebook
```
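
Optionally (not part of the published steps), the new environment can be sanity-checked from Python before opening the notebook; the snippet assumes the `modin-hdk` environment created above is active:

```
# Optional sanity check for the modin-hdk environment (not part of the sample).
import modin
import modin.config as cfg
import modin.pandas as pd

cfg.StorageFormat.put("hdk")                    # select the HDK backend
print("modin", modin.__version__, "storage:", cfg.StorageFormat.get())
print(pd.DataFrame({"ok": [1, 2, 3]}).sum())    # forces backend initialization

from sklearnex import patch_sklearn
patch_sklearn()                                 # logs a message when patching succeeds
```
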
@@ -127,20 +133,17 @@ To learn more about the extensions and how to configure the oneAPI environment,
2. Open a web browser, and navigate to https://devcloud.intel.com. Select **Work with oneAPI**.
3. From Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started), locate the ***Connect with Jupyter Lab*** section (near the bottom).
4. Click the **Sign in to Connect** button. (If you are already signed in, the link should say ***Launch JupyterLab***.)
5. Once JupyterLab opens, select **no kernel**.
6. You might need to [clone the samples](#clone-the-samples-in-intel®-devcloud) from GitHub. If the samples are already present, skip this step.
7. Change to the sample directory.
8. Open `census_modin.ipynb`.
9. Click **Run** to run the cells.
10. Alternatively, run the entire workbook by selecting **Restart kernel and re-run whole notebook**.

#### Clone the Samples in Intel® DevCloud
If the samples are not already present in your Intel® DevCloud account, download them.
1. From JupyterLab, select **File** > **New** > **Terminal**.
2. In the terminal, clone the samples from GitHub.
5. Open a terminal from the Launcher.
6. Follow [steps 1-5](#on-linux) to create the conda environment.
7. Clone the samples from GitHub. If the samples are already present, skip this step.
```
git clone https://github.com/oneapi-src/oneAPI-samples.git
```
8. Change to the sample directory.
9. Open `census_modin.ipynb`.
10. Select the "modin-hdk" kernel.
11. Click **Run** to run the cells.
12. Alternatively, run the entire notebook by selecting **Restart kernel and re-run whole notebook**.

## Example Output

@@ -152,4 +155,4 @@ This is an example Cell Output for `census_modin.ipynb` run in Jupyter Notebook.

Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
33 changes: 18 additions & 15 deletions AI-and-Analytics/End-to-end-Workloads/Census/census_modin.ipynb
@@ -21,6 +21,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
@@ -29,7 +30,7 @@
},
"source": [
"In this example we will be running an end-to-end machine learning workload with US census data from 1970 to 2010.\n",
"It uses Intel® Distribution of Modin with Ray as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
"It uses Intel® Distribution of Modin with HDK (Heterogeneous Data Kernels) as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
]
},
{
@@ -73,14 +74,15 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Import Modin and set Ray as the compute engine. This engine uses analytical database OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
"Import Modin and set HDK as the compute engine. This engine provides a set of components for federating analytic queries to an execution backend based on OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
]
},
{
@@ -97,16 +99,7 @@
"import modin.pandas as pd\n",
"\n",
"import modin.config as cfg\n",
"from packaging import version\n",
"import modin\n",
"\n",
"cfg.IsExperimental.put(\"True\")\n",
"cfg.Engine.put('native')\n",
"# Since modin 0.12.0 OmniSci engine activation process slightly changed\n",
"if version.parse(modin.__version__) <= version.parse('0.11.3'):\n",
" cfg.Backend.put('omnisci')\n",
"else:\n",
" cfg.StorageFormat.put('omnisci')\n"
"cfg.StorageFormat.put('hdk')\n"
]
},
{
@@ -288,13 +281,23 @@
"mean MSE ± deviation: 0.032564569 ± 0.000041799\n",
"mean COD ± deviation: 0.995367533 ± 0.000005869"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# release resources\n",
"%reset -f"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "modin-hdk",
"language": "python",
"name": "python3"
"name": "modin-hdk"
},
"language_info": {
"codemirror_mode": {
@@ -306,7 +309,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
"version": "3.9.16"
}
},
"nbformat": 4,
8 changes: 5 additions & 3 deletions AI-and-Analytics/End-to-end-Workloads/Census/sample.json
@@ -16,10 +16,12 @@
"steps": [
"set -e # Terminate the script on first error",
"source $(conda info --base)/etc/profile.d/conda.sh # Bypassing conda's disability to activate environments inside a bash script: https://github.com/conda/conda/issues/7980",
"conda create -y -n intel-aikit-modin intel-aikit-modin -c intel",
"conda activate intel-aikit-modin",
"conda create -n modin-hdk python=3.9 -y",
"conda activate modin-hdk",
"conda install modin-hdk -c conda-forge -y",
"conda install -y jupyter # Installing 'jupyter' for extended abilities to execute the notebook",
"pip install opencensus # Installing 'runipy' for extended abilities to execute the notebook",
"pip install scikit-learn-intelex # Installing Intel® Extension for Scikit-learn*",
"pip install matplotlib",
"jupyter nbconvert --to notebook --execute census_modin.ipynb"
]
}
@@ -1,3 +1,5 @@
#!/bin/bash

rm -R RIRS_NOISES
rm -R tmp
rm -R speechbrain