AI-and-Analytics/End-to-end-Workloads/Census/README.md
Lines changed: 26 additions & 23 deletions
@@ -11,22 +11,22 @@ The `Census` sample code illustrates how to use Intel® Distribution of Modin* f
## Purpose
This sample code demonstrates how to run the end-to-end census workload using the AI Toolkit without any external dependencies.

-Intel® Distribution of Modin* uses Ray to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.
+Intel® Distribution of Modin* uses HDK to speed up your Pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Intel® Distribution of Modin* provides integration and compatibility with existing Pandas code. Intel® Extension for Scikit-learn* dynamically patches scikit-learn estimators to use Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver to get the solution faster.

## Prerequisites

| Optimized for | Description
| :--- | :---
| OS | 64-bit Ubuntu* 18.04 or higher
| Hardware | Intel Atom® processors <br> Intel® Core™ processor family <br> Intel® Xeon® processor family <br> Intel® Xeon® Scalable processor family
-| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.7, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy <br> Ray
+| Software | Intel® AI Analytics Toolkit (AI Kit) (Python version 3.8 or newer, Intel® Distribution of Modin*) <br> Intel® Extension for Scikit-learn* <br> NumPy

The Intel® Distribution of Modin* and Intel® Extension for Scikit-learn* libraries are available together in [Intel® AI Analytics Toolkit (AI Kit)](https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html).


## Key Implementation Details

-This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.7 (or newer), Intel Distribution of Modin*, Ray, Intel® Extension for Scikit-Learn, and NumPy.
+This end-to-end workload sample code is implemented for CPU using the Python language. Once you have installed AI Kit, the Conda environment is prepared with Python version 3.8 (or newer), Intel Distribution of Modin*, Intel® Extension for Scikit-Learn, and NumPy.

In this sample, you will use Intel® Distribution of Modin* to ingest and process U.S. census data from 1970 to 2010 in order to build a ridge regression-based model to find the relation between education and total income earned in the US.
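The README text in this hunk mentions a ridge regression-based model relating education to income. For a single feature, ridge regression has a simple closed form, sketched below. This is an illustration only and is not the sample's code, which trains on the census data with an estimator from Intel® Extension for Scikit-learn*. The function name `ridge_fit_1d`, the toy data, and the `alpha` values are assumptions made for the example.

```python
from statistics import fmean


def ridge_fit_1d(xs, ys, alpha=1.0):
    """Fit y ~ slope * x + intercept with an L2 penalty on the slope only."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    # Centering the data keeps the intercept out of the penalty term.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # alpha = 0 reduces to ordinary least squares
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical toy data: years of education vs. income (arbitrary units).
education = [8.0, 10.0, 12.0, 14.0, 16.0]
income = [17.0, 21.0, 25.0, 29.0, 33.0]  # exactly 2 * x + 1

ols_slope, ols_intercept = ridge_fit_1d(education, income, alpha=0.0)
ridge_slope, ridge_intercept = ridge_fit_1d(education, income, alpha=10.0)
```

With `alpha=0.0` the fit recovers the unpenalized line, while a positive `alpha` shrinks the slope toward zero, which is the regularizing effect ridge regression is chosen for.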
@@ -74,23 +74,29 @@ To learn more about the extensions and how to configure the oneAPI environment,
### On Linux*

-1. Install the Intel® Distribution of Modin* python environment.
+1. Install the Intel® Distribution of Modin* python environment (Only python 3.8 - 3.10 are supported).
6. Change to the sample directory, and open Jupyter Notebook.
```
jupyter notebook
```
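The supported-version note added to step 1 can be checked from Python before the environment is used. The sketch below is a minimal illustration; the helper name `is_supported_python` is hypothetical, and the bounds are taken only from the "3.8 - 3.10" range stated in this diff.

```python
import sys

# Bounds taken from the README note in this diff (assumed inclusive).
SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 10)


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version falls inside the supported range."""
    major_minor = (version_info[0], version_info[1])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported_python():
    print(
        f"Python {sys.version_info[0]}.{sys.version_info[1]} "
        "is outside the supported 3.8 - 3.10 range"
    )
```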
@@ -127,20 +133,17 @@ To learn more about the extensions and how to configure the oneAPI environment,
2. Open a web browser, and navigate to https://devcloud.intel.com. Select **Work with oneAPI**.
3. From Intel® DevCloud for oneAPI [Get Started](https://devcloud.intel.com/oneapi/get_started), locate the ***Connect with Jupyter* Lab*** section (near the bottom).
4. Click **Sign in to Connect** button. (If you are already signed in, the link should say ***Launch JupyterLab****.)
-5. Once JupyterLab opens, select **no kernel**.
-6. You might need to [clone the samples](#clone-the-samples-in-intel®-devcloud) from GitHub. If the samples are already present, skip this step.
-7. Change to the sample directory.
-8. Open `census_modin.ipynb`.
-9. Click **Run** to run the cells.
-10. Alternatively, run the entire workbook by selecting **Restart kernel and re-run whole notebook**.
-
-#### Clone the Samples in Intel® DevCloud
-If the samples are not already present in your Intel® DevCloud account, download them.
-1. From JupyterLab, select **File** > **New** > **Terminal**.
-2. In the terminal, clone the samples from GitHub.
+5. Open a terminal from Launcher
+6. Follow [step 1-5](#on-linux) to create conda environment
+7. Clone the samples from GitHub. If the samples are already present, skip this step.
12. Alternatively, run the entire workbook by selecting **Restart kernel and re-run whole notebook**.

## Example Output
@@ -152,4 +155,4 @@ This is an example Cell Output for `census_modin.ipynb` run in Jupyter Notebook.
Code samples are licensed under the MIT license. See [License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.

-Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
+Third-party program Licenses can be found here: [third-party-programs.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/third-party-programs.txt).
AI-and-Analytics/End-to-end-Workloads/Census/census_modin.ipynb
Lines changed: 18 additions & 15 deletions
@@ -21,6 +21,7 @@
]
},
{
+"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
@@ -29,7 +30,7 @@
},
"source": [
"In this example we will be running an end-to-end machine learning workload with US census data from 1970 to 2010.\n",
-"It uses Intel® Distribution of Modin with Ray as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
+"It uses Intel® Distribution of Modin with HDK (Heterogeneous Data Kernels) as backend compute engine for ETL, and uses Ridge Regression algorithm from Intel scikit-learn-extension library to train and predict the co-relation between US total income and education levels."
]
},
{
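The notebook text in this hunk refers to the Ridge Regression algorithm from Intel® Extension for Scikit-learn*. The sketch below shows how that extension is commonly enabled through its `patch_sklearn` entry point, assuming the `sklearnex` package is installed. The helper name `enable_sklearnex` and the fallback behavior are illustrative and are not taken from the notebook.

```python
def enable_sklearnex() -> bool:
    """Patch scikit-learn with oneDAL-backed estimators when sklearnex is available."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: stock scikit-learn would be used unchanged.
        return False
    patch_sklearn()
    return True


patched = enable_sklearnex()
# Import estimators only after patching so the accelerated versions are picked up,
# for example:
# from sklearn.linear_model import Ridge
```

Patching before importing estimators matters because the extension swaps the estimator classes at patch time; modules imported earlier may still hold references to the stock implementations.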
@@ -73,14 +74,15 @@
]
},
{
+"attachments": {},
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
-"Import Modin and set Ray as the compute engine. This engine uses analytical database OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
+"Import Modin and set HDK as the compute engine. This engine provides a set of components for federating analytic queries to an execution backend based on OmniSciDB to obtain high single-node scalability for specific set of dataframe operations. "
]
},
{
@@ -97,16 +99,7 @@
"import modin.pandas as pd\n",
"\n",
"import modin.config as cfg\n",
-"from packaging import version\n",
-"import modin\n",
-"\n",
-"cfg.IsExperimental.put(\"True\")\n",
-"cfg.Engine.put('native')\n",
-"# Since modin 0.12.0 OmniSci engine activation process slightly changed\n",