Commit d885cd8

Language ID sample update: dataset download and fix typos (#1537)
* Add new oneAPI Sample IPEX Inference Optimization
* Replacing random.randint() with random.sample()
* Add support for IPEX BF16 and INT8 model option
* Revert "Add support for IPEX BF16 and INT8 model option"
  This reverts commit 2b987db.
* Adding new oneAPI sample PyTorch AMX BF16/INT8 Inference
* update Features-and-Functionality README with latest changes
* update Features-and-Functionality README with PT AMX BF16/INT8 Inference
* README review updates
* add missing * to PyTorch on README
* Add data download instructions, fix typos
1 parent 2861934 commit d885cd8

3 files changed, +25 -5 lines changed

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Inference/initialize.sh

+1 -1

@@ -15,7 +15,7 @@ export PYTHONPATH=$PYTHONPATH:/Inference/speechbrain
 
 # Install PyTorch and Intel Extension for PyTorch (IPEX)
 pip install torch==1.13.1 torchaudio
-pip intall --no-deps torchvision==0.14.0
+pip install --no-deps torchvision==0.14.0
 pip install intel_extension_for_pytorch==1.13.100
 pip install neural-compressor==2.0
 
AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/README.md

+23 -3

@@ -45,7 +45,7 @@ For both training and inference, you can run the sample and scripts in Jupyter N
 
 Download the CommonVoice dataset for languages of interest from [https://commonvoice.mozilla.org/en/datasets](https://commonvoice.mozilla.org/en/datasets).
 
-For this sample, you will need to download the following languages: **Japanese** and **Swedish**.
+For this sample, you will need to download the following languages: **Japanese** and **Swedish**. Follow Steps 1-6 below or you can execute the code.
 
 1. On the CommonVoice website, select the Version and Language.
 2. Enter your email.
@@ -59,6 +59,26 @@ For this sample, you will need to download the following languages: **Japanese**
 
 The file structure **must match** the `LANGUAGE_PATHS` defined in `prepareAllCommonVoice.py` in the `Training` folder for the script to run properly.
 
+These commands illustrate Steps 1-6. Notice that it downloads Japanese and Swedish from CommonVoice version 11.0.
+```
+# Create the commonVoice directory under 'data'
+sudo chmod 777 -R /data
+cd /data
+mkdir commonVoice
+cd commonVoice
+
+# Download the CommonVoice data
+wget \
+https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-11.0-2022-09-21/cv-corpus-11.0-2022-09-21-ja.tar.gz \
+https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-11.0-2022-09-21/cv-corpus-11.0-2022-09-21-sv-SE.tar.gz
+
+# Extract and organize the CommonVoice data into respective folders by language
+tar -xf cv-corpus-11.0-2022-09-21-ja.tar.gz
+mv cv-corpus-11.0-2022-09-21 japanese
+tar -xf cv-corpus-11.0-2022-09-21-sv-SE.tar.gz
+mv cv-corpus-11.0-2022-09-21 swedish
+```
+
 ### Configuring the Container
 
 1. Pull the `oneapi-aikit` docker image.
@@ -95,7 +115,7 @@ This section explains how to train a model for language identification using the
 ```
 2. Launch Jupyter Notebook.
 ```
-jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root &
+jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
 ```
 3. Follow the instructions to open the URL with the token in your browser.
 4. Locate and select the Training Notebook.
@@ -225,7 +245,7 @@ To run inference, you must have already run all of the training scripts, generat
 ```
 2. Launch Jupyter Notebook.
 ```
-jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root &
+jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
 ```
 3. Follow the instructions to open the URL with the token in your browser.
 4. Locate and select the inference Notebook.
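
Review note on the download commands added to README.md: both archives unpack to the same top-level folder name (`cv-corpus-11.0-2022-09-21`), so the sequence only works if each `mv` runs before the next `tar`, and a second run fails because `japanese` and `swedish` already exist. The following is a minimal sketch of a re-runnable variant, not part of this commit. The function name `extract_language` and its three arguments are hypothetical, and it assumes each archive contains exactly one top-level directory.

```shell
# Hypothetical helper (not part of the sample): unpack one CommonVoice
# archive and rename its top-level folder to the given language name.
# Usage: extract_language <archive.tar.gz> <dest_root> <language>
extract_language() {
    local archive="$1" dest_root="$2" language="$3"
    local target="$dest_root/$language"

    mkdir -p "$dest_root" || return 1

    # Skip the work if the language folder already exists (safe to re-run).
    if [ -d "$target" ]; then
        echo "skip: $target already exists"
        return 0
    fi

    # Extract into a scratch directory so archives that share the same
    # top-level folder name cannot overwrite each other.
    local scratch
    scratch="$(mktemp -d "$dest_root/.extract.XXXXXX")" || return 1
    if ! tar -xf "$archive" -C "$scratch"; then
        rm -rf "$scratch"
        return 1
    fi

    # Assumption: the archive holds exactly one top-level directory.
    local top
    top="$(find "$scratch" -mindepth 1 -maxdepth 1 -type d | head -n 1)"
    if [ -z "$top" ]; then
        echo "error: no top-level directory in $archive" >&2
        rm -rf "$scratch"
        return 1
    fi

    mv "$top" "$target" || return 1
    rm -rf "$scratch"
    echo "ok: $target"
}
```

With the two archives from the README already downloaded, a call such as `extract_language cv-corpus-11.0-2022-09-21-ja.tar.gz /data/commonVoice japanese` would produce the same `japanese` folder as the `tar` and `mv` pair, provided the caller can write to `/data/commonVoice`.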

AI-and-Analytics/End-to-end-Workloads/LanguageIdentification/Training/initialize.sh

+1 -1

@@ -18,7 +18,7 @@ pip install webdataset==0.1.96
 
 # Install PyTorch and Intel Extension for PyTorch (IPEX)
 pip install torch==1.13.1 torchaudio
-pip intall --no-deps torchvision==0.14.0
+pip install --no-deps torchvision==0.14.0
 pip install intel_extension_for_pytorch==1.13.100
 
 # Install libraries for MP3 to WAV conversion
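
Review note on the `pip intall` fix in both `initialize.sh` files: a misspelled subcommand like this only fails when the script runs. The following is a rough sketch of a pre-merge check that could flag it earlier, not something this commit adds. The function name `check_pip_subcommands` and its allow-list are hypothetical, and the list covers only a handful of common pip subcommands.

```shell
# Hypothetical lint helper: report lines in a shell script whose `pip`
# subcommand is not in a small allow-list. Only lines that start with
# "pip " are inspected. The allow-list is an illustrative assumption.
check_pip_subcommands() {
    local script="$1"
    local allowed="install uninstall download list show freeze check config cache wheel"
    local status=0 lineno=0 line sub

    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Only consider lines that begin with the pip command itself.
        case "$line" in
            "pip "*) ;;
            *) continue ;;
        esac
        # The subcommand is the second whitespace-separated word.
        sub="$(printf '%s\n' "$line" | awk '{print $2}')"
        # Skip lines where an option comes first (for example, pip --version).
        case "$sub" in
            -*) continue ;;
        esac
        case " $allowed " in
            *" $sub "*) ;;
            *)
                echo "$script:$lineno: unknown pip subcommand '$sub'"
                status=1
                ;;
        esac
    done < "$script"
    return "$status"
}
```

Run against the pre-fix version of either script, `check_pip_subcommands initialize.sh` would report the `intall` line and return a non-zero status. On the fixed version it prints nothing and returns zero.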
