Commit e47c6ab
[PYT-637]-pyt-1.10-new-library-releases
1 parent b52170e commit e47c6ab

File tree: 2 files changed (+223 −1)

_posts/2021-10-19-pytorch-1.10-main-release.md (+1 −1)
@@ -31,7 +31,7 @@ A ```torch.special module```, analogous to [SciPy’s special module](https://do

 Refer to this [tutorial](https://pytorch.org/tutorials/intermediate/parametrizations.html) and the general [documentation](https://pytorch.org/docs/master/generated/torch.nn.utils.parametrizations.spectral_norm.html?highlight=parametrize) for more details.

-### (Beta) *CUDA Graphs APIs Integration
+### (Beta) CUDA Graphs APIs Integration

 PyTorch now integrates CUDA Graphs APIs to reduce CPU overheads for CUDA workloads.

@@ -0,0 +1,222 @@
---
layout: blog_detail
title: 'New Library Releases in PyTorch 1.10, including TorchX, TorchAudio, TorchVision'
author: Team PyTorch
---

Today, we are announcing a number of new features and improvements to PyTorch libraries, alongside the [PyTorch 1.10 release](https://pytorch.org/blog/pytorch-1.10-released/). Some highlights include:

* **TorchX** - a new SDK for quickly building and deploying ML applications from research & development to production.
* **TorchAudio** - Added a text-to-speech pipeline, self-supervised model support, multi-channel support and an MVDR beamforming module, an RNN transducer (RNNT) loss function, and batch and filterbank support to the `lfilter` function. See the TorchAudio release notes [here](https://github.com/pytorch/audio/releases).
* **TorchVision** - Added new RegNet and EfficientNet models, FX-based feature extraction in utilities, two new Automatic Augmentation techniques (RandAugment and Trivial Augment), and updated training recipes. See the TorchVision release notes [here](https://github.com/pytorch/vision/releases).
# Introducing TorchX

TorchX is a new SDK for quickly building and deploying ML applications from research & development to production. It offers various built-in components that encode MLOps best practices and make advanced features like distributed training and hyperparameter optimization accessible to all.

Users can get started with TorchX 0.1 with no added setup cost, since it supports popular ML schedulers and pipeline orchestrators that are already widely adopted and deployed in production. No two production environments are the same, so TorchX’s core APIs allow extensive customization at well-defined extension points: even the most unusual applications can be served without customizing the whole vertical stack.

Read the [documentation](https://pytorch.org/torchx) for more details, and try out this feature using the quickstart [tutorial](https://pytorch.org/torchx/latest/examples/hello_world.html).
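For a flavor of the core APIs, here is a rough sketch of how an application can be described with `torchx.specs`; the app name, container image, and training module below are hypothetical placeholders, and the exact fields may vary between versions.

```python
# A rough sketch of a TorchX application definition, assuming the
# torchx.specs AppDef/Role API; the name, image, and module below
# are hypothetical placeholders.
from torchx import specs

app = specs.AppDef(
    name="my_trainer",
    roles=[
        specs.Role(
            name="trainer",
            image="example.com/my_trainer:latest",  # hypothetical container image
            entrypoint="python",
            args=["-m", "my_project.train"],        # hypothetical training module
            num_replicas=1,
        )
    ],
)
```

Once defined, such an app can be handed to any of the supported schedulers without changing the definition itself.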
# TorchAudio 0.10

### (Stable) Text-to-speech pipeline

TorchAudio now adds the Tacotron2 model and pretrained weights. It is now possible to build a text-to-speech pipeline with existing vocoder implementations like WaveRNN and Griffin-Lim. Building a TTS pipeline requires matching data processing and pretrained weights, which is often non-trivial for users, so TorchAudio introduces a bundle API that makes it easy to construct pipelines for specific pretrained weights. The following example illustrates this.
```python
import torchaudio

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH

# Build the text processor, Tacotron2 and vocoder (WaveRNN) models
processor = bundle.get_text_preprocessor()
tacotron2 = bundle.get_tacotron2()
# Downloading: 100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
vocoder = bundle.get_vocoder()
# Downloading: 100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]

text = "Hello World!"

# Encode the text
input, lengths = processor(text)

# Generate a (mel-scale) spectrogram
specgram, lengths, _ = tacotron2.infer(input, lengths)

# Convert the spectrogram to a waveform
waveforms, lengths = vocoder(specgram, lengths)

# Save the audio
torchaudio.save('hello-world.wav', waveforms, vocoder.sample_rate)
```
For the details of this API, please refer to [the documentation](https://pytorch.org/audio/0.10.0/pipelines#tacotron2-text-to-speech). You can also try it out in [the tutorial](https://pytorch.org/tutorials/intermediate/text_to_speech_with_torchaudio_tutorial.html).

### (Beta) Self-Supervised Model Support

TorchAudio added the HuBERT model architecture and pretrained weight support for wav2vec 2.0 and HuBERT. HuBERT and wav2vec 2.0 are novel approaches to audio representation learning that yield high accuracy when fine-tuned on downstream tasks. Because these models can serve as baselines in future research, TorchAudio provides a simple way to run them. As in the TTS pipeline, the pretrained weights and associated information, such as expected sample rates and output class labels (for fine-tuned weights), are put together as a bundle, so that they can be used to build pipelines. The following example illustrates this.
```python
import torchaudio

bundle = torchaudio.pipelines.HUBERT_ASR_LARGE

# Build the model and load the pretrained weights.
model = bundle.get_model()
# Downloading: 100%|███████████████████████████████| 1.18G/1.18G [00:17<00:00, 73.8MB/s]

# Check the labels corresponding to the model outputs.
labels = bundle.get_labels()
print(labels)
# ('<s>', '<pad>', '</s>', '<unk>', '|', 'E', 'T', 'A', 'O', 'N', 'I', 'H', 'S', 'R', 'D', 'L', 'U', 'M', 'W', 'C', 'F', 'G', 'Y', 'P', 'B', 'V', 'K', "'", 'X', 'J', 'Q', 'Z')

# Infer the label probability distribution
waveform, sample_rate = torchaudio.load('hello-world.wav')
emissions, _ = model(waveform)

# Pass the emissions to a (hypothetical) decoder
transcripts = ctc_decode(emissions, labels)
print(transcripts[0])
# HELLO WORLD
```
Please refer to the [documentation](https://pytorch.org/audio/0.10.0/pipelines#wav2vec-2-0-hubert-representation-learning) for more details, and try out this feature with the accompanying tutorials and examples.

### (Beta) Multi-channel support and MVDR beamforming

Far-field speech recognition is a more challenging task compared to near-field recognition. Multi-channel methods such as beamforming help reduce noise and enhance the target speech.

TorchAudio now adds support for differentiable Minimum Variance Distortionless Response (MVDR) beamforming on multi-channel audio using time-frequency masks. Researchers can easily assemble it with any multi-channel ASR pipeline. The module offers three solutions (`ref_channel`, `stv_evd`, `stv_power`) and supports both single-channel and multi-channel masks (the latter are averaged inside the method). It also provides an online option that recursively updates the parameters for streaming audio. We also provide a tutorial on how to apply MVDR beamforming to multi-channel audio in the example directory.
```python
import torchaudio
from torchaudio.transforms import MVDR, Spectrogram, InverseSpectrogram

# Load the multi-channel noisy audio
waveform_mix, sr = torchaudio.load('mix.wav')

# Initialize the stft and istft modules
stft = Spectrogram(n_fft=1024, hop_length=256, return_complex=True, power=None)
istft = InverseSpectrogram(n_fft=1024, hop_length=256)

# Get the noisy spectrogram
specgram_mix = stft(waveform_mix)

# Get the time-frequency mask from a (user-provided) mask estimation model
mask = model(waveform_mix)

# Initialize the MVDR module
mvdr = MVDR(ref_channel=0, solution="ref_channel", multi_mask=False)

# Apply MVDR beamforming
specgram_enhanced = mvdr(specgram_mix, mask)

# Get the enhanced waveform via iSTFT
waveform_enhanced = istft(specgram_enhanced, length=waveform_mix.shape[-1])
```
Please refer to the [documentation](https://pytorch.org/audio/0.10.0/transforms.html#mvdr) for more details and try out this feature using the [MVDR tutorial](https://github.com/pytorch/audio/blob/main/examples/beamforming/MVDR_tutorial.ipynb).

### (Beta) RNN Transducer Loss

The RNN transducer (RNNT) loss is part of the RNN transducer pipeline, a popular architecture for speech recognition tasks. It has recently received attention for use in streaming settings, and has also achieved state-of-the-art WER on the LibriSpeech benchmark.

TorchAudio’s loss function supports float16 and float32 logits, has autograd and TorchScript support, and can be run on both CPU and GPU; the GPU path uses a custom CUDA kernel implementation for improved performance. The implementation is consistent with the original loss function in [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/pdf/1211.3711.pdf), but relies on code from [Alignment Restricted Streaming Recurrent Neural Network Transducer](https://arxiv.org/pdf/2011.03072.pdf). Special thanks to Jay Mahadeokar and Ching-Feng Yeh for their code contributions and guidance.

Please refer to the [documentation](https://pytorch.org/audio/0.10.0/transforms.html#rnntloss) for more details.
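As a minimal sketch of the invocation (toy shapes and random logits, for illustration only; the shape convention follows the documentation):

```python
import torch
from torchaudio.transforms import RNNTLoss

# Toy dimensions: batch=1, 4 time steps, target length 2,
# 5 output classes (the blank label defaults to the last index).
logits = torch.rand(1, 4, 3, 5, requires_grad=True)  # (batch, time, target_len + 1, num_classes)
targets = torch.tensor([[1, 2]], dtype=torch.int32)  # (batch, target_len)
logit_lengths = torch.tensor([4], dtype=torch.int32)
target_lengths = torch.tensor([2], dtype=torch.int32)

rnnt_loss = RNNTLoss()
loss = rnnt_loss(logits, targets, logit_lengths, target_lengths)
loss.backward()  # the loss is differentiable end to end
```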
### (Beta) Batch support and filter bank support

`torchaudio.functional.lfilter` now supports batch processing and multiple filters.
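As a minimal sketch of the batched usage (the coefficient values below are illustrative, not a meaningful filter design):

```python
import torch
import torchaudio.functional as F

# A batch of two one-second mono waveforms: (batch, time)
waveforms = torch.rand(2, 16000)

# Illustrative first-order filter coefficients; passing 2D coefficient
# tensors of shape (num_filters, num_order + 1) applies a bank of filters
b_coeffs = torch.tensor([0.5, 0.5])  # numerator coefficients
a_coeffs = torch.tensor([1.0, 0.0])  # denominator coefficients

filtered = F.lfilter(waveforms, a_coeffs, b_coeffs)  # same shape as the input
```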
### (Prototype) Emformer Module

Automatic speech recognition (ASR) research and productization have increasingly focused on on-device applications. Towards supporting such efforts, TorchAudio now includes [Emformer](https://arxiv.org/abs/2010.10759), a memory-efficient transformer architecture that has achieved state-of-the-art results on LibriSpeech in low-latency streaming scenarios, as a prototype feature.

Please refer to the [documentation](https://pytorch.org/audio/main/prototype.html#emformer) for more details.
### GPU Build

GPU builds that support custom CUDA kernels in TorchAudio, such as the one used for the RNN transducer loss, have been added. Following this change, TorchAudio’s binary distribution now includes both CPU-only and CUDA-enabled versions. To use the CUDA-enabled binaries, the installed PyTorch build must also support CUDA.
# TorchVision 0.11

### (Stable) New Models

[RegNet](https://arxiv.org/abs/2003.13678) and [EfficientNet](https://arxiv.org/abs/1905.11946) are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet, and their accuracies on the ImageNet validation set are reported in the corresponding pull requests (see [#4403](https://github.com/pytorch/vision/pull/4403#issuecomment-930381524), [#4530](https://github.com/pytorch/vision/pull/4530#issuecomment-933213238) and [#4293](https://github.com/pytorch/vision/pull/4293) for more details).

The models can be used as follows:
```python
import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
```
See the full list of new models on the [torchvision.models](https://pytorch.org/vision/master/models.html) documentation page.

We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.
### (Beta) FX-based Feature Extraction

A new feature extraction method has been added to our utilities. It uses [torch.fx](https://pytorch.org/docs/stable/fx.html) and enables us to retrieve the outputs of intermediate layers of a network, which is useful for feature extraction and visualization.

Here is an example of how to use the new utility:
```python
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor


x = torch.rand(1, 3, 224, 224)

model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)

print(intermediate_outputs['layer4'].shape)
```
We would like to thank Alexander Soare for developing this utility.
### (Stable) New Data Augmentations

Two new Automatic Augmentation techniques were added: [RandAugment](https://arxiv.org/abs/1909.13719) and [Trivial Augment](https://arxiv.org/abs/2103.10158). They apply a series of transformations to the original data to enhance it and boost model performance. The new techniques build on top of the previously added [AutoAugment](https://github.com/pytorch/vision/pull/3123) and focus on simplifying the approach, reducing the search space for the optimal policy, and improving the accuracy gain. These techniques enable users to reproduce recipes that achieve state-of-the-art performance on the offered models, and to apply them for transfer learning to achieve optimal accuracy on new datasets.

Both methods can be used as drop-in replacements for the AutoAugment technique as seen below:
```python
from PIL import Image
from torchvision import transforms

# An illustrative input image (the transforms also accept uint8 tensors)
image = Image.new("RGB", (256, 256))

t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
    transforms.ToTensor()])
```
Read the [automatic augmentation transforms](https://pytorch.org/vision/master/transforms.html#automatic-augmentation-transforms) documentation for more details.

We would like to thank Samuel G. Müller for contributing to Trivial Augment and for his help on refactoring the AA package.
### Updated Training Recipes

We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, [Mixup](https://arxiv.org/abs/1710.09412), [CutMix](https://arxiv.org/abs/1905.04899) and other [SOTA primitives](https://github.com/pytorch/vision/issues/3911). These changes enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
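Many of these primitives are easy to adopt in your own training loops. For example, label smoothing is available natively in PyTorch 1.10 through `torch.nn.CrossEntropyLoss`; a minimal sketch with random data:

```python
import torch
import torch.nn as nn

# Label smoothing is built into CrossEntropyLoss as of PyTorch 1.10
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 1000)           # (batch, num_classes), random for illustration
targets = torch.randint(0, 1000, (8,))  # ground-truth class indices
loss = criterion(logits, targets)
```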
Thanks for reading. If you’re interested in these updates and want to join the PyTorch community, we encourage you to join [the discussion](https://discuss.pytorch.org/) forums and [open GitHub issues](https://github.com/pytorch/pytorch/issues). To get the latest news from PyTorch, follow us on [Facebook](https://www.facebook.com/pytorch/), [Twitter](https://twitter.com/PyTorch), [Medium](https://medium.com/pytorch), [YouTube](https://www.youtube.com/pytorch) or [LinkedIn](https://www.linkedin.com/company/pytorch).

Cheers!

-Team PyTorch
