v2.5.0: Universal multilingual support, new decoder backbone and RoPE in attention encoders
Universal multilingual support (#238)
The dictionary and phoneme system has been refactored. The repository now supports defining multiple dictionaries (one per language) and merging selected phonemes across them. This comes with a breaking change in how datasets are defined in the configuration:
Old:

    dictionary: dictionaries/opencpop-extension.txt
    raw_data_dir:
      - data/xxx1/raw
      - data/xxx2/raw
    speakers:
      - speaker1
      - speaker2
    spk_ids: [0, 1]
    test_prefixes:
      - '0:wav1'
      - '0:wav2'
      - '1:wav1'
      - '1:wav2'

New:

    dictionaries:  # multiple languages and dictionaries
      zh: dictionaries/opencpop-extension.txt
      ja: dictionaries/japanese_dict_full.txt
      en: dictionaries/ds_cmudict-07b.txt
    extra_phonemes: []
    merged_phoneme_groups:
      - [zh/i, ja/i, en/iy]
      - [zh/s, ja/s, en/s]
    datasets:  # define all raw datasets
      - raw_data_dir: data/xxx1/raw  # equivalent to former raw_data_dir
        speaker: speaker1  # equivalent to former speakers
        spk_id: 0
        language: zh
        test_prefixes:  # similar to former test_prefixes
          - wav1
          - wav2
      - raw_data_dir: data/xxx2/raw
        speaker: speaker2
        spk_id: 1
        language: ja
        test_prefixes:
          - wav1
          - wav2

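The correspondence between the two layouts can be sketched as a small conversion helper. This is a hedged sketch, not part of the repository: `migrate_dataset_config` is a hypothetical name, the single `language` argument is an assumption (the old format carried only one dictionary, so every dataset is given the same language tag), and the numeric prefix in old `test_prefixes` entries is assumed to be the position of the dataset in `raw_data_dir`.

```python
def migrate_dataset_config(old, language):
    """Convert an old-style dataset config (plain dict) to the new layout.

    Hypothetical helper for illustration only.
    `language` is the tag assigned to the single old dictionary, e.g. "zh".
    """
    raw_dirs = old["raw_data_dir"]
    speakers = old["speakers"]
    spk_ids = old.get("spk_ids")
    if spk_ids is None:
        # Assumption: without explicit IDs, speakers are numbered in order.
        spk_ids = list(range(len(raw_dirs)))
    if not (len(raw_dirs) == len(speakers) == len(spk_ids)):
        raise ValueError("raw_data_dir, speakers and spk_ids must have equal length")

    datasets = [
        {
            "raw_data_dir": raw_dir,
            "speaker": speaker,
            "spk_id": spk_id,
            "language": language,
            "test_prefixes": [],
        }
        for raw_dir, speaker, spk_id in zip(raw_dirs, speakers, spk_ids)
    ]

    # Old entries look like '<dataset index>:<name>'; the new layout stores
    # only '<name>' under the dataset it belongs to.
    for prefix in old.get("test_prefixes", []):
        index, sep, name = prefix.partition(":")
        if not sep or not index.isdigit() or int(index) >= len(datasets):
            raise ValueError(f"expected '<dataset index>:<name>', got {prefix!r}")
        datasets[int(index)]["test_prefixes"].append(name)

    return {
        "dictionaries": {language: old["dictionary"]},
        "datasets": datasets,
    }
```

The helper leaves `extra_phonemes` and `merged_phoneme_groups` out, since they have no counterpart in the old format and need to be written by hand.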
Read the documentation for a more detailed explanation.
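As an illustration of what `merged_phoneme_groups` expresses, the sketch below assigns integer IDs to language-tagged phonemes so that every member of a merged group shares one ID. It is a guess at the general idea only: `build_phoneme_ids` is a hypothetical helper, the ID ordering is arbitrary, and the repository's actual vocabulary construction may differ (for example in how padding or `extra_phonemes` are handled).

```python
def build_phoneme_ids(phonemes, merged_groups):
    """Map each phoneme name to an integer ID, sharing IDs inside a group.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    # Point every grouped phoneme at the first member of its group.
    representative = {}
    for group in merged_groups:
        for name in group:
            if name in representative:
                raise ValueError(f"{name!r} appears in more than one group")
            representative[name] = group[0]

    ids = {}      # representative name -> ID
    result = {}   # every phoneme name -> ID
    for name in phonemes:
        rep = representative.get(name, name)
        if rep not in ids:
            ids[rep] = len(ids)
        result[name] = ids[rep]
    return result


# Example phoneme inventory (illustrative names only).
phonemes = ["zh/i", "zh/s", "ja/i", "ja/s", "en/iy", "en/s", "en/th"]
groups = [["zh/i", "ja/i", "en/iy"], ["zh/s", "ja/s", "en/s"]]
phoneme_ids = build_phoneme_ids(phonemes, groups)
```

Here the three vowels in the first group end up with one shared ID, the three `s` phonemes with another, and the ungrouped `en/th` keeps an ID of its own.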
New decoder backbone: LYNXNet (#200, #218, #225, #228)
The new backbone performs better on acoustic models. The way the model backbone is configured also changes:
Old:

    backbone_type: 'wavenet'
    residual_layers: 20
    residual_channels: 512
    dilation_cycle_length: 4

New:

    # LYNXNet (default)
    backbone_type: 'lynxnet'
    backbone_args:
      num_channels: 1024
      num_layers: 6
      kernel_size: 31
      dropout_rate: 0.0
      strong_cond: true
    # WaveNet
    backbone_type: 'wavenet'
    backbone_args:
      num_channels: 512
      num_layers: 20
      dilation_cycle_length: 4

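For an existing WaveNet configuration, the change amounts to moving the backbone hyperparameters under `backbone_args` and renaming two of them. The sketch below shows that mapping on a plain dict. The key correspondence (`residual_layers` to `num_layers`, `residual_channels` to `num_channels`) is inferred from the matching values in the comparison above, and `migrate_backbone_config` is a hypothetical helper, not something the repository provides.

```python
# Old flat key -> new key under backbone_args (inferred from the release notes).
WAVENET_KEY_MAP = {
    "residual_channels": "num_channels",
    "residual_layers": "num_layers",
    "dilation_cycle_length": "dilation_cycle_length",
}


def migrate_backbone_config(old):
    """Return a copy of `old` with WaveNet settings nested under backbone_args."""
    if old.get("backbone_type") != "wavenet":
        raise ValueError("this sketch only handles backbone_type: 'wavenet'")

    # Keep unrelated keys untouched, drop the old flat backbone keys.
    new = {key: value for key, value in old.items() if key not in WAVENET_KEY_MAP}
    new["backbone_args"] = {
        new_key: old[old_key]
        for old_key, new_key in WAVENET_KEY_MAP.items()
        if old_key in old
    }
    return new
```

Switching to LYNXNet is a separate decision: its `backbone_args` (such as `kernel_size` and `strong_cond`) have no WaveNet counterpart, so they are not derived here.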
RoPE in attention encoder (#234)
Rotary Position Embedding (RoPE) is now implemented in the FastSpeech2 attention encoders to improve their quality and reduce the parameter count.

    # encoder with RoPE
    enc_ffn_kernel_size: 3
    use_rope: true
    # encoder without RoPE
    enc_ffn_kernel_size: 9
    use_rope: false

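For readers unfamiliar with RoPE, the following is a minimal pure-Python sketch of the rotation in its commonly described form (interleaved feature pairs, base 10000). It is illustrative only: `apply_rope` is a hypothetical function, and the repository's encoder may pair features or choose the base differently. The rotation angles are fixed functions of position, so the operation itself has no learned parameters, and the dot product of a rotated query and key depends only on their relative offset.

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    rotated = []
    for i in range(0, dim, 2):
        # Pair (i, i + 1) rotates with frequency base^(-i / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        rotated.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return rotated


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are 4 steps apart, so the scores agree
# up to floating-point error.
score_near = dot(apply_rope(query, 3), apply_rope(key, 7))
score_far = dot(apply_rope(query, 13), apply_rope(key, 17))
```

Comparing `score_near` and `score_far` shows the relative-position property: shifting both positions by the same amount leaves the attention score unchanged.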
Other improvements, changes and bug fixes
- Support MiniNSF and noise injection in NSF-HiFiGAN vocoder
- Improve inference speed for old NSF module
- Missing `note_glide` is now regarded as none instead of raising errors
- Add R^2 score metrics for variance parameters on TensorBoard
- Bugfix: unexpected high CPU load during preprocessing
- Bugfix: `f0_min` and `f0_max` have no effect on the parselmouth pitch extractor
- Bugfix: configurations are not passed correctly to the pitch predictor
Some changes may not be listed above. See full change log: v2.4.0...v2.5.0