
v2.5.0: Universal multilingual support, new decoder backbone and RoPE in attention encoders

@yqzhishen released this 01 Apr 14:43 · 2 commits to main since this release

Universal multilingual support (#238)

The dictionary and phoneme system has been fully refactored. The repository now supports defining multiple dictionaries (one per language) and merging phonemes that are shared across languages into a single unit. This comes with a breaking change in the dataset configuration:

Old:
dictionary: dictionaries/opencpop-extension.txt
raw_data_dir:
  - data/xxx1/raw
  - data/xxx2/raw
speakers:
  - speaker1
  - speaker2
spk_ids: [0, 1]
test_prefixes:
  - '0:wav1'
  - '0:wav2'
  - '1:wav1'
  - '1:wav2'

New:
dictionaries:  # multiple languages and dictionaries
  zh: dictionaries/opencpop-extension.txt
  ja: dictionaries/japanese_dict_full.txt
  en: dictionaries/ds_cmudict-07b.txt
extra_phonemes: []
merged_phoneme_groups:
  - [zh/i, ja/i, en/iy]
  - [zh/s, ja/s, en/s]
datasets:  # define all raw datasets
  - raw_data_dir: data/xxx1/raw  # equivalent to former raw_data_dir
    speaker: speaker1  # equivalent to former speakers
    spk_id: 0
    language: zh
    test_prefixes:  # similar to former test_prefixes
      - wav1
      - wav2
  - raw_data_dir: data/xxx2/raw
    speaker: speaker2
    spk_id: 1
    language: ja
    test_prefixes:
      - wav1
      - wav2

Read the documentation for a more detailed explanation.

New decoder backbone: LYNXNet (#200, #218, #225, #228)

The new backbone shows better performance on acoustic models. The way to define the model backbone has also changed:

Old:
backbone_type: 'wavenet'
residual_layers: 20
residual_channels: 512
dilation_cycle_length: 4

New:
# LYNXNet (default)
backbone_type: 'lynxnet'
backbone_args:
  num_channels: 1024
  num_layers: 6
  kernel_size: 31
  dropout_rate: 0.0
  strong_cond: true

# WaveNet
backbone_type: 'wavenet'
backbone_args:
  num_channels: 512
  num_layers: 20
  dilation_cycle_length: 4

RoPE in attention encoder (#234)

Rotary Position Embedding (RoPE) is now implemented in the FastSpeech2 attention encoders to improve their quality and reduce parameter count.

# encoder with RoPE
enc_ffn_kernel_size: 3
use_rope: true

# encoder without RoPE
enc_ffn_kernel_size: 9
use_rope: false
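
For readers unfamiliar with the technique, here is a minimal sketch of what rotary position embedding does to the query/key projections inside an attention layer (a generic illustration with assumed tensor shapes and names, not the code used in this repository):

import torch

def apply_rope(x, base=10000.0):
    # x: (batch, seq_len, dim), dim assumed even.
    _, seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per feature pair, scaled by position.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotating each (x1, x2) pair encodes relative positions directly in the
    # q·k dot products, with no learned position table, which is where the
    # parameter saving comes from.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Typically applied to queries and keys right before scaled dot-product attention:
# q, k = apply_rope(q), apply_rope(k)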

Other improvements, changes and bug fixes

  • Support MiniNSF and noise injection in the NSF-HiFiGAN vocoder
  • Improve inference speed of the old NSF module
  • A missing note_glide is now treated as none instead of raising an error
  • Add R^2 score metrics for variance parameters on TensorBoard (a brief sketch of the metric follows this list)
  • Bugfix: unexpectedly high CPU load during preprocessing
  • Bugfix: f0_min and f0_max have no effect on the parselmouth pitch extractor
  • Bugfix: configurations are not passed correctly to the pitch predictor
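
For reference, R^2 here is the standard coefficient of determination; a minimal sketch of how such a score is computed (illustrative only, not the repository's metric code):

import torch

def r2_score(pred, target):
    # Coefficient of determination: 1 - residual variance / total variance.
    # 1.0 is a perfect fit; 0.0 is no better than predicting the mean.
    ss_res = ((target - pred) ** 2).sum()
    ss_tot = ((target - target.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot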

Some changes may not be listed above. See full change log: v2.4.0...v2.5.0