# Everyone-Can-Sing

<details>
<summary>Basic Information</summary>

- 04 Zeyu Jin
- Links:
  - [ArXiv](https://arxiv.org/abs/2501.13870)
  - [Publication]
  - [Github]
  - [Demo](http://everyone-can-sing.github.io/)
- Files:
  - [ArXiv](_PDF/2501.13870v1__Everyone-Can-Sing__Zero-Shot_Singing_Voice_Synthesis_and_Conversion_with_Speech_Reference.pdf)

</details>

## Abstract

We propose a unified framework for Singing Voice Synthesis (SVS) and Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, poor output musicality, and the scarcity of singing data.
Our framework enables control over multiple aspects, including language content based on lyrics, performance attributes based on a musical score, singing style and vocal techniques based on a selector, and voice identity based on a speech sample.
The proposed zero-shot learning paradigm consists of one SVS model and two SVC models, utilizing pre-trained content embeddings and a diffusion-based generator.
The framework is also trained on mixed datasets comprising both singing and speech audio, allowing singing voice cloning from a speech reference.
Experiments show substantial improvements in timbre similarity and musicality over state-of-the-art baselines, providing insights into other low-data music tasks such as instrumental style transfer.
Examples can be found at [this http URL](http://everyone-can-sing.github.io/).

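The abstract describes four conditioning signals (lyrics, score, style selector, speech reference) feeding either an SVS or an SVC path. A minimal sketch of that interface, with all names (`SingingCondition`, `select_task`) and the routing rule being illustrative assumptions rather than the paper's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SingingCondition:
    """Conditioning signals named in the abstract (field names are hypothetical)."""
    lyrics: Optional[str] = None          # language content
    score: list = field(default_factory=list)  # (midi_pitch, duration_s) note events
    style: str = "default"                # singing style / vocal-technique selector
    speech_ref: Optional[str] = None      # speech sample fixing voice identity

def select_task(cond: SingingCondition, source_singing: Optional[str] = None) -> str:
    """Route a request: SVC re-voices existing singing with the reference timbre;
    SVS generates singing from lyrics plus a musical score."""
    if source_singing is not None:
        return "SVC"
    if cond.lyrics is not None and cond.score:
        return "SVS"
    raise ValueError("SVS needs lyrics and a score; SVC needs source singing audio.")
```

This only models the input routing the abstract implies; the actual models (content embeddings, diffusion generator) are out of scope here.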
## 1·Introduction

Singing voice synthesis (SVS), which generates singing voice signals from music scores, is gaining increasing importance in generative AI and benefiting various applications in music production and entertainment.
Recent advances in deep-learning-based audio synthesis, such as acoustic models ([FastSpeech2](../Acoustic/2020.06.08_FastSpeech2.md)[^1]), neural vocoders ([HiFi-GAN](../Vocoder/2020.10.12_HiFi-GAN.md)[^2]; [BigVGAN](../Vocoder/2022.06.09_BigVGAN.md)[^3]), and tokenizer-based codec models ([DAC](../SpeechCodec/2023.06.11_Descript-Audio-Codec.md)[^4]; [SoundStream](../SpeechCodec/2021.07.07_SoundStream.md)[^5]), have greatly improved models' ability to reproduce singing voices from training data [^6] [^7] [^8].