Commit 8dd53f7

Finish configuration schemas
1 parent 09b5a54 commit 8dd53f7

1 file changed: +43 -80 lines changed

docs/ConfigurationSchemas.md

Lines changed: 43 additions & 80 deletions
@@ -201,16 +201,35 @@ Scale ratio of random time stretching augmentation.
201201
<tr><td align="center"><b>default</b></td><td>0.75</td>
202202
</tbody></table>
203203

204+
### backbone_args
205+
206+
Keyword arguments for the backbone of the main decoder module.
207+
208+
<table><tbody>
209+
<tr><td align="center"><b>visibility</b></td><td>acoustic, variance</td>
210+
<tr><td align="center"><b>scope</b></td><td>nn</td>
211+
<tr><td align="center"><b>type</b></td><td>dict</td>
212+
</tbody></table>
213+
214+
Some available arguments are listed below.
215+
216+
| argument name | for backbone type | description |
217+
|:---------------------:|:-----------------:|:-----------------------------------------------------------------------------------------------------------:|
218+
| num_layers | wavenet/lynxnet | Number of layer blocks, or depth of the network |
219+
| num_channels | wavenet/lynxnet | Number of channels, or width of the network |
220+
| dilation_cycle_length | wavenet | Length $k$ of the cycle $2^0, 2^1, \ldots, 2^k$ of convolution dilation factors through WaveNet residual blocks. |
221+
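To illustrate how `num_layers` and `dilation_cycle_length` might interact, here is a minimal Python sketch. It assumes the common WaveNet convention in which block `i` uses dilation `2 ** (i % dilation_cycle_length)`, so one cycle has `k` entries ending at `2 ** (k - 1)`. The table above writes the cycle as ending at $2^k$, so the exact endpoint is an assumption here, and `dilation_schedule` is a hypothetical helper rather than a function from the project.

```python
def dilation_schedule(num_layers: int, dilation_cycle_length: int) -> list[int]:
    """Return one dilation factor per residual block.

    Assumption: block i uses dilation 2 ** (i % dilation_cycle_length),
    the usual WaveNet convention. The documentation does not confirm this.
    """
    if num_layers < 0:
        raise ValueError("num_layers must be non-negative")
    if dilation_cycle_length < 1:
        raise ValueError("dilation_cycle_length must be at least 1")
    # The dilation doubles within a cycle and restarts at 1 every
    # dilation_cycle_length blocks.
    return [2 ** (i % dilation_cycle_length) for i in range(num_layers)]


schedule = dilation_schedule(num_layers=10, dilation_cycle_length=4)
print(schedule)  # [1, 2, 4, 8, 1, 2, 4, 8, 1, 2]
```

Under this assumption, a longer cycle widens the receptive field of each group of blocks, while `num_layers` controls how many times the cycle repeats.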
204222
### backbone_type
205223

206224
Backbone type of the main decoder/predictor module.
207225

208226
<table><tbody>
209227
<tr><td align="center"><b>visibility</b></td><td>acoustic, variance</td>
210228
<tr><td align="center"><b>scope</b></td><td>nn</td>
211-
<tr><td align="center"><b>customizability</b></td><td>reserved</td>
229+
<tr><td align="center"><b>customizability</b></td><td>normal</td>
212230
<tr><td align="center"><b>type</b></td><td>str</td>
213-
<tr><td align="center"><b>default</b></td><td>wavenet</td>
231+
<tr><td align="center"><b>default</b></td><td>lynxnet</td>
232+
<tr><td align="center"><b>constraints</b></td><td>Choose from 'wavenet', 'lynxnet'.</td>
214233
</tbody></table>
215234

216235
### base_config
@@ -418,18 +437,6 @@ The type of ODE-based generative model algorithm. The following models are curre
418437
<tr><td align="center"><b>constraints</b></td><td>Choose from 'ddpm', 'reflow'.</td>
419438
</tbody></table>
420439

421-
### dilation_cycle_length
422-
423-
Length k of the cycle $2^0, 2^1 ...., 2^k$ of convolution dilation factors through WaveNet residual blocks.
424-
425-
<table><tbody>
426-
<tr><td align="center"><b>visibility</b></td><td>acoustic</td>
427-
<tr><td align="center"><b>scope</b></td><td>nn</td>
428-
<tr><td align="center"><b>customizability</b></td><td>not recommended</td>
429-
<tr><td align="center"><b>type</b></td><td>int</td>
430-
<tr><td align="center"><b>default</b></td><td>4</td>
431-
</tbody></table>
432-
433440
### dropout
434441

435442
Dropout rate in some FastSpeech2 modules.
@@ -1273,13 +1280,21 @@ Arguments for pitch prediction.
12731280
<tr><td align="center"><b>type</b></td><td>dict</td>
12741281
</tbody></table>
12751282

1276-
### pitch_prediction_args.dilation_cycle_length
1283+
### pitch_prediction_args.backbone_args
12771284

1278-
Equivalent to [dilation_cycle_length](#dilation_cycle_length) but only for the pitch predictor model.
1285+
Equivalent to [backbone_args](#backbone_args) but only for the pitch predictor model. If not set, use the root backbone type.
12791286

12801287
<table><tbody>
12811288
<tr><td align="center"><b>visibility</b></td><td>variance</td>
1282-
<tr><td align="center"><b>default</b></td><td>5</td>
1289+
</tbody></table>
1290+
1291+
### pitch_prediction_args.backbone_type
1292+
1293+
Equivalent to [backbone_type](#backbone_type) but only for the pitch predictor model.
1294+
1295+
<table><tbody>
1296+
<tr><td align="center"><b>visibility</b></td><td>variance</td>
1297+
<tr><td align="center"><b>default</b></td><td>wavenet</td>
12831298
</tbody></table>
12841299

12851300
### pitch_prediction_args.pitd_clip_max
@@ -1340,24 +1355,6 @@ Number of repeating bins in the pitch predictor.
13401355
<tr><td align="center"><b>default</b></td><td>64</td>
13411356
</tbody></table>
13421357

1343-
### pitch_prediction_args.residual_channels
1344-
1345-
Equivalent to [residual_channels](#residual_channels) but only for the pitch predictor.
1346-
1347-
<table><tbody>
1348-
<tr><td align="center"><b>visibility</b></td><td>variance</td>
1349-
<tr><td align="center"><b>default</b></td><td>256</td>
1350-
</tbody></table>
1351-
1352-
### pitch_prediction_args.residual_layers
1353-
1354-
Equivalent to [residual_layers](#residual_layers) but only for the pitch predictor.
1355-
1356-
<table><tbody>
1357-
<tr><td align="center"><b>visibility</b></td><td>variance</td>
1358-
<tr><td align="center"><b>default</b></td><td>20</td>
1359-
</tbody></table>
1360-
13611358
### pl_trainer_accelerator
13621359

13631360
Type of Lightning trainer hardware accelerator.
@@ -1525,30 +1522,6 @@ Whether to use relative positional encoding in FastSpeech2 module.
15251522
<tr><td align="center"><b>default</b></td><td>true</td>
15261523
</tbody></table>
15271524

1528-
### residual_channels
1529-
1530-
Number of dilated convolution channels in residual blocks in WaveNet.
1531-
1532-
<table><tbody>
1533-
<tr><td align="center"><b>visibility</b></td><td>acoustic</td>
1534-
<tr><td align="center"><b>scope</b></td><td>nn</td>
1535-
<tr><td align="center"><b>customizability</b></td><td>normal</td>
1536-
<tr><td align="center"><b>type</b></td><td>int</td>
1537-
<tr><td align="center"><b>default</b></td><td>512</td>
1538-
</tbody></table>
1539-
1540-
### residual_layers
1541-
1542-
Number of residual blocks in WaveNet.
1543-
1544-
<table><tbody>
1545-
<tr><td align="center"><b>visibility</b></td><td>acoustic</td>
1546-
<tr><td align="center"><b>scope</b></td><td>nn</td>
1547-
<tr><td align="center"><b>customizability</b></td><td>normal</td>
1548-
<tr><td align="center"><b>type</b></td><td>int</td>
1549-
<tr><td align="center"><b>default</b></td><td>20</td>
1550-
</tbody></table>
1551-
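Because this commit removes `residual_layers`, `residual_channels` and the root `dilation_cycle_length` entries in favor of `backbone_args`, an existing configuration may need its keys moved. The sketch below shows one way to do that in Python. The key mapping (`residual_layers` to `num_layers`, `residual_channels` to `num_channels`) is an assumption inferred from the matching descriptions, and `migrate_backbone_keys` is a hypothetical helper, not part of the project.

```python
# Assumed mapping from the removed keys to backbone_args entries.
# Inferred from the descriptions in this diff; not confirmed by the source.
LEGACY_TO_BACKBONE_ARGS = {
    "residual_layers": "num_layers",
    "residual_channels": "num_channels",
    "dilation_cycle_length": "dilation_cycle_length",
}


def migrate_backbone_keys(config: dict) -> dict:
    """Return a copy of config with legacy keys moved under backbone_args."""
    # Keep every key that is not a legacy backbone key.
    migrated = {k: v for k, v in config.items() if k not in LEGACY_TO_BACKBONE_ARGS}
    backbone_args = dict(config.get("backbone_args", {}))
    for old_key, new_key in LEGACY_TO_BACKBONE_ARGS.items():
        if old_key in config:
            # Entries already present in backbone_args win over legacy values.
            backbone_args.setdefault(new_key, config[old_key])
    if backbone_args:
        migrated["backbone_args"] = backbone_args
    return migrated


# The values below are the removed root defaults shown in this diff.
old_config = {
    "residual_layers": 20,
    "residual_channels": 512,
    "dilation_cycle_length": 4,
    "dropout": 0.1,
}
new_config = migrate_backbone_keys(old_config)
print(new_config)
```

The same helper could be applied to the `pitch_prediction_args` and `variances_prediction_args` dictionaries, whose legacy keys are removed in this commit as well. Since the `backbone_args` table lists `dilation_cycle_length` for `wavenet` only, a migrated configuration that relies on it would presumably also set `backbone_type` to `wavenet`.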
15521525
### sampler_frame_count_grid
15531526

15541527
The batch sampler applies an algorithm called _sorting by similar length_ when collecting batches. Data samples are first grouped by their approximate lengths before they get shuffled within each group. Assume this value is set to $L_{grid}$, the approximate length of a data sample with length $L_{real}$ can be calculated through the following expression:
@@ -2034,43 +2007,33 @@ Arguments for prediction of variance parameters other than pitch, like energy, b
20342007
<tr><td align="center"><b>type</b></td><td>dict</td>
20352008
</tbody></table>
20362009

2037-
### variances_prediction_args.dilation_cycle_length
2010+
### variances_prediction_args.backbone_args
20382011

2039-
Equivalent to [dilation_cycle_length](#dilation_cycle_length) but only for the multi-variance predictor model.
2012+
Equivalent to [backbone_args](#backbone_args) but only for the multi-variance predictor.
20402013

20412014
<table><tbody>
20422015
<tr><td align="center"><b>visibility</b></td><td>variance</td>
2043-
<tr><td align="center"><b>default</b></td><td>4</td>
20442016
</tbody></table>
20452017

2046-
### variances_prediction_args.total_repeat_bins
2018+
### variances_prediction_args.backbone_type
20472019

2048-
Total number of repeating bins in the multi-variance predictor. Repeating bins are distributed evenly to each variance parameter.
2020+
Equivalent to [backbone_type](#backbone_type) but only for the multi-variance predictor model. If not set, use the root backbone type.
20492021

20502022
<table><tbody>
20512023
<tr><td align="center"><b>visibility</b></td><td>variance</td>
2052-
<tr><td align="center"><b>scope</b></td><td>nn, inference</td>
2053-
<tr><td align="center"><b>customizability</b></td><td>recommended</td>
2054-
<tr><td align="center"><b>type</b></td><td>int</td>
2055-
<tr><td align="center"><b>default</b></td><td>48</td>
2056-
</tbody></table>
2057-
2058-
### variances_prediction_args.residual_channels
2059-
2060-
Equivalent to [residual_channels](#residual_channels) but only for the multi-variance predictor.
2061-
2062-
<table><tbody>
2063-
<tr><td align="center"><b>visibility</b></td><td>variance</td>
2064-
<tr><td align="center"><b>default</b></td><td>192</td>
2024+
<tr><td align="center"><b>default</b></td><td>wavenet</td>
20652025
</tbody></table>
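The fallback described above ("If not set, use the root backbone type") can be sketched in Python as follows. `resolve_backbone_type` is a hypothetical helper and the configuration is modeled as a plain nested dict. Applying the same fallback to `pitch_prediction_args` is an assumption, since the diff states it explicitly only for `variances_prediction_args.backbone_type`.

```python
ALLOWED_BACKBONES = ("wavenet", "lynxnet")  # from the backbone_type constraints


def resolve_backbone_type(config: dict, predictor_key: str) -> str:
    """Pick the predictor-specific backbone type, else the root one."""
    predictor_args = config.get(predictor_key) or {}
    # Prefer the predictor-level value; fall back to the root backbone_type.
    backbone_type = predictor_args.get("backbone_type") or config.get("backbone_type")
    if backbone_type not in ALLOWED_BACKBONES:
        raise ValueError(
            f"backbone_type must be one of {ALLOWED_BACKBONES}, got {backbone_type!r}"
        )
    return backbone_type


config = {
    "backbone_type": "lynxnet",
    "pitch_prediction_args": {"backbone_type": "wavenet"},
    "variances_prediction_args": {},  # no override, so the root value applies
}
print(resolve_backbone_type(config, "pitch_prediction_args"))      # wavenet
print(resolve_backbone_type(config, "variances_prediction_args"))  # lynxnet
```

Validating against the allowed choices at resolution time surfaces a typo in either the root or the predictor-level value before any model is built.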
20662026

2067-
### variances_prediction_args.residual_layers
2027+
### variances_prediction_args.total_repeat_bins
20682028

2069-
Equivalent to [residual_layers](#residual_layers) but only for the multi-variance predictor.
2029+
Total number of repeating bins in the multi-variance predictor. Repeating bins are distributed evenly to each variance parameter.
20702030

20712031
<table><tbody>
20722032
<tr><td align="center"><b>visibility</b></td><td>variance</td>
2073-
<tr><td align="center"><b>default</b></td><td>10</td>
2033+
<tr><td align="center"><b>scope</b></td><td>nn, inference</td>
2034+
<tr><td align="center"><b>customizability</b></td><td>recommended</td>
2035+
<tr><td align="center"><b>type</b></td><td>int</td>
2036+
<tr><td align="center"><b>default</b></td><td>48</td>
20742037
</tbody></table>
20752038

20762039
### vocoder
