@@ -201,16 +201,35 @@ Scale ratio of random time stretching augmentation.
<tr><td align="center"><b>default</b></td><td>0.75</td>
</tbody></table>

+ ### backbone_args
+
+ Keyword arguments for the backbone of the main decoder module.
+
+ <table><tbody>
+ <tr><td align="center"><b>visibility</b></td><td>acoustic, variance</td>
+ <tr><td align="center"><b>scope</b></td><td>nn</td>
+ <tr><td align="center"><b>type</b></td><td>dict</td>
+ </tbody></table>
+
+ Some available arguments are listed below.
+
+ | argument name         | for backbone type | description                                                                                                      |
+ |:---------------------:|:-----------------:|:----------------------------------------------------------------------------------------------------------------:|
+ | num_layers            | wavenet/lynxnet   | Number of layer blocks, or depth of the network                                                                  |
+ | num_channels          | wavenet/lynxnet   | Number of channels, or width of the network                                                                      |
+ | dilation_cycle_length | wavenet           | Length k of the cycle $2^0, 2^1, \ldots, 2^k$ of convolution dilation factors through WaveNet residual blocks.   |
+
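To make the dilation cycle concrete, here is a minimal sketch of how per-block dilation factors could be derived from `num_layers` and `dilation_cycle_length`. It assumes the common WaveNet convention in which block `i` uses dilation `2 ** (i % k)`, so one cycle of length `k` covers exponents 0 through `k - 1`; the project's own indexing may differ. The helper name `dilation_schedule` is hypothetical and not part of the documented configuration.

```python
def dilation_schedule(num_layers: int, dilation_cycle_length: int) -> list[int]:
    """Return one dilation factor per residual block (hypothetical helper)."""
    if num_layers < 1 or dilation_cycle_length < 1:
        raise ValueError("num_layers and dilation_cycle_length must be positive")
    # Assumption: block i uses 2 ** (i mod cycle length), restarting every cycle.
    return [2 ** (i % dilation_cycle_length) for i in range(num_layers)]


print(dilation_schedule(num_layers=8, dilation_cycle_length=4))
# [1, 2, 4, 8, 1, 2, 4, 8]
```

Under this assumption, a longer cycle widens the receptive field of each cycle, while `num_layers` controls how many times the cycle repeats.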
### backbone_type

Backbone type of the main decoder/predictor module.

<table><tbody>
<tr><td align="center"><b>visibility</b></td><td>acoustic, variance</td>
<tr><td align="center"><b>scope</b></td><td>nn</td>
- <tr><td align="center"><b>customizability</b></td><td>reserved</td>
+ <tr><td align="center"><b>customizability</b></td><td>normal</td>
<tr><td align="center"><b>type</b></td><td>str</td>
- <tr><td align="center"><b>default</b></td><td>wavenet</td>
+ <tr><td align="center"><b>default</b></td><td>lynxnet</td>
+ <tr><td align="center"><b>constraints</b></td><td>Choose from 'wavenet', 'lynxnet'.</td>
</tbody></table>
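
The constraint on `backbone_type` and the "for backbone type" column of the `backbone_args` table can be combined into a simple sanity check. The sketch below is only an illustration: the helper name `mismatched_args` and the example values (6 layers, 512 channels) are assumptions, and because the table lists only some of the available arguments, unknown argument names are deliberately left alone.

```python
# Backbone names taken from the constraints row of backbone_type.
BACKBONE_TYPES = ("wavenet", "lynxnet")

# Arguments documented in the backbone_args table and the backbones they apply to.
DOCUMENTED_ARGS = {
    "num_layers": {"wavenet", "lynxnet"},
    "num_channels": {"wavenet", "lynxnet"},
    "dilation_cycle_length": {"wavenet"},
}


def mismatched_args(backbone_type: str, backbone_args: dict) -> list[str]:
    """List documented arguments that do not apply to the chosen backbone."""
    if backbone_type not in BACKBONE_TYPES:
        raise ValueError(f"backbone_type must be one of {BACKBONE_TYPES}")
    return sorted(
        name
        for name in backbone_args
        # Undocumented names are skipped: the table is not exhaustive.
        if name in DOCUMENTED_ARGS and backbone_type not in DOCUMENTED_ARGS[name]
    )


# Illustrative configuration; the numeric values are placeholders.
config = {
    "backbone_type": "lynxnet",
    "backbone_args": {"num_layers": 6, "num_channels": 512, "dilation_cycle_length": 4},
}
print(mismatched_args(config["backbone_type"], config["backbone_args"]))
# ['dilation_cycle_length']
```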

### base_config
@@ -418,18 +437,6 @@ The type of ODE-based generative model algorithm. The following models are curre
<tr><td align="center"><b>constraints</b></td><td>Choose from 'ddpm', 'reflow'.</td>
</tbody></table>

- ### dilation_cycle_length
-
- Length k of the cycle $2^0, 2^1, \ldots, 2^k$ of convolution dilation factors through WaveNet residual blocks.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>acoustic</td>
- <tr><td align="center"><b>scope</b></td><td>nn</td>
- <tr><td align="center"><b>customizability</b></td><td>not recommended</td>
- <tr><td align="center"><b>type</b></td><td>int</td>
- <tr><td align="center"><b>default</b></td><td>4</td>
- </tbody></table>
-
### dropout

Dropout rate in some FastSpeech2 modules.
@@ -1273,13 +1280,21 @@ Arguments for pitch prediction.
<tr><td align="center"><b>type</b></td><td>dict</td>
</tbody></table>

- ### pitch_prediction_args.dilation_cycle_length
+ ### pitch_prediction_args.backbone_args

- Equivalent to [dilation_cycle_length](#dilation_cycle_length) but only for the pitch predictor model.
+ Equivalent to [backbone_args](#backbone_args) but only for the pitch predictor model. If not set, use the root backbone type.

<table><tbody>
<tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>5</td>
+ </tbody></table>
+
+ ### pitch_prediction_args.backbone_type
+
+ Equivalent to [backbone_type](#backbone_type) but only for the pitch predictor model.
+
+ <table><tbody>
+ <tr><td align="center"><b>visibility</b></td><td>variance</td>
+ <tr><td align="center"><b>default</b></td><td>wavenet</td>
</tbody></table>
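
One way to read the "if not set, use the root" behaviour is as a per-key fallback from the predictor section to the top-level configuration. The following sketch assumes exactly that lookup order for both `backbone_type` and `backbone_args`; the helper name `resolve_backbone` and the example values are hypothetical, and the actual resolution logic in the project may differ.

```python
def resolve_backbone(config: dict, section: str) -> tuple[str, dict]:
    """Pick backbone settings for a sub-predictor section (hypothetical helper)."""
    sub = config.get(section) or {}
    # Assumption: a key missing from the section falls back to the root-level key.
    backbone_type = sub.get("backbone_type", config.get("backbone_type"))
    backbone_args = sub.get("backbone_args", config.get("backbone_args", {}))
    return backbone_type, backbone_args


# Illustrative configuration; the numeric values are placeholders.
config = {
    "backbone_type": "lynxnet",
    "backbone_args": {"num_layers": 6},
    "pitch_prediction_args": {
        "backbone_type": "wavenet",
        "backbone_args": {"num_layers": 20, "dilation_cycle_length": 5},
    },
    "variances_prediction_args": {},
}

print(resolve_backbone(config, "pitch_prediction_args"))
# ('wavenet', {'num_layers': 20, 'dilation_cycle_length': 5})
print(resolve_backbone(config, "variances_prediction_args"))
# ('lynxnet', {'num_layers': 6})
```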

### pitch_prediction_args.pitd_clip_max
@@ -1340,24 +1355,6 @@ Number of repeating bins in the pitch predictor.
<tr><td align="center"><b>default</b></td><td>64</td>
</tbody></table>

- ### pitch_prediction_args.residual_channels
-
- Equivalent to [residual_channels](#residual_channels) but only for the pitch predictor.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>256</td>
- </tbody></table>
-
- ### pitch_prediction_args.residual_layers
-
- Equivalent to [residual_layers](#residual_layers) but only for the pitch predictor.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>20</td>
- </tbody></table>
-
### pl_trainer_accelerator

Type of Lightning trainer hardware accelerator.
@@ -1525,30 +1522,6 @@ Whether to use relative positional encoding in FastSpeech2 module.
<tr><td align="center"><b>default</b></td><td>true</td>
</tbody></table>

- ### residual_channels
-
- Number of dilated convolution channels in residual blocks in WaveNet.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>acoustic</td>
- <tr><td align="center"><b>scope</b></td><td>nn</td>
- <tr><td align="center"><b>customizability</b></td><td>normal</td>
- <tr><td align="center"><b>type</b></td><td>int</td>
- <tr><td align="center"><b>default</b></td><td>512</td>
- </tbody></table>
-
- ### residual_layers
-
- Number of residual blocks in WaveNet.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>acoustic</td>
- <tr><td align="center"><b>scope</b></td><td>nn</td>
- <tr><td align="center"><b>customizability</b></td><td>normal</td>
- <tr><td align="center"><b>type</b></td><td>int</td>
- <tr><td align="center"><b>default</b></td><td>20</td>
- </tbody></table>
-
### sampler_frame_count_grid

The batch sampler applies an algorithm called _sorting by similar length_ when collecting batches. Data samples are first grouped by their approximate lengths before they get shuffled within each group. Assuming this value is set to $L_{grid}$, the approximate length of a data sample with length $L_{real}$ can be calculated through the following expression:
@@ -2034,43 +2007,33 @@ Arguments for prediction of variance parameters other than pitch, like energy, b
<tr><td align="center"><b>type</b></td><td>dict</td>
</tbody></table>

- ### variances_prediction_args.dilation_cycle_length
+ ### variances_prediction_args.backbone_args

- Equivalent to [dilation_cycle_length](#dilation_cycle_length) but only for the multi-variance predictor model.
+ Equivalent to [backbone_args](#backbone_args) but only for the multi-variance predictor.

<table><tbody>
<tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>4</td>
</tbody></table>

- ### variances_prediction_args.total_repeat_bins
+ ### variances_prediction_args.backbone_type

- Total number of repeating bins in the multi-variance predictor. Repeating bins are distributed evenly to each variance parameter.
+ Equivalent to [backbone_type](#backbone_type) but only for the multi-variance predictor model. If not set, use the root backbone type.

<table><tbody>
<tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>scope</b></td><td>nn, inference</td>
- <tr><td align="center"><b>customizability</b></td><td>recommended</td>
- <tr><td align="center"><b>type</b></td><td>int</td>
- <tr><td align="center"><b>default</b></td><td>48</td>
- </tbody></table>
-
- ### variances_prediction_args.residual_channels
-
- Equivalent to [residual_channels](#residual_channels) but only for the multi-variance predictor.
-
- <table><tbody>
- <tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>192</td>
+ <tr><td align="center"><b>default</b></td><td>wavenet</td>
</tbody></table>

- ### variances_prediction_args.residual_layers
+ ### variances_prediction_args.total_repeat_bins

- Equivalent to [residual_layers](#residual_layers) but only for the multi-variance predictor.
+ Total number of repeating bins in the multi-variance predictor. Repeating bins are distributed evenly to each variance parameter.

<table><tbody>
<tr><td align="center"><b>visibility</b></td><td>variance</td>
- <tr><td align="center"><b>default</b></td><td>10</td>
+ <tr><td align="center"><b>scope</b></td><td>nn, inference</td>
+ <tr><td align="center"><b>customizability</b></td><td>recommended</td>
+ <tr><td align="center"><b>type</b></td><td>int</td>
+ <tr><td align="center"><b>default</b></td><td>48</td>
</tbody></table>
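
As a small worked example of the even distribution described for `total_repeat_bins`, the sketch below splits the default of 48 bins across three variance parameters. The helper name and the parameter names `var_b` and `var_c` are placeholders, and the sketch only covers the case where the total divides evenly, since the text does not say how a remainder would be handled.

```python
def repeat_bins_per_variance(total_repeat_bins: int, variances: list[str]) -> dict[str, int]:
    """Split the total repeat bins evenly across variance parameters (hypothetical helper)."""
    if not variances:
        raise ValueError("at least one variance parameter is required")
    if total_repeat_bins % len(variances) != 0:
        # Assumption: only the evenly divisible case is illustrated here.
        raise ValueError("total_repeat_bins is not divisible by the number of variances")
    share = total_repeat_bins // len(variances)
    return {name: share for name in variances}


# "var_b" and "var_c" are placeholder parameter names.
print(repeat_bins_per_variance(48, ["energy", "var_b", "var_c"]))
# {'energy': 16, 'var_b': 16, 'var_c': 16}
```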

### vocoder