Commit 46bc793

fix typo
1 parent 8933f60 commit 46bc793

1 file changed, +3 -3 lines changed

Models/SpeechCodec/2021.07.07_SoundStream.md

Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ To this end, one (or more) discriminators are trained jointly, with the goal of
 Both the encoder and the decoder only use causal convolutions, so the overall architectural latency of the model is determined solely by the temporal resampling ratio between the original time-domain waveform and the embeddings.
 In summary, we make the following key contributions:
 - We propose ***SoundStream***, a neural audio codec in which all the constituent components (encoder, decoder and quantizer) are trained end-to-end with a mix of reconstruction and adversarial losses to achieve superior audio quality.
-- We introduce a new residual vector quantizer, and investigate the rate-distortion-complexity trade-off simplied by its design.
+- We introduce a new residual vector quantizer, and investigate the rate-distortion-complexity trade-off simplified by its design.
 In addition, we propose a novel “quantizer dropout” technique for training the residual vector quantizer, which enables a single model to handle different bitrates.
 - We demonstrate that learning the encoder brings a very significant coding efficiency improvement, with respect to a solution that adopts mel-spectrogram features.
 - We demonstrate by means of subjective quality metrics that ***SoundStream*** outperforms both Opus and EVS over a wide range of bitrates.
@@ -589,8 +589,8 @@ Instead, decreasing the capacity of the decoder has a more significant impact on
 This is aligned with recent findings in the field of neural image compression [67], which also adopt a lighter encoder and a heavier decoder.
 
 **Vector Quantizer Depth and Codebook Size**:
-The number of bits used to encode a single frame is equal to Nqlog2N, where Nq denotes the number of quantizers and N the codebook size.
-Hence, it is possible to achieve the same target bitrate for different combinations of Nqand N.
+The number of bits used to encode a single frame is equal to Nq log2N, where Nq denotes the number of quantizers and N the codebook size.
+Hence, it is possible to achieve the same target bitrate for different combinations of Nq and N.
 Table II shows three configurations, all operating at 6 kbps.
 As expected, using fewer vector quantizers, each with a larger codebook, achieves the highest coding efficiency at the cost of higher computational complexity.
 Remarkably, using a sequence of 80 1-bit quantizers leads only to a modest quality degradation.
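
The bits-per-frame relation touched by the second hunk can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the frame rate is derived from the one configuration the text states outright (80 one-bit quantizers at 6 kbps), and the other two (Nq, N) pairs are assumed examples chosen to give the same 80 bits per frame. They are not read from Table II, which this diff does not show.

```python
import math


def bits_per_frame(num_quantizers: int, codebook_size: int) -> float:
    """Bits needed to encode one frame: Nq * log2(N)."""
    return num_quantizers * math.log2(codebook_size)


def bitrate_bps(num_quantizers: int, codebook_size: int, frames_per_second: float) -> float:
    """Bitrate in bits per second for a given frame rate."""
    return bits_per_frame(num_quantizers, codebook_size) * frames_per_second


# Stated in the text: 80 one-bit quantizers (N = 2) operate at 6 kbps.
# The frame rate below is derived from that statement, not quoted from the paper.
TARGET_BPS = 6000
FRAME_RATE_HZ = TARGET_BPS / bits_per_frame(80, 2)

# Assumed example pairs (Nq, N); only (80, 2) is confirmed by the text.
EXAMPLE_CONFIGS = [(8, 1024), (16, 32), (80, 2)]

for nq, n in EXAMPLE_CONFIGS:
    print(f"Nq={nq:3d}  N={n:5d}  bits/frame={bits_per_frame(nq, n):.0f}  "
          f"bitrate={bitrate_bps(nq, n, FRAME_RATE_HZ):.0f} bps")
```

Each pair yields the same bits per frame, which is why different combinations of Nq and N can reach the same target bitrate; what differs is the codebook lookup cost per quantizer.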
