Models/SpeechCodec/2021.07.07_SoundStream.md
Lines changed: 3 additions & 3 deletions
@@ -84,7 +84,7 @@ To this end, one (or more) discriminators are trained jointly, with the goal of
Both the encoder and the decoder only use causal convolutions, so the overall architectural latency of the model is determined solely by the temporal resampling ratio between the original time-domain waveform and the embeddings.
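To put a rough number on the latency claim above (a back-of-envelope sketch; the 24 kHz sample rate and 320× overall stride are illustrative assumptions, not stated in this hunk):

```python
# Architectural latency of a causal encoder/decoder equals the temporal
# resampling ratio between waveform samples and embeddings.
# Sample rate and per-block strides below are illustrative assumptions.
sample_rate_hz = 24_000
total_stride = 2 * 4 * 5 * 8                      # product = 320 samples per embedding

frame_rate_hz = sample_rate_hz / total_stride     # embeddings per second
latency_ms = 1000 * total_stride / sample_rate_hz

print(f"frame rate: {frame_rate_hz:.1f} Hz, architectural latency: {latency_ms:.1f} ms")
# -> frame rate: 75.0 Hz, architectural latency: 13.3 ms
```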
In summary, we make the following key contributions:
- We propose ***SoundStream***, a neural audio codec in which all the constituent components (encoder, decoder and quantizer) are trained end-to-end with a mix of reconstruction and adversarial losses to achieve superior audio quality.
- - We introduce a new residual vector quantizer, and investigate the rate-distortion-complexity trade-off simplied by its design.
+ - We introduce a new residual vector quantizer, and investigate the rate-distortion-complexity trade-off simplified by its design.
In addition, we propose a novel “quantizer dropout” technique for training the residual vector quantizer, which enables a single model to handle different bitrates.
- We demonstrate that learning the encoder brings a very significant coding efficiency improvement, with respect to a solution that adopts mel-spectrogram features.
- We demonstrate by means of subjective quality metrics that ***SoundStream*** outperforms both Opus and EVS over a wide range of bitrates.
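The two quantizer-related contributions in the list above (the residual vector quantizer and the "quantizer dropout" trick) can be made concrete with a minimal NumPy sketch; the codebook sizes, dimensions, and nearest-neighbour search here are illustrative assumptions, not the implementation behind this note:

```python
import numpy as np

def rvq_encode(x, codebooks, n_q=None):
    """Residual vector quantization of one embedding x (shape [D]).

    codebooks: list of [N, D] arrays. n_q: how many quantizers to apply;
    quantizer dropout samples this at random during training so a single
    model can later be run at several bitrates.
    """
    if n_q is None:
        n_q = len(codebooks)
    quantized = np.zeros_like(x)
    codes = []
    for cb in codebooks[:n_q]:
        residual = x - quantized                       # what is still unexplained
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        quantized = quantized + cb[idx]                # refine the reconstruction
    return codes, quantized

# Toy usage with assumed sizes: 4 quantizers, 16-entry codebooks, 8-dim embeddings.
rng = np.random.default_rng(0)
D, N, NQ = 8, 16, 4
codebooks = [rng.normal(size=(N, D)) for _ in range(NQ)]
x = rng.normal(size=D)

n_q = int(rng.integers(1, NQ + 1))                     # quantizer dropout: use only the first n_q
codes, x_hat = rvq_encode(x, codebooks, n_q)
print(n_q, codes, float(np.linalg.norm(x - x_hat)))
```

At inference the same codebooks can be truncated to any n_q, which is what allows one trained model to serve multiple target bitrates.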
@@ -589,8 +589,8 @@ Instead, decreasing the capacity of the decoder has a more significant impact on
This is aligned with recent findings in the field of neural image compression [67], which also adopt a lighter encoder and a heavier decoder.
**Vector Quantizer Depth and Codebook Size**:
- The number of bits used to encode a single frame is equal to Nqlog2N, where Nq denotes the number of quantizers and N the codebook size.
- Hence, it is possible to achieve the same target bitrate for different combinations of Nqand N.
+ The number of bits used to encode a single frame is equal to Nq log2N, where Nq denotes the number of quantizers and N the codebook size.
+ Hence, it is possible to achieve the same target bitrate for different combinations of Nq and N.
Table II shows three configurations, all operating at 6 kbps.
As expected, using fewer vector quantizers, each with a larger codebook, achieves the highest coding efficiency at the cost of higher computational complexity.
Remarkably, using a sequence of 80 1-bit quantizers leads only to a modest quality degradation.
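As a quick sanity check on the corrected bits-per-frame formula (Nq log2 N) and the trade-off discussed above, the snippet below evaluates a few (Nq, N) combinations that all yield 80 bits per frame; the 75 Hz embedding rate and the first two combinations are assumptions chosen for illustration, while the last matches the 80 one-bit quantizers mentioned above:

```python
import math

frame_rate_hz = 75  # assumed embedding rate, chosen so 80 bits/frame lands near 6 kbps

# (Nq, N): number of residual quantizers and per-quantizer codebook size.
for n_q, n in [(8, 1024), (16, 32), (80, 2)]:
    bits_per_frame = n_q * math.log2(n)               # Nq * log2(N)
    bitrate_kbps = bits_per_frame * frame_rate_hz / 1000
    print(f"Nq={n_q:>2}, N={n:>4}: {bits_per_frame:.0f} bits/frame -> {bitrate_kbps:.1f} kbps")
# All three combinations print 80 bits/frame -> 6.0 kbps.
```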