## Abstract: 摘要

<table>
<tr>
<td>

In the last decade of **automatic speech recognition (ASR)** research, the introduction of deep learning has brought considerable reductions in **word error rate** of more than 50% relative, compared to modeling without deep learning.
In the wake of this transition, a number of all-neural ASR architectures have been introduced.
These so-called **end-to-end (E2E)** models provide highly integrated, completely neural ASR models, which rely strongly on general machine learning knowledge, learn more consistently from data, and show lower dependence on ASR domain-specific experience.
The success and enthusiastic adoption of deep learning, accompanied by more generic model architectures, has led to E2E models now becoming the prominent ASR approach.
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements, and to discuss their properties and their relationship to classical **hidden Markov model (HMM)** based ASR architectures.
All relevant aspects of E2E ASR are covered in this work: modeling, training, decoding, and external language model integration, discussions of performance and deployment opportunities, as well as an outlook into potential future developments.

</td>
<td>

在过去十年的**自动语音识别 (Automatic Speech Recognition, ASR)** 研究中, 深度学习的引入使得**词错误率 (Word Error Rate, WER)** 相比非深度学习模型相对降低了超过 50%.
随着这一转变, 许多全神经 ASR 架构被提出.
这些所谓的**端到端 (End-to-End, E2E)** 模型提供了高度集成, 完全神经网络化的 ASR 模型, 它们高度依赖于通用的机器学习知识, 从数据中更一致地学习, 并且对 ASR 领域特定经验的依赖更低.
深度学习的成功和热情采纳, 以及更通用模型架构的出现, 使得 E2E 模型如今成为主流的 ASR 方法.
本综述的目标是提供 E2E ASR 模型的分类体系及相应的改进, 并讨论它们的性质, 以及它们与经典的基于**隐马尔可夫模型 (Hidden Markov Model, HMM)** 的 ASR 架构之间的关系.
本工作涵盖了 E2E ASR 的所有相关方面: 建模, 训练, 解码, 外部语言模型集成, 对性能和部署机遇的讨论, 以及对潜在未来发展方向的展望.

</td>
</tr>
</table>

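To make the quoted improvement concrete: a relative word error rate reduction compares the change against the old error rate, so a drop from, say, 12% to 5.5% WER is a (12 − 5.5)/12 ≈ 54% relative reduction, i.e. "more than 50% relative". The sketch below is a minimal illustration of how WER (word-level edit distance normalized by the reference length) and relative reduction can be computed; the function names and the example figures are illustrative assumptions, not numbers from the survey.

```python
# Minimal sketch of word error rate (WER) and relative WER reduction.
# The example figures (0.12 -> 0.055) are illustrative, not taken from the survey.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    i.e. word-level Levenshtein distance normalized by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions: reference words missing from an empty hypothesis
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions: extra hypothesis words against an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,            # deletion
                dp[i][j - 1] + 1,            # insertion
                dp[i - 1][j - 1] + sub_cost  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative improvement: (old - new) / old."""
    return (old_wer - new_wer) / old_wer

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
print(relative_reduction(0.12, 0.055))  # ≈ 0.542, i.e. more than 50% relative
```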
## 1·Introduction: 引言

## 2·Related Works: 相关工作