Direct Speech-to-Speech Neural Machine Translation: A Survey
November 13, 2024 ยท The Cartographer ยท ๐ arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: Direct Speech-to-Speech Neural Machine Translation: A Survey"
Evidence collected by the PWNC Scanner
Authors
Mahendra Gupta, Maitreyee Dutta, Chandresh Kumar Maurya
arXiv ID
2411.14453
Category
cs.CL: Computation & Language
Cross-listed
cs.SD,
eess.AS
Citations
5
Venue
arXiv.org
Last Checked
3 days ago
Abstract
Speech-to-Speech Translation (S2ST) models transform speech from one language to another target language with the same linguistic information. S2ST is important for bridging the communication gap among communities and has diverse applications. In recent years, researchers have introduced direct S2ST models, which have the potential to translate speech without relying on intermediate text generation, have better decoding latency, and the ability to preserve paralinguistic and non-linguistic features. However, direct S2ST has yet to achieve quality performance for seamless communication and still lags behind the cascade models in terms of performance, especially in real-world translation. To the best of our knowledge, no comprehensive survey is available on the direct S2ST system, which beginners and advanced researchers can look upon for a quick survey. The present work provides a comprehensive review of direct S2ST models, data and application issues, and performance metrics. We critically analyze the models' performance over the benchmark datasets and provide research challenges and future directions.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
๐
๐
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
๐ฎ
๐ฎ
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
๐
๐
Old Age
A large annotated corpus for learning natural language inference
๐
๐
Old Age