Letz Translate: Low-Resource Machine Translation for Luxembourgish
March 02, 2023 · Declared Dead · ICON
Repo contents: dictionay_DE_LB.json
Authors
Yewei Song, Saad Ezzini, Jacques Klein, Tegawendé Bissyandé, Clément Lefebvre, Anne Goujon
arXiv ID
2303.01347
Category
cs.CL: Computation & Language
Cross-listed
cs.SE
Citations
5
Venue
ICON
Repository
https://github.com/Etamin/Ltz_dictionary
Last Checked
1 month ago
Abstract
Natural language processing for low-resource languages (LRLs) is often hampered by the lack of data. Achieving accurate machine translation (MT) in a low-resource setting is therefore a real problem that requires practical solutions. Research on multilingual models has shown that some LRLs can be handled by such models. However, their large size and computational needs make them impractical in constrained environments (e.g., mobile/IoT devices or limited/old servers). In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much performance. We also make use of high-resource languages that are related to, or share the same linguistic root as, the target LRL. For our evaluation, we consider Luxembourgish as the LRL, which shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% lower than the large state-of-the-art NLLB model.
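The distillation setup the abstract describes (a large NLLB teacher compressed into a small student) rests on a standard knowledge-distillation objective. A minimal sketch of the Hinton-style temperature-softened variant is below; the function names are illustrative, and the paper may instead use sequence-level distillation on teacher-generated translations rather than this word-level logit matching:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution so the student also learns from near-miss tokens.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) between the softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

In practice this term is usually mixed with the ordinary cross-entropy loss on the reference translation, e.g. `loss = alpha * ce + (1 - alpha) * kd`, so the student fits both the gold data and the teacher's output distribution.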
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers

In the same crypt → Computation & Language
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. 👻 Ghosted
- Language Models are Few-Shot Learners · R.I.P. 👻 Ghosted
- RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. 👻 Ghosted
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. 👻 Ghosted
- Deep contextualized word representations
Died the same way → 🦴 Skeleton Repo
- EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification · R.I.P. 🦴 Skeleton Repo
- Deep Learning for 3D Point Clouds: A Survey · R.I.P. 🦴 Skeleton Repo
- Adversarial Examples: Attacks and Defenses for Deep Learning · R.I.P. 🦴 Skeleton Repo