Leveraging Transformer-Based Models for Predicting Inflection Classes of Words in an Endangered Sami Language
November 04, 2024 · Declared Dead · 🏛 IWCLUL
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors
Khalid Alnajjar, Mika Hämäläinen, Jack Rueter
arXiv ID
2411.02556
Category
cs.CL: Computation & Language
Citations
1
Venue
IWCLUL
Last Checked
1 month ago
Abstract
This paper presents a methodology for training a transformer-based model to classify lexical and morphosyntactic features of Skolt Sami, an endangered Uralic language characterized by complex morphology. The goal of our approach is to create an effective system for understanding and analyzing Skolt Sami, given the limited data availability and linguistic intricacies inherent to the language. Our end-to-end pipeline includes data extraction, augmentation, and training a transformer-based model capable of predicting inflection classes. The motivation behind this work is to support language preservation and revitalization efforts for minority languages like Skolt Sami. Accurate classification not only helps improve the state of Finite-State Transducers (FSTs) by providing greater lexical coverage but also contributes to systematic linguistic documentation for researchers working with newly discovered words from literature and native speakers. Our model achieves an average weighted F1 score of 1.00 for POS classification and 0.81 for inflection class classification. The trained model and code will be released publicly to facilitate future research in endangered NLP.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
📜 Similar Papers
In the same crypt — Computation & Language
🌅
🌅
Old Age
🌅
🌅
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted
Deep contextualized word representations
Died the same way — ⏳ Coming Soon™
R.I.P.
⏳
Coming Soon™
Exploring Simple Siamese Representation Learning
R.I.P.
⏳
Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P.
⏳
Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
R.I.P.
⏳
Coming Soon™