A Multi-way Parallel Named Entity Annotated Corpus for English, Tamil and Sinhala

December 03, 2024 · Declared Dead · 🏛 arXiv.org

Repo contents: LICENSE, README.md, nerannotateddatasets.zip

Authors: Surangika Ranathunga, Asanka Ranasinghe, Janaka Shamal, Ayodya Dandeniya, Rashmi Galappaththi, Malithi Samaraweera
arXiv ID: 2412.02056
Category: cs.CL (Computation & Language)
Citations: 1
Venue: arXiv.org
Repository: https://github.com/suralk/multiNER
Last Checked: 2 months ago

Abstract

This paper presents a multi-way parallel English-Tamil-Sinhala corpus annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource languages. Using pre-trained multilingual Language Models (mLMs), we establish new benchmark Named Entity Recognition (NER) results on this dataset for Sinhala and Tamil. We also carry out a detailed investigation on the NER capabilities of different types of mLMs. Finally, we demonstrate the utility of our NER system on a low-resource Neural Machine Translation (NMT) task. Our dataset is publicly released: https://github.com/suralk/multiNER.

📜 Similar Papers

In the same crypt — Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL 🏛 NeurIPS 📚 166.0K cites 8 years ago

🌅 Old Age

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, ... (+2 more)

cs.CL 🏛 NAACL 📚 110.2K cites 7 years ago

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, ... (+8 more)

cs.CL 🏛 arXiv 📚 28.4K cites 6 years ago

R.I.P. 👻 Ghosted

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

Mike Lewis, Yinhan Liu, ... (+6 more)

cs.CL 🏛 ACL 📚 12.3K cites 6 years ago

R.I.P. 👻 Ghosted

Deep contextualized word representations

Matthew E. Peters, Mark Neumann, ... (+5 more)

cs.CL 🏛 NAACL 📚 12.0K cites 8 years ago

Died the same way — 🦴 Skeleton Repo

R.I.P. 🦴 Skeleton Repo

EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification

Patrick Helber, Benjamin Bischke, ... (+2 more)

cs.CV 🏛 J.STAEORS 📚 2.4K cites 8 years ago

R.I.P. 🦴 Skeleton Repo

Deep Learning for 3D Point Clouds: A Survey

Yulan Guo, Hanyun Wang, ... (+4 more)

cs.CV 🏛 IEEE TPAMI 📚 2.1K cites 6 years ago

R.I.P. 🦴 Skeleton Repo

Adversarial Examples: Attacks and Defenses for Deep Learning

Xiaoyong Yuan, Pan He, ... (+2 more)

cs.LG 🏛 IEEE TNNLS 📚 1.8K cites 8 years ago

R.I.P. 🦴 Skeleton Repo

Neural Style Transfer: A Review

Yongcheng Jing, Yezhou Yang, ... (+4 more)

cs.CV 🏛 IEEE TVCG 📚 828 cites 8 years ago