Old Age
BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset
October 11, 2022 · Entered Twilight · AACL
Repo contents: .gitignore, BERTScore, N-gram Repitition Filter, PINCScore, Punctuation Filter, README.md, filter.sh, images, requirements.txt
Authors
Ajwad Akil, Najrin Sultana, Abhik Bhattacharjee, Rifat Shahriyar
arXiv ID
2210.05109
Category
cs.CL: Computation & Language
Citations
25
Venue
AACL
Repository
https://github.com/csebuetnlp/banglaparaphrase
⭐ 15
Last Checked
2 months ago
Abstract
In this work, we present BanglaParaphrase, a high-quality synthetic Bangla Paraphrase dataset curated by a novel filtering pipeline. We aim to take a step towards alleviating the low resource status of the Bangla language in the NLP domain through the introduction of BanglaParaphrase, which ensures quality by preserving both semantics and diversity, making it particularly useful to enhance other Bangla datasets. We show a detailed comparative analysis between our dataset and models trained on it with other existing works to establish the viability of our synthetic paraphrase data generation pipeline. We are making the dataset and models publicly available at https://github.com/csebuetnlp/banglaparaphrase to further the state of Bangla NLP.
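The abstract describes a filtering pipeline that keeps only paraphrase pairs preserving semantics while still being lexically diverse (the repo lists BERTScore, PINC score, and n-gram repetition filters). As a rough illustration only, and not the authors' actual pipeline, the sketch below filters candidate pairs with two toy stand-ins: a shared-unigram F1 as a crude semantic proxy (where the paper uses BERTScore) and the PINC score for diversity. All thresholds and helper names here are hypothetical.

```python
# Hedged sketch of a paraphrase filtering pipeline (NOT the authors'
# implementation): keep pairs that are semantically close but lexically
# novel. Unigram-overlap F1 stands in for BERTScore; PINC measures the
# fraction of candidate n-grams absent from the source.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """PINC: average fraction of candidate n-grams NOT in the source.
    Higher means more lexical novelty (diversity)."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            break
        src_ngrams = ngrams(src, n)
        scores.append(1 - len(cand_ngrams & src_ngrams) / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

def unigram_overlap(source, candidate):
    """Crude semantic proxy: shared-unigram F1 (the paper uses BERTScore)."""
    s, c = set(source.split()), set(candidate.split())
    if not s or not c:
        return 0.0
    p = len(s & c) / len(c)
    r = len(s & c) / len(s)
    return 2 * p * r / (p + r) if p + r else 0.0

def filter_pairs(pairs, min_sim=0.4, min_pinc=0.3):
    """Keep (source, candidate) pairs that are similar enough yet novel enough."""
    return [(s, c) for s, c in pairs
            if unigram_overlap(s, c) >= min_sim and pinc(s, c) >= min_pinc]
```

With these (hypothetical) thresholds, an identical copy is rejected for zero diversity, an unrelated sentence is rejected for low similarity, and only a genuine rephrasing survives both filters.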
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt – Computation & Language
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
Ghosted
Language Models are Few-Shot Learners
R.I.P.
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
Ghosted