BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset

October 11, 2022 · Entered Twilight · 🏛 AACL

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitignore, BERTScore, N-gram Repitition Filter, PINCScore, Punctuation Filter, README.md, filter.sh, images, requirements.txt

Authors: Ajwad Akil, Najrin Sultana, Abhik Bhattacharjee, Rifat Shahriyar
arXiv ID: 2210.05109
Category: cs.CL (Computation & Language)
Citations: 25
Venue: AACL
Repository: https://github.com/csebuetnlp/banglaparaphrase (⭐ 15)
Last checked: 2 months ago
Abstract
In this work, we present BanglaParaphrase, a high-quality synthetic Bangla Paraphrase dataset curated by a novel filtering pipeline. We aim to take a step towards alleviating the low resource status of the Bangla language in the NLP domain through the introduction of BanglaParaphrase, which ensures quality by preserving both semantics and diversity, making it particularly useful to enhance other Bangla datasets. We show a detailed comparative analysis between our dataset and models trained on it with other existing works to establish the viability of our synthetic paraphrase data generation pipeline. We are making the dataset and models publicly available at https://github.com/csebuetnlp/banglaparaphrase to further the state of Bangla NLP.
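The abstract says the pipeline keeps paraphrases that preserve semantics while staying diverse, and the repository includes a PINCScore component for the diversity side. The sketch below implements the standard PINC measure (Chen & Dolan, 2011): for each n from 1 to 4, it computes the fraction of the candidate's n-grams that do not appear in the source, then averages these fractions. Several details are assumptions, not taken from the repository: whitespace tokenization, skipping any n for which the candidate has no n-grams, and returning 0.0 for an empty candidate. The function names `ngrams` and `pinc` are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list; empty if len(tokens) < n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pinc(source, candidate, max_n=4):
    """PINC diversity score in [0, 1]; higher means fewer n-grams copied from the source.

    Assumption: whitespace tokenization, which is adequate for space-delimited Bangla text.
    """
    src_tokens = source.split()
    cand_tokens = candidate.split()
    per_n = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand_tokens, n)
        if not cand_ngrams:
            continue  # candidate too short for this n; skip it (assumed convention)
        overlap = len(cand_ngrams & ngrams(src_tokens, n))
        per_n.append(1.0 - overlap / len(cand_ngrams))
    # Average over the n-gram orders that contributed; 0.0 for an empty candidate.
    return sum(per_n) / len(per_n) if per_n else 0.0
```

In a filter, one would keep a source/paraphrase pair only if `pinc(source, paraphrase)` exceeds some cutoff. Any specific threshold is a hypothetical choice, not a value from the paper. Because PINC rewards lexical change alone, it has to be paired with a semantic check (the repository's BERTScore component appears to serve that role) so that unrelated sentences are not accepted as "diverse" paraphrases.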