SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

May 25, 2016 Β· Declared Dead Β· πŸ› Conference and Labs of the Evaluation Forum

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, Azadeh Shakery arXiv ID 1605.07852 Category cs.IR: Information Retrieval Cross-listed cs.CL Citations 5 Venue Conference and Labs of the Evaluation Forum Last Checked 4 months ago
Abstract
There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective in a number of languages, particularly highly inflected languages. In this paper we propose a method for finding affixes in different positions of a word. Common statistical techniques heavily rely on string similarity in terms of prefix and suffix matching. Since infixes are common in irregular/informal inflections in morphologically complex texts, it is required to find infixes for stemming. In this paper we propose a method whose aim is to find statistical inflectional rules based on minimum edit distance table of word pairs and the likelihoods of the rules in a language. These rules are used to statistically stem words and can be used in different text mining tasks. Experimental results on CLEF 2008 and CLEF 2009 English-Persian CLIR tasks indicate that the proposed method significantly outperforms all the baselines in terms of MAP.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Information Retrieval

Died the same way β€” πŸ‘» Ghosted