Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

January 06, 2020 · Declared Dead · 🏛 Lecture Notes in Electrical Engineering

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Amir Jalilifard, Vinicius F. Caridá, Alex F. Mansano, Rogers S. Cristo, Felipe Penhorate C. da Fonseca
arXiv ID: 2001.09896
Category: cs.IR (Information Retrieval)
Cross-listed: cs.CL, cs.LG, stat.ML
Citations: 79
Venue: Lecture Notes in Electrical Engineering
Last Checked: 3 months ago
Abstract
Keyword extraction has received increasing attention as an important research topic that can drive advances in diverse applications such as document context categorization, text indexing, and document classification. In this paper we propose STF-IDF, a novel semantic method based on TF-IDF, for scoring the importance of words in informal documents within a corpus. A set of nearly four million documents from health-care social media was collected and used to train a semantic model and learn word embeddings. The features of this semantic space were then used to rearrange the original TF-IDF scores through an iterative procedure, in order to improve the algorithm's moderate performance on informal texts. Tested on 200 randomly chosen documents, the proposed method reduced the TF-IDF mean error rate by about 50%, reaching a mean error of 13.7% compared with 27.2% for the original TF-IDF.
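The abstract describes two stages: scoring each word with TF-IDF, then using word embeddings learned from the corpus to rearrange those scores iteratively. The sketch below, under stated assumptions, shows the first stage as the textbook formula (tf = count / document length, idf = ln(N / df)) and the second as a hypothetical stand-in. The function `semantic_rerank`, its `top_k`, `alpha`, and `max_iter` parameters, the score-weighted "context vector", and the toy embeddings are all illustrative assumptions; they are not the STF-IDF update rule from the paper, for which the page lists no code.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF per document: tf = count / len(doc), idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: (count / len(doc)) * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_rerank(doc_scores, embeddings, top_k=2, alpha=0.5, max_iter=10):
    """HYPOTHETICAL reweighting, not the update rule published for STF-IDF.

    Repeatedly builds a context vector from the current top-k terms and
    rescales each original TF-IDF score by its similarity to that context.
    """
    scores = dict(doc_scores)
    dim = len(next(iter(embeddings.values())))
    prev_top = None
    for _ in range(max_iter):
        # Only terms that have an embedding can contribute to the context.
        known = [t for t in scores if t in embeddings]
        top = sorted(known, key=scores.get, reverse=True)[:top_k]
        if top == prev_top:
            break  # top-k set no longer changes: treat as converged
        prev_top = top
        weight = sum(scores[t] for t in top) or 1.0
        # Context vector: score-weighted mean embedding of the top-k terms.
        context = [sum(scores[t] * embeddings[t][i] for t in top) / weight
                   for i in range(dim)]
        # Keep (1 - alpha) of each original score; scale the rest by the
        # (non-negative) similarity to the context. No embedding: similarity 0.
        scores = {
            t: s0 * (1 - alpha + alpha * max(cosine(embeddings[t], context), 0.0))
            if t in embeddings else s0 * (1 - alpha)
            for t, s0 in doc_scores.items()
        }
    return scores


docs = [["fever", "flu", "lol", "the"], ["rest", "the"]]
scores = tf_idf(docs)
emb = {"fever": [1.0, 0.0], "flu": [0.9, 0.1], "lol": [0.0, 1.0]}  # toy vectors
ranked = semantic_rerank(scores[0], emb)
print(sorted(ranked, key=ranked.get, reverse=True))
```

In this toy corpus `fever`, `flu`, and the off-topic token `lol` start with identical TF-IDF scores, while `the` occurs in every document and scores zero. The rerank lowers `lol`, whose embedding points away from the other two, so the printed order puts `fever` and `flu` ahead of `lol`. Anchoring each update to the original score keeps the sketch from drifting arbitrarily far from TF-IDF; the paper's actual iteration may differ.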
Community shame: Not yet rated

📜 Similar Papers

In the same crypt: Information Retrieval

Died the same way: 👻 Ghosted