A Supervised Learning Approach For Heading Detection

August 31, 2018 · Declared Dead · 🏛 Expert Syst. J. Knowl. Eng.

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Sahib Singh Budhiraja, Vijay Mago
arXiv ID: 1809.01477
Category: cs.IR (Information Retrieval)
Cross-listed: cs.CL, cs.LG, stat.ML
Citations: 14
Venue: Expert Syst. J. Knowl. Eng.
Last Checked: 4 months ago

Abstract
As the Portable Document Format (PDF) file format increases in popularity, research in analysing its structure for text extraction and analysis is necessary. Detecting headings can be a crucial component of classifying and extracting meaningful data. This research involves training a supervised learning model to detect headings with features carefully selected through recursive feature elimination. The best performing classifier had an accuracy of 96.95%, sensitivity of 0.986 and a specificity of 0.953. This research into heading detection contributes to the field of PDF based text extraction and can be applied to the automation of large scale PDF text analysis in a variety of professional and policy based contexts.
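
The abstract reports accuracy, sensitivity, and specificity for a binary heading classifier. As a minimal sketch of how those three figures relate to a confusion matrix, the snippet below computes them from true and predicted labels. The function name `binary_metrics`, the label encoding (1 = heading, 0 = not heading), and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for labels 1 (heading) and 0 (not heading).

    Assumes both classes occur in y_true; otherwise a ratio below divides by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # headings found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-headings rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed headings

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical toy labels, not results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Read this way, the reported sensitivity of 0.986 is the share of true headings the classifier detected, and the specificity of 0.953 is the share of non-heading text it correctly rejected.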
