A Supervised Learning Approach For Heading Detection
August 31, 2018 · Declared Dead · Expert Syst. J. Knowl. Eng.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sahib Singh Budhiraja, Vijay Mago
arXiv ID
1809.01477
Category
cs.IR: Information Retrieval
Cross-listed
cs.CL, cs.LG, stat.ML
Citations
14
Venue
Expert Syst. J. Knowl. Eng.
Last Checked
4 months ago
Abstract
As the Portable Document Format (PDF) file format increases in popularity, research in analysing its structure for text extraction and analysis is necessary. Detecting headings can be a crucial component of classifying and extracting meaningful data. This research involves training a supervised learning model to detect headings with features carefully selected through recursive feature elimination. The best performing classifier had an accuracy of 96.95%, sensitivity of 0.986 and a specificity of 0.953. This research into heading detection contributes to the field of PDF based text extraction and can be applied to the automation of large scale PDF text analysis in a variety of professional and policy based contexts.
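The three figures quoted in the abstract (accuracy, sensitivity, specificity) all derive from a binary confusion matrix. The sketch below shows how they could be computed for a heading classifier, assuming labels of 1 for "heading" and 0 for "not a heading". The function names and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = heading, 0 = not a heading)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # heading correctly detected
        elif truth == 0 and pred == 0:
            tn += 1  # body text correctly rejected
        elif truth == 0 and pred == 1:
            fp += 1  # body text mistaken for a heading
        else:
            fn += 1  # heading that was missed
    return tp, tn, fp, fn


def heading_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on headings) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels for illustration only (assumed, not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = heading_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity here is the share of true headings that were detected, and specificity is the share of non-heading lines that were correctly left alone, which is why the abstract reports both alongside overall accuracy.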
Similar Papers
In the same crypt: Information Retrieval
R.I.P. · Ghosted · Old Age · Neural Graph Collaborative Filtering
R.I.P. · Ghosted · DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
R.I.P. · Ghosted · BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
R.I.P. · 404 Not Found · Graph Neural Networks for Social Recommendation
R.I.P. · Ghosted · Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Died the same way: Ghosted
R.I.P. · Ghosted · Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. · Ghosted · In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. · Ghosted · Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P. · Ghosted