Metadata Enrichment of Long Text Documents using Large Language Models

June 26, 2025 · Declared Dead · 🏛 Proceedings of the Association for Information Science and Technology

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Manika Lamba, You Peng, Sophie Nikolov, Glen Layne-Worthey, J. Stephen Downie
arXiv ID: 2506.20918
Category: cs.DL (Digital Libraries)
Cross-listed: cs.ET, cs.IR
Citations: 0
Venue: Proceedings of the Association for Information Science and Technology
Last checked: 3 months ago
Abstract
In this project, we semantically enriched the metadata of long text documents, specifically English-language theses and dissertations published from 1920 to 2020 and retrieved from the HathiTrust Digital Library, through a combination of manual effort and large language models (LLMs). The resulting dataset is a valuable resource for research in areas such as computational social science, digital humanities, and information science. Our paper shows that LLM-based metadata enrichment benefits digital repositories by introducing additional metadata access points, which may not have been originally foreseen, to accommodate various content types. The approach is especially effective for repositories with substantial missing data in their existing metadata fields, where it improves search results and the accessibility of the repository.

📜 Similar Papers

In the same crypt: Digital Libraries

Died the same way: 👻 Ghosted