S2Doc -- Spatial-Semantic Document Format

November 02, 2025 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Sebastian Kempf, Frank Puppe
arXiv ID: 2511.01113
Category: cs.DL (Digital Libraries)
Cross-listed: cs.CL
Citations: 0
Venue: arXiv.org
Last Checked: 3 months ago

Abstract

Documents are a common way to store and share information, with tables being an important part of many documents. However, there is no real common understanding of how to model documents and tables in particular. Because of this lack of standardization, most scientific approaches have their own way of modeling documents and tables, leading to a variety of different data structures and formats that are not directly compatible. Furthermore, most data models focus on either the spatial or the semantic structure of a document, neglecting the other aspect. To address this, we developed S2Doc, a flexible data structure for modeling documents and tables that combines both spatial and semantic information in a single format. It is designed to be easily extendable to new tasks and supports most modeling approaches for documents and tables, including multi-page documents. To the best of our knowledge, it is the first approach of its kind to combine all these aspects in a single format.
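To make the idea of combining spatial and semantic information concrete, here is a minimal sketch of what such a structure could look like. It is not the S2Doc schema: the class names (`Region`, `Element`, `Document`), their fields, and the helper methods are all hypothetical, chosen only to illustrate one element type that carries page-level bounding boxes alongside a semantic category and parent link, with a free-form `data` field for task-specific extensions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Region:
    """Spatial information: an axis-aligned box on one page (hypothetical)."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Element:
    """One document element with both spatial and semantic information."""
    id: str
    category: str                       # semantic role, e.g. "table" or "cell"
    regions: List[Region] = field(default_factory=list)  # one box per page touched
    parent: Optional[str] = None        # semantic hierarchy via parent id
    text: str = ""
    data: Dict[str, object] = field(default_factory=dict)  # task-specific extras


@dataclass
class Document:
    """Flat element store; hierarchy and layout are both queryable."""
    elements: Dict[str, Element] = field(default_factory=dict)

    def add(self, element: Element) -> Element:
        self.elements[element.id] = element
        return element

    def children(self, parent_id: str) -> List[Element]:
        # Semantic query: elements whose parent is the given element.
        return [e for e in self.elements.values() if e.parent == parent_id]

    def on_page(self, page: int) -> List[Element]:
        # Spatial query: elements with at least one region on the given page.
        return [e for e in self.elements.values()
                if any(r.page == page for r in e.regions)]


# A table that spans two pages (pages 0 and 1), with one cell on each page.
doc = Document()
doc.add(Element("t1", "table",
                [Region(0, 50, 600, 500, 780), Region(1, 50, 40, 500, 120)]))
doc.add(Element("c1", "cell", [Region(0, 50, 600, 200, 640)], parent="t1",
                text="Year", data={"row": 0, "col": 0}))
doc.add(Element("c2", "cell", [Region(1, 50, 40, 200, 80)], parent="t1",
                text="2025", data={"row": 1, "col": 0}))

print([e.id for e in doc.children("t1")])  # semantic view of the table
print([e.id for e in doc.on_page(1)])      # spatial view of the second page
```

In this sketch, giving each element a list of regions lets a single table span several pages, while the parent link keeps its cells grouped regardless of where they are drawn.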


📜 Similar Papers

In the same crypt — Digital Libraries

R.I.P. 👻 Ghosted

Constructing bibliometric networks: A comparison between full and fractional counting

Antonio Perianes-Rodriguez, Ludo Waltman, Nees Jan van Eck

cs.DL 🏛 J. Informetrics 📚 1.1K cites 9 years ago

R.I.P. 👻 Ghosted

Measuring academic influence: Not all citations are equal

Xiaodan Zhu, Peter Turney, ... (+2 more)

cs.DL 🏛 J. Assoc. Inf. Sci. Technol. 📚 262 cites 11 years ago

R.I.P. 👻 Ghosted

The Open Access Advantage Considering Citation, Article Usage and Social Media Attention

Xianwen Wang, Chen Liu, ... (+2 more)

cs.DL 🏛 Scientometrics 📚 224 cites 11 years ago

R.I.P. 👻 Ghosted

A Bibliometric Review of Large Language Models Research from 2017 to 2023

Lizhou Fan, Lingyao Li, ... (+4 more)

cs.DL 🏛 ACM TIST 📚 208 cites 3 years ago

R.I.P. 👻 Ghosted

On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering

Erica Mourão, João Felipe Pimentel, ... (+4 more)

cs.DL 🏛 IST 📚 157 cites 6 years ago

R.I.P. 👻 Ghosted

A Systematic Identification and Analysis of Scientists on Twitter

Qing Ke, Yong-Yeol Ahn, Cassidy R. Sugimoto

cs.DL 🏛 PLoS ONE 📚 147 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago