GRIN Transfer: A production-ready tool for libraries to retrieve digital copies from Google Books
November 14, 2025 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Liza Daly, Matteo Cargnelutti, Catherine Brobston, John Hess, Greg Leppert, Amanda Watson, Jonathan Zittrain
arXiv ID
2511.11447
Category
cs.DL: Digital Libraries
Cross-listed
cs.IR
Citations
0
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Publicly launched in 2004, the Google Books project has scanned tens of millions of items in partnership with libraries around the world. As part of this project, Google created the Google Return Interface (GRIN). Through this platform, libraries can access their scanned collections, the associated metadata, and the ongoing OCR and metadata improvements that become available as Google reprocesses these collections using new technologies. When downloading the Harvard Library Google Books collection from GRIN to develop the Institutional Books dataset, we encountered several challenges related to rate-limiting and atomized metadata within the GRIN platform. To overcome these challenges and help other libraries make more robust use of their Google Books collections, this technical report introduces the initial release of GRIN Transfer. This open-source and production-ready Python pipeline allows partner libraries to efficiently retrieve their Google Books collections from GRIN. This report also introduces an updated version of our Institutional Books 1.0 pipeline, initially used to analyze, augment, and assemble the Institutional Books 1.0 dataset. We have revised this pipeline for compatibility with the output format of GRIN Transfer. A library could pair these two tools to create an end-to-end processing pipeline for their Google Books collection to retrieve, structure, and enhance data available from GRIN. This report gives an overview of how GRIN Transfer was designed to optimize for reliability and usability in different environments, as well as guidance on configuration for various use cases.
Similar Papers
In the same crypt: Digital Libraries
Measuring academic influence: Not all citations are equal · Ghosted
The Open Access Advantage Considering Citation, Article Usage and Social Media Attention · Ghosted
A Bibliometric Review of Large Language Models Research from 2017 to 2023 · Ghosted
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering · Ghosted
A Systematic Identification and Analysis of Scientists on Twitter · Ghosted
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
In-Datacenter Performance Analysis of a Tensor Processing Unit
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning