Dynamic Indexing Through Learned Indices with Worst-case Guarantees
March 06, 2025 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Emil Toftegaard Gæde, Ivor van der Hoog, Eva Rotenberg, Tord Stordalen
arXiv ID
2503.05007
Category
cs.CG: Computational Geometry
Cross-listed
cs.DS
Citations
0
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Indexing data is a fundamental problem in computer science. Recently, various papers apply machine learning to this problem. For a fixed integer $\varepsilon$, a \emph{learned index} is a function $h : \mathcal{U} \rightarrow [0, n]$ where $\forall q \in \mathcal{U}$, $h(q) \in [\text{rank}(q) - \varepsilon, \text{rank}(q) + \varepsilon]$. These works use machine learning to compute $h$. Then, they store $S$ in a sorted array $A$ and access $A[\lfloor h(q) \rfloor]$ to answer queries in $O(k + \varepsilon + \log |h|)$ time. Here, $k$ denotes the output size and $|h|$ the complexity of $h$. Ferragina and Vinciguerra (VLDB 2020) observe that creating a learned index is a geometric problem. They define the PGM index by restricting $h$ to a piecewise linear function and show a linear-time algorithm to compute a PGM index of approximate minimum complexity. Since indexing queries are decomposable, the PGM index may be made dynamic through the logarithmic method. When allowing deletions, range query times deteriorate to worst-case $O(N + \sum\limits_i^{\lceil \log n \rceil } (\varepsilon + \log |h_i|))$ time (where $N$ is the largest size of $S$ seen so far). This paper offers a combination of theoretical insights and experiments as we apply techniques from computational geometry to dynamically maintain an approximately minimum-complexity learned index $h : \mathcal{U} \rightarrow [0, n]$ with $O(\log^2 n)$ update time. We also prove that if we restrict $h$ to lie in a specific subclass of piecewise-linear functions, then we can combine $h$ and hash maps to support queries in $O(k + \varepsilon + \log |h|)$ time (at the cost of increasing $|h|$). We implement our algorithm and compare it to the existing implementation. Our empirical analysis shows that our solution supports more efficient range queries in the special case where the update sequence contains many deletions.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computational Geometry
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Dynamic Planar Convex Hull
R.I.P.
π»
Ghosted
TEMPO: Feature-Endowed TeichmΓΌller Extremal Mappings of Point Clouds
R.I.P.
π»
Ghosted
Explainable Artificial Intelligence for Manufacturing Cost Estimation and Machining Feature Visualization
R.I.P.
π»
Ghosted
Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal
R.I.P.
π»
Ghosted
Momen(e)t: Flavor the Moments in Learning to Classify Shapes
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted