A faster algorithm for efficient longest common substring calculation for non-parametric entropy estimation in sequential data
October 15, 2025 · Entered Twilight · arXiv.org
Repo contents: .github, LCSFinder.cpp, LCSFinder.egg-info, LCSFinder.h, LCSFinder.i, LCSFinder.py, LCSFinder_wrap.cxx, LICENSE.md, README.md, _LCSFinder.cpython-311-darwin.so, _LCSFinder.cpython-39-darwin.so, __pycache__, build, dist, process_ent_functs.py, ru.sh, setup.cfg, setup.py, tests
Authors
Bridget Smart, Max Ward, Matthew Roughan
arXiv ID
2510.13330
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.IT
Citations
0
Venue
arXiv.org
Repository
https://github.com/bridget-smart/LCSFinder
⭐ 3
Last Checked
3 months ago
Abstract
Non-parametric entropy estimation on sequential data is a fundamental tool in signal processing, capturing information flow within or between processes to measure predictability, redundancy, or similarity. Methods based on longest common substrings (LCS) provide a non-parametric estimate of typical set size but are often inefficient, limiting use on real-world data. We introduce LCSFinder, a new algorithm that improves the worst-case performance of LCS calculations from cubic to log-linear time. Although built on standard algorithmic constructs - including sorted suffix arrays and persistent binary search trees - the details require care to provide the matches required for entropy estimation on dynamically growing sequences. We demonstrate that LCSFinder achieves dramatic speedups over existing implementations on real and simulated data, enabling entropy estimation at scales previously infeasible in practical signal processing.
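To make the quantity in the abstract concrete, here is a minimal sketch of a naive match-length entropy estimator. It is not the LCSFinder algorithm or its API. It assumes one common Kontoyiannis-style definition, which the abstract does not spell out: for each position i, Λ_i is one plus the length of the longest prefix of the remaining sequence that occurs entirely within the first i symbols, and the entropy rate is estimated as n·log2(n) / ΣΛ_i. The function names `match_lengths` and `entropy_estimate` are hypothetical. The triple loop is the kind of cubic worst-case baseline that suffix-array-based methods are designed to avoid.

```python
import math


def match_lengths(seq):
    """Return Lambda_i for every position i (naive, cubic worst case).

    Assumed definition: Lambda_i = 1 + length of the longest prefix of
    seq[i:] that appears as a contiguous run fully inside seq[:i].
    """
    n = len(seq)
    lambdas = []
    for i in range(n):
        longest = 0
        for j in range(i):  # candidate start position in the history
            k = 0
            # Extend the match while it stays inside the history seq[:i]
            # and inside the remaining sequence seq[i:].
            while i + k < n and j + k < i and seq[j + k] == seq[i + k]:
                k += 1
            longest = max(longest, k)
        lambdas.append(longest + 1)
    return lambdas


def entropy_estimate(seq):
    """Estimate the entropy rate in bits per symbol (assumed estimator form)."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    return n * math.log2(n) / sum(match_lengths(seq))


if __name__ == "__main__":
    print(match_lengths("abab"))
    print(entropy_estimate("abab"))
```

On long, repetitive sequences the inner loops dominate the running time, which is why this form is only practical for short inputs.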
Similar Papers
In the same crypt: Data Structures & Algorithms
Route Planning in Transportation Networks
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
Hierarchical Clustering: Objective Functions and Algorithms
Graph Isomorphism in Quasipolynomial Time