Concurrent Data Structures Made Easy (Extended Version)
August 25, 2024 Β· Declared Dead Β· π Proc. ACM Program. Lang.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Callista Le, Kiran Gopinathan, Koon Wen Lee, Seth Gilbert, Ilya Sergey
arXiv ID
2408.13779
Category
cs.PL: Programming Languages
Cross-listed
cs.DC
Citations
1
Venue
Proc. ACM Program. Lang.
Last Checked
4 months ago
Abstract
Design of an efficient thread-safe concurrent data structure is a balancing act between its implementation complexity and performance. Lock-based concurrent data structures, which are relatively easy to derive from their sequential counterparts and to prove thread-safe, suffer from poor throughput under even light multi-threaded workload. At the same time, lock-free concurrent structures allow for high throughput, but are notoriously difficult to get right and require careful reasoning to formally establish their correctness. We explore a solution to this conundrum based on batch parallelism, an approach for designing concurrent data structures via a simple insight: efficiently processing a batch of a priori known operations in parallel is easier than optimising performance for a stream of arbitrary asynchronous requests. Alas, batch-parallel structures have not seen wide practical adoption due to (i) the inconvenience of having to structure multi-threaded programs to explicitly group operations and (ii) the lack of a systematic methodology to implement batch-parallel structures as simply as lock-based ones. We present OBatcher-an OCaml library that streamlines the design, implementation, and usage of batch-parallel structures. It solves the first challenge (how to use) by suggesting a new lightweight implicit batching design that is built on top of generic asynchronous programming mechanisms. The second challenge (how to implement) is addressed by identifying a family of strategies for converting common sequential structures into efficient batch-parallel ones. We showcase OBatcher with a diverse set of benchmarks. Our evaluation of all the implementations on large asynchronous workloads shows that (a) they consistently outperform the corresponding coarse-grained lock-based implementations and that (b) their throughput scales reasonably with the number of processors.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Programming Languages
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
R.I.P.
π»
Ghosted
Glow: Graph Lowering Compiler Techniques for Neural Networks
R.I.P.
π»
Ghosted
Learnable Programming: Blocks and Beyond
R.I.P.
π»
Ghosted
Scenic: A Language for Scenario Specification and Scene Generation
R.I.P.
π»
Ghosted
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted