Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?

June 06, 2023 · Declared Dead · 🏛 International Conference on Statistical and Scientific Database Management

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Odej Kao
arXiv ID: 2306.03672
Category: cs.DC (Distributed Computing)
Cross-listed: cs.DB
Citations: 3
Venue: International Conference on Statistical and Scientific Database Management
Last Checked: 4 months ago
Abstract
Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. In this paper, we analyze the challenge of efficient resource allocation for distributed data processing, focusing on memory. We emphasize that in-memory processing with in-memory data processing frameworks can undermine resource efficiency. Based on the findings of our trace data analysis, we compile requirements towards an automated solution for efficient cluster resource allocation.