DC-Prophet: Predicting Catastrophic Machine Failures in DataCenters

August 14, 2017 Β· Declared Dead Β· πŸ› ECML/PKDD

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors You-Luen Lee, Da-Cheng Juan, Xuan-An Tseng, Yu-Ting Chen, Shih-Chieh Chang arXiv ID 1709.06537 Category cs.DC: Distributed Computing Cross-listed stat.ML Citations 16 Venue ECML/PKDD Last Checked 4 months ago
Abstract
When will a server fail catastrophically in an industrial datacenter? Is it possible to forecast these failures so preventive actions can be taken to increase the reliability of a datacenter? To answer these questions, we have studied what are probably the largest, publicly available datacenter traces, containing more than 104 million events from 12,500 machines. Among these samples, we observe and categorize three types of machine failures, all of which are catastrophic and may lead to information loss, or even worse, reliability degradation of a datacenter. We further propose a two-stage framework-DC-Prophet-based on One-Class Support Vector Machine and Random Forest. DC-Prophet extracts surprising patterns and accurately predicts the next failure of a machine. Experimental results show that DC-Prophet achieves an AUC of 0.93 in predicting the next machine failure, and a F3-score of 0.88 (out of 1). On average, DC-Prophet outperforms other classical machine learning methods by 39.45% in F3-score.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Distributed Computing

Died the same way β€” πŸ‘» Ghosted