Hue: A User-Adaptive Parser for Hybrid Logs

August 14, 2023 · Declared Dead · 🏛 ESEC/SIGSOFT FSE

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Junjielong Xu, Qiuai Fu, Zhouruixing Zhu, Yutong Cheng, Zhijing Li, Yuchi Ma, Pinjia He arXiv ID 2308.07085 Category cs.SE: Software Engineering Citations 20 Venue ESEC/SIGSOFT FSE Last Checked 2 months ago

Abstract

Log parsing, which extracts log templates from semi-structured logs and produces structured logs, is the first and the most critical step in automated log analysis. While existing log parsers have achieved decent results, they suffer from two major limitations by design. First, they do not natively support hybrid logs that consist of both single-line logs and multi-line logs (\eg Java Exception and Hadoop Counters). Second, they fall short in integrating domain knowledge in parsing, making it hard to identify ambiguous tokens in logs. This paper defines a new research problem, \textit{hybrid log parsing}, as a superset of traditional log parsing tasks, and proposes \textit{Hue}, the first attempt for hybrid log parsing via a user-adaptive manner. Specifically, Hue converts each log message to a sequence of special wildcards using a key casting table and determines the log types via line aggregating and pattern extracting. In addition, Hue can effectively utilize user feedback via a novel merge-reject strategy, making it possible to quickly adapt to complex and changing log templates. We evaluated Hue on three hybrid log datasets and sixteen widely-used single-line log datasets (\ie Loghub). The results show that Hue achieves an average grouping accuracy of 0.845 on hybrid logs, which largely outperforms the best results (0.563 on average) obtained by existing parsers. Hue also exhibits SOTA performance on single-line log datasets. Furthermore, Hue has been successfully deployed in a real production environment for daily hybrid log parsing.