VisEval: A Benchmark for Data Visualization in the Era of Large Language Models

July 01, 2024 · Declared Dead · 🏛 IEEE Transactions on Visualization and Computer Graphics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Nan Chen, Yuge Zhang, Jiahang Xu, Kan Ren, Yuqing Yang
arXiv ID: 2407.00981
Category: cs.HC (Human-Computer Interaction)
Cross-listed: cs.CL
Citations: 39
Venue: IEEE Transactions on Visualization and Computer Graphics
Last Checked: 3 months ago

Abstract

Translating natural language to visualization (NL2VIS) has shown great promise for visual data analysis, but it remains a challenging task that requires multiple low-level implementations, such as natural language processing and visualization design. Recent advancements in pre-trained large language models (LLMs) are opening new avenues for generating visualizations from natural language. However, the lack of a comprehensive and reliable benchmark hinders our understanding of LLMs' capabilities in visualization generation. In this paper, we address this gap by proposing a new NL2VIS benchmark called VisEval. Firstly, we introduce a high-quality and large-scale dataset. This dataset includes 2,524 representative queries covering 146 databases, paired with accurately labeled ground truths. Secondly, we advocate for a comprehensive automated evaluation methodology covering multiple dimensions, including validity, legality, and readability. By systematically scanning for potential issues with a number of heterogeneous checkers, VisEval provides reliable and trustworthy evaluation outcomes. We run VisEval on a series of state-of-the-art LLMs. Our evaluation reveals prevalent challenges and delivers essential insights for future advancements.

