LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?

September 25, 2025 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Lu Sun, Shihan Fu, Bingsheng Yao, Yuxuan Lu, Wenbo Li, Hansu Gu, Jiri Gesi, Jing Huang, Chen Luo, Dakuo Wang
arXiv ID: 2509.21501
Category: cs.HC (Human-Computer Interaction)
Cross-listed: cs.CL
Citations: 7
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

Agentic AI is emerging, capable of executing tasks through natural language, such as Copilot for coding or Amazon Rufus for shopping. Evaluating these systems is challenging, as their rapid evolution outpaces traditional human evaluation. Researchers have proposed LLM Agents to simulate participants as digital twins, but it remains unclear to what extent a digital twin can represent a specific customer in multi-turn interaction with an agentic AI system. In this paper, we recruited 40 human participants to shop with Amazon Rufus, collected their personas, interaction traces, and UX feedback, and then created digital twins to repeat the task. Pairwise comparison of human and digital-twin traces shows that while agents often explored more diverse choices, their action patterns aligned with humans and yielded similar design feedback. This study is the first to quantify how closely LLM agents can mirror human multi-turn interaction with an agentic AI system, highlighting their potential for scalable evaluation.

📜 Similar Papers

In the same crypt — Human-Computer Interaction

R.I.P. 👻 Ghosted

Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy

Ben Shneiderman

cs.HC 🏛 International Journal of Human-Computer Interaction 📚 974 cites 6 years ago

R.I.P. 👻 Ghosted

Improving fairness in machine learning systems: What do industry practitioners need?

Kenneth Holstein, Jennifer Wortman Vaughan, ... (+3 more)

cs.HC 🏛 CHI 📚 919 cites 7 years ago

R.I.P. 👻 Ghosted

Identifying Stable Patterns over Time for Emotion Recognition from EEG

Wei-Long Zheng, Jia-Yi Zhu, Bao-Liang Lu

cs.HC 🏛 IEEE TAC 📚 837 cites 10 years ago

R.I.P. 👻 Ghosted

Questioning the AI: Informing Design Practices for Explainable AI User Experiences

Q. Vera Liao, Daniel Gruen, Sarah Miller

cs.HC 🏛 CHI 📚 835 cites 6 years ago

R.I.P. 👻 Ghosted

Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities

Kaixuan Chen, Dalin Zhang, ... (+4 more)

cs.HC 🏛 ACM CSUR 📚 788 cites 6 years ago

R.I.P. 👻 Ghosted

Educational data mining and learning analytics: An updated survey

C. Romero, S. Ventura

cs.HC 🏛 WIREs Data Mining Knowl. Discov. 📚 787 cites 2 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago