UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
October 17, 2024 · Declared Dead · North American Chapter of the Association for Computational Linguistics
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yuzhe Yang, Yifei Zhang, Yan Hu, Yilin Guo, Ruoli Gan, Yueru He, Mingcong Lei, Xiao Zhang, Haining Wang, Qianqian Xie, Jimin Huang, Honghai Yu, Benyou Wang
arXiv ID
2410.14059
Category
q-fin.CP
Cross-listed
cs.CE, cs.CL
Citations
2
Venue
North American Chapter of the Association for Computational Linguistics
Last Checked
3 months ago
Abstract
This paper introduces UCFE, the User-Centric Financial Expertise benchmark, an innovative framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. First, we conducted a user study involving 804 participants, collecting their feedback on financial tasks. Second, based on this feedback, we created a dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 11 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial domain but also provides a robust framework for assessing their performance and user satisfaction.
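The abstract reports agreement between benchmark scores and human preferences as a Pearson correlation coefficient. The following is a minimal sketch of how such a coefficient is computed, assuming one benchmark score and one human preference score per model. The function name `pearson` and the sample values are hypothetical illustrations, not the paper's code or data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-model scores, for illustration only (not from the paper).
benchmark_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
human_scores = [1.5, 1.0, 3.5, 3.0, 5.0]

r = pearson(benchmark_scores, human_scores)
print(f"Pearson r = {r:.2f}")
```

A value near 1 means models that score higher on the benchmark also tend to be preferred by human raters; a value near 0 means the two rankings are largely unrelated.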
Similar Papers
In the same crypt: q-fin.CP
Deep Reinforcement Learning for Trading (Ghosted)
Solving the Optimal Trading Trajectory Problem Using a Quantum Annealer (Ghosted)
Neural networks for option pricing and hedging: a literature review (Ghosted)
Lagged correlation-based deep learning for directional trend change prediction in financial time series (Ghosted)
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds (Ghosted)
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
In-Datacenter Performance Analysis of a Tensor Processing Unit
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning