Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach

March 20, 2025 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Tarek Mahmud, Bin Duan, Corina Pasareanu, Guowei Yang
arXiv ID: 2503.15838
Category: cs.SE (Software Engineering)
Citations: 10
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

Ensemble learning has been widely used in machine learning to improve model robustness, accuracy, and generalization, but has not yet been applied to code generation tasks with large language models (LLMs). We propose an ensemble approach for LLMs in code generation. Instead of relying on the output of a single model, we generate multiple candidate programs from different LLMs and apply a structured voting mechanism to select the most reliable solution. For voting, we compute syntactic and semantic similarity using CodeBLEU and behavioral equivalence using CrossHair's differential behavior analysis. By aggregating these similarity scores, we select the program that best aligns with the consensus among the candidates. We show through experiments that our ensemble approach consistently outperforms standalone LLMs on the well-known HumanEval and the more challenging LiveCodeBench datasets, achieving an accuracy of 90.2% and 50.2%, respectively, on the two datasets. In comparison, the best-performing LLM (GPT-4o) has an accuracy of 83.5% and 43.4%, respectively. Furthermore, even when restricted to free open-source models, our method achieves an accuracy of 80.5% and 41.6%, respectively, demonstrating the viability of our approach in resource-constrained settings.
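The selection step described in the abstract can be sketched as a consensus vote: score every pair of candidates, sum each candidate's similarity to all the others, and keep the highest-scoring one. This is a minimal illustration, not the authors' implementation, and it rests on several loudly stated assumptions: `difflib.SequenceMatcher` stands in for CodeBLEU, agreement on a handful of sample inputs stands in for CrossHair's differential behavior analysis, the two scores are mixed with a hypothetical `weight` parameter, and "aggregation" is read as a plain sum of pairwise scores. All function names here are hypothetical.

```python
import difflib
from itertools import combinations
from typing import Callable, Sequence


def syntactic_similarity(a: str, b: str) -> float:
    # Stand-in for CodeBLEU (assumption): character-level match ratio in [0, 1].
    return difflib.SequenceMatcher(None, a, b).ratio()


def behavioral_similarity(f: Callable, g: Callable, inputs: Sequence[tuple]) -> float:
    # Stand-in for CrossHair (assumption): fraction of sample inputs on which
    # both functions return equal values. Any exception counts as disagreement.
    if not inputs:
        return 0.0
    agree = 0
    for args in inputs:
        try:
            if f(*args) == g(*args):
                agree += 1
        except Exception:
            pass
    return agree / len(inputs)


def load(source: str, name: str) -> Callable:
    # exec() runs arbitrary code: use only on trusted toy examples or in a sandbox.
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


def select_by_consensus(sources: Sequence[str], name: str,
                        inputs: Sequence[tuple], weight: float = 0.5):
    # Hypothetical aggregation: each candidate's score is the sum of its
    # weighted pairwise similarity to every other candidate.
    funcs = [load(s, name) for s in sources]
    scores = [0.0] * len(sources)
    for i, j in combinations(range(len(sources)), 2):
        sim = (weight * syntactic_similarity(sources[i], sources[j])
               + (1 - weight) * behavioral_similarity(funcs[i], funcs[j], inputs))
        scores[i] += sim
        scores[j] += sim
    best = max(range(len(sources)), key=lambda k: scores[k])
    return best, scores


# Usage: two correct candidates and one buggy one (subtracts instead of adds).
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    return x + y\n",
    "def add(a, b):\n    return a - b\n",
]
sample_inputs = [(1, 2), (0, 0), (-3, 5)]
best, scores = select_by_consensus(candidates, "add", sample_inputs)
print(best)  # → 0
```

The buggy third candidate is textually close to the first but disagrees with both others on most inputs, so it collects the lowest consensus score; the behavioral term is what keeps a near-duplicate bug from winning the vote.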



📜 Similar Papers

In the same crypt — Software Engineering

R.I.P. 👻 Ghosted

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

Yuchi Tian, Kexin Pei, ... (+2 more)

cs.SE 🏛 ICSE 📚 1.4K cites 8 years ago

R.I.P. 👻 Ghosted

Microservices: yesterday, today, and tomorrow

Nicola Dragoni, Saverio Giallorenzo, ... (+5 more)

cs.SE 🏛 Present and Ulterior Software Engineering 📚 1.1K cites 10 years ago

📚 The Cartographer

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, ... (+2 more)

cs.SE 🏛 ACM CSUR 📚 962 cites 8 years ago

R.I.P. 👻 Ghosted

An Overview on Smart Contracts: Challenges, Advances and Platforms

Zibin Zheng, Shaoan Xie, ... (+5 more)

cs.SE 🏛 FGCS 📚 917 cites 6 years ago

R.I.P. 👻 Ghosted

Slither: A Static Analysis Framework For Smart Contracts

Josselin Feist, Gustavo Grieco, Alex Groce

cs.SE 🏛 ICETSEB W 📚 823 cites 6 years ago

R.I.P. 👻 Ghosted

ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection

Bo Jiang, Ye Liu, W. K. Chan

cs.SE 🏛 ASE 📚 790 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago