Can Large Language Models Invent Algorithms to Improve Themselves?: Algorithm Discovery for Recursive Self-Improvement through Reinforcement Learning

October 21, 2024 · Declared Dead · 🏛 NAACL 2025

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yoichi Ishibashi, Taro Yano, Masafumi Oyamada arXiv ID 2410.15639 Category cs.CL: Computation & Language Citations 0 Venue NAACL 2025 Last Checked 4 months ago

Abstract

Large Language Models (LLMs) have achieved remarkable capabilities, yet their improvement methods remain fundamentally constrained by human design. We present Self-Developing, a framework that enables LLMs to autonomously discover, implement, and refine their own improvement algorithms. Our approach employs an iterative cycle where a seed model generates algorithmic candidates as executable code, evaluates their effectiveness, and uses Direct Preference Optimization to recursively improve increasingly sophisticated improvement strategies. We demonstrate this framework through model merging, a practical technique for combining specialized models. Self-Developing successfully discovered novel merging algorithms that outperform existing human-designed algorithms. On mathematical reasoning benchmarks, the autonomously discovered algorithms improve the seed model's GSM8k performance by 6\% and exceed human-designed approaches like Task Arithmetic by 4.3\%. Remarkably, these algorithms exhibit strong generalization, achieving 7.4\% gains on out-of-domain models without re-optimization. Our findings demonstrate that LLMs can transcend their training to invent genuinely novel optimization techniques. This capability represents a crucial step toward a new era where LLMs not only solve problems but autonomously develop the methodologies for their own advancement.