Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models

October 10, 2024 · Declared Dead · 🏛 Conference on Empirical Methods in Natural Language Processing

Authors: Zhipeng Chen, Kun Zhou, Liang Song, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
arXiv ID: 2410.07825
Category: cs.CL (Computation & Language)
Citations: 0
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/RUCAIBox/MAET (⭐ 1)
Last Checked: 2 months ago

Abstract

Multi-lingual ability transfer has become increasingly important for the broad application of large language models (LLMs). Existing work relies heavily on training with multi-lingual ability-related data, which may not be available for low-resource languages. To address this, we propose a Multi-lingual Abilities Extraction and Combination approach, named MAEC. Our key idea is to decompose and extract language-agnostic ability-related weights from LLMs, and to combine them across different languages through simple addition and subtraction operations, without training. Specifically, MAEC consists of an extraction stage and a combination stage. In the extraction stage, we first locate key neurons that are highly related to specific abilities, and then use them to extract the transferable ability-related weights. In the combination stage, we further select the ability-related tensors that mitigate linguistic effects, and design a strategy that combines them with the language-specific weights to build the multi-lingual ability-enhanced LLM. To assess the effectiveness of our approach, we conduct extensive experiments with LLaMA-3 8B on mathematical and scientific tasks in both high-resource and low-resource language scenarios. Experimental results show that MAEC can effectively and efficiently extract and combine these advanced abilities, achieving performance comparable to PaLM. Resources are available at https://github.com/RUCAIBox/MAET.
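
The "addition and subtraction" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the helper names (`delta`, `keep_top_k`, `combine`), the top-k magnitude criterion standing in for key-neuron selection, and the scaling factors `alpha` and `beta` are all assumptions made for illustration. The tensor-selection step of the combination stage is not modeled here.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # tensor name -> flattened values (toy stand-in)


def delta(tuned: Weights, base: Weights) -> Weights:
    """Subtraction step: element-wise difference (tuned - base) per tensor."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}


def keep_top_k(vec: Weights, k: int) -> Weights:
    """Zero all but the k largest-magnitude entries of each tensor.

    Assumed stand-in for 'key neuron' selection; the paper's actual
    criterion is not stated in the abstract.
    """
    out: Weights = {}
    for name, values in vec.items():
        ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
        keep = set(ranked[:k])
        out[name] = [v if i in keep else 0.0 for i, v in enumerate(values)]
    return out


def combine(base: Weights, lang_vec: Weights, ability_vec: Weights,
            alpha: float = 1.0, beta: float = 1.0) -> Weights:
    """Addition step: base + alpha * language delta + beta * ability delta.

    alpha and beta are hypothetical scaling factors.
    """
    return {
        k: [b + alpha * l + beta * a
            for b, l, a in zip(base[k], lang_vec[k], ability_vec[k])]
        for k in base
    }


# Toy weights: one tensor with four entries.
base = {"mlp.w": [0.0, 1.0, 2.0, 3.0]}
math_tuned = {"mlp.w": [0.5, 1.0, 2.25, 1.0]}  # base tuned on an ability (e.g. math)
lang_tuned = {"mlp.w": [0.0, 2.0, 2.0, 3.0]}   # base tuned on a target language

ability_vec = keep_top_k(delta(math_tuned, base), k=2)
lang_vec = delta(lang_tuned, base)
merged = combine(base, lang_vec, ability_vec)
print(merged)
```

In this toy setup, no gradient step is taken at merge time: the merged weights are produced purely by arithmetic on existing checkpoints, which is the property the abstract emphasizes.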