Large Language Model Sourcing: A Survey

October 11, 2025 · The Cartographer · 🏛 arXiv.org

Authors: Liang Pang, Jia Gu, Sunhao Dai, Zihao Wei, Zenghao Duan, Kangxi Wu, Zhiyi Yin, Jun Xu, Huawei Shen, Xueqi Cheng
arXiv ID: 2510.10161
Category: cs.CL: Computation & Language
Cross-listed: cs.AI
Citations: 1
Venue: arXiv.org
Last Checked: 4 days ago

Abstract

Due to the black-box nature of large language models (LLMs) and the realism of their generated content, issues such as hallucinations, bias, unfairness, and copyright infringement have become significant. In this context, sourcing information from multiple perspectives is essential. This survey presents a systematic investigation organized around four interrelated dimensions: Model Sourcing, Model Structure Sourcing, Training Data Sourcing, and External Data Sourcing. Moreover, a unified dual-paradigm taxonomy is proposed that classifies existing sourcing methods into prior-based (proactive traceability embedding) and posterior-based (retrospective inference) approaches. Traceability across these dimensions enhances the transparency, accountability, and trustworthiness of LLM deployment in real-world applications.
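
The taxonomy described in the abstract can be read as a grid of four dimensions by two paradigms. The following is a minimal sketch of that grid as a data structure, not code from the survey: the names `Dimension`, `Paradigm`, `SourcingMethod`, `taxonomy_cells`, and `group_by_cell` are assumptions introduced here for illustration, and the method labels in the usage note are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Dimension(Enum):
    # The four sourcing dimensions named in the abstract.
    MODEL = "Model Sourcing"
    MODEL_STRUCTURE = "Model Structure Sourcing"
    TRAINING_DATA = "Training Data Sourcing"
    EXTERNAL_DATA = "External Data Sourcing"


class Paradigm(Enum):
    # The two paradigms of the taxonomy, with the abstract's glosses.
    PRIOR = "prior-based (proactive traceability embedding)"
    POSTERIOR = "posterior-based (retrospective inference)"


@dataclass(frozen=True)
class SourcingMethod:
    name: str  # hypothetical label, not a method taken from the survey
    dimension: Dimension
    paradigm: Paradigm


def taxonomy_cells():
    """Return every (dimension, paradigm) pair in the grid."""
    return list(product(Dimension, Paradigm))


def group_by_cell(methods):
    """Map each grid cell to the names of the methods classified into it."""
    cells = {cell: [] for cell in taxonomy_cells()}
    for method in methods:
        cells[(method.dimension, method.paradigm)].append(method.name)
    return cells
```

As a usage example, `group_by_cell([SourcingMethod("placeholder-A", Dimension.TRAINING_DATA, Paradigm.POSTERIOR)])` returns a dictionary with one entry per cell of the grid, where only the (Training Data Sourcing, posterior-based) cell is non-empty. How the survey actually assigns individual methods to cells is not stated in the abstract.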