Large Language Models: An Applied Econometric Framework
December 09, 2024 · Declared Dead · Social Science Research Network
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jens Ludwig, Sendhil Mullainathan, Ashesh Rambachan
arXiv ID
2412.07031
Category
econ.EM
Cross-listed
cs.AI
Citations
33
Venue
Social Science Research Network
Last Checked
3 months ago
Abstract
Large language models (LLMs) enable researchers to analyze text at unprecedented scale and minimal cost. Researchers can now revisit old questions and tackle novel ones with rich data. We provide an econometric framework for realizing this potential in two empirical uses. For prediction problems (forecasting outcomes from text), valid conclusions require "no training leakage" between the LLM's training data and the researcher's sample, which can be enforced through careful model choice and research design. For estimation problems (automating the measurement of economic concepts for downstream analysis), valid downstream inference requires combining LLM outputs with a small validation sample to deliver consistent and precise estimates. Absent a validation sample, researchers cannot assess possible errors in LLM outputs, and consequently seemingly innocuous choices (which model, which prompt) can produce dramatically different parameter estimates. When used appropriately, LLMs are powerful tools that can expand the frontier of empirical economics.
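To make the estimation point concrete, here is a minimal sketch of how a small validation sample can correct a mean computed from LLM labels. It is not taken from the paper and may differ from the authors' estimator: it uses a simple bias correction (the plug-in mean of LLM labels plus the average labelling error measured on the validation sample). The function names `debiased_mean` and `simulate`, the error rates, and the sample sizes are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def debiased_mean(llm_all, llm_val, truth_val):
    """Mean of LLM labels on the full sample, shifted by the average
    (truth - LLM) error observed on the validation sample.

    Illustrative bias correction only; not the paper's estimator.
    """
    if len(llm_val) != len(truth_val) or not truth_val:
        raise ValueError("validation labels must be non-empty and aligned")
    correction = mean(t - p for t, p in zip(truth_val, llm_val))
    return mean(llm_all) + correction


def simulate(n=20000, n_val=2000, seed=0):
    """Simulate a binary concept measured by an error-prone labeller.

    Assumed (hypothetical) error rates: the labeller misses 40% of true
    positives and flags 5% of true negatives.
    """
    rng = random.Random(seed)
    truth = [1 if rng.random() < 0.30 else 0 for _ in range(n)]
    llm = [
        (1 if rng.random() < 0.60 else 0) if t else (1 if rng.random() < 0.05 else 0)
        for t in truth
    ]
    # Randomly chosen documents for which the true label is hand-coded.
    val_idx = rng.sample(range(n), n_val)
    naive = mean(llm)
    corrected = debiased_mean(
        llm, [llm[i] for i in val_idx], [truth[i] for i in val_idx]
    )
    return mean(truth), naive, corrected


if __name__ == "__main__":
    true_share, naive, corrected = simulate()
    print(f"true share:      {true_share:.3f}")
    print(f"naive LLM share: {naive:.3f}")
    print(f"corrected share: {corrected:.3f}")
```

In this simulation the naive share understates the true share because the labeller's errors are systematic, and without the validation sample there would be no way to detect or size that gap. The correction relies on the validation documents being a random draw from the researcher's sample.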
Similar Papers
In the same crypt: econ.EM
Machine Learning Advances for Time Series Forecasting (R.I.P., Ghosted)
Deep Neural Networks for Estimation and Inference (R.I.P., Ghosted)
Take a Look Around: Using Street View and Satellite Images to Estimate House Prices (R.I.P., Ghosted)
Discrete Choice and Rational Inattention: a General Equivalence Result (R.I.P., Ghosted)
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data (R.I.P., Ghosted)
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency (R.I.P., Ghosted)
In-Datacenter Performance Analysis of a Tensor Processing Unit (R.I.P., Ghosted)
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning (R.I.P., Ghosted)