Old Age
LARCH: Large Language Model-based Automatic Readme Creation with Heuristics
August 06, 2023 · Entered Twilight · International Conference on Information and Knowledge Management
Repo contents: .gitignore, Dockerfile, LICENSE, MANIFEST.in, README.md, larch, requirements-dev.txt, requirements.txt, scripts, setup.py, tests
Authors
Yuta Koreeda, Terufumi Morishita, Osamu Imaichi, Yasuhiro Sogawa
arXiv ID
2308.03099
Category
cs.CL: Computation & Language
Cross-listed
cs.SE
Citations
8
Venue
International Conference on Information and Knowledge Management
Repository
https://github.com/hitachi-nlp/larch
⭐ 17
Last Checked
2 months ago
Abstract
Writing a readme is a crucial aspect of software development as it plays a vital role in managing and reusing program code. Though it is a pain point for many developers, automatically creating one remains a challenge even with the recent advancements in large language models (LLMs), because it requires generating an abstract description from thousands of lines of code. In this demo paper, we show that LLMs are capable of generating coherent and factually correct readmes if we can identify a code fragment that is representative of the repository. Building upon this finding, we developed LARCH (LLM-based Automatic Readme Creation with Heuristics), which leverages representative code identification with heuristics and weak supervision. Through human and automated evaluations, we illustrate that LARCH can generate coherent and factually correct readmes in the majority of cases, outperforming a baseline that does not rely on representative code identification. We have made LARCH open-source and provided a cross-platform Visual Studio Code interface and command-line interface, accessible at https://github.com/hitachi-nlp/larch. A demo video showcasing LARCH's capabilities is available at https://youtu.be/ZUKkh5ED-O4.
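The core idea the abstract describes can be sketched in a few lines: score each file in a repository with a heuristic, pick the highest-scoring "representative" fragment, and use it to build an LLM prompt asking for a readme. This is a minimal illustration only, not LARCH's actual implementation; the scoring rule and helper names below are hypothetical stand-ins for the paper's heuristics and weak supervision.

```python
# Hypothetical sketch of representative-code-to-readme prompting.
# Neither the heuristic nor the prompt text comes from LARCH itself.

def score_file(path: str, text: str) -> int:
    """Toy heuristic: prefer entry-point-like files with more definitions."""
    score = text.count("def ") + text.count("class ")
    if path.endswith(("main.py", "cli.py", "__main__.py")):
        score += 10  # entry points tend to summarize what a repo does
    return score

def pick_representative(files: dict[str, str]) -> str:
    """files maps path -> source text; return the highest-scoring path."""
    return max(files, key=lambda p: score_file(p, files[p]))

def build_readme_prompt(path: str, text: str) -> str:
    """Wrap the chosen fragment in an instruction for an LLM."""
    return (
        "Write a README for the repository whose representative "
        f"file is {path}:\n\n{text}"
    )

# Tiny example repository.
files = {
    "util.py": "def helper():\n    pass\n",
    "larch/main.py": "class Generator:\n    pass\n\ndef main():\n    pass\n",
}
best = pick_representative(files)
prompt = build_readme_prompt(best, files[best])
```

With this toy scoring, `larch/main.py` wins (two definitions plus the entry-point bonus), and `prompt` would then be sent to an LLM; the paper's contribution is doing this selection well, via heuristics and weak supervision, rather than prompting on the whole repository.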
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted