On Improving Summarization Factual Consistency from Natural Language Feedback

December 20, 2022 ยท Entered Twilight ยท ๐Ÿ› Annual Meeting of the Association for Computational Linguistics

๐Ÿ’ค TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitignore, CODE_OF_CONDUCT.md, LICENSE, README.md, SECURITY.md, SUPPORT.md, data, data_loader.py

Authors Yixin Liu, Budhaditya Deb, Milagro Teruel, Aaron Halfaker, Dragomir Radev, Ahmed H. Awadallah arXiv ID 2212.09968 Category cs.CL: Computation & Language Citations 46 Venue Annual Meeting of the Association for Computational Linguistics Repository https://github.com/microsoft/DeFacto โญ 30 Last Checked 1 month ago
Abstract
Despite the recent progress in language generation models, their outputs may not always meet user expectations. In this work, we study whether informational feedback in natural language can be leveraged to improve generation quality and user preference alignment. To this end, we consider factual consistency in summarization, the quality that the summary should only contain information supported by the input documents, as the user-expected preference. We collect a high-quality dataset, DeFacto, containing human demonstrations and informational natural language feedback consisting of corrective instructions, edited summaries, and explanations with respect to the factual consistency of the summary. Using our dataset, we study three natural language generation tasks: (1) editing a summary by following the human feedback, (2) generating human feedback for editing the original summary, and (3) revising the initial summary to correct factual errors by generating both the human feedback and edited summary. We show that DeFacto can provide factually consistent human-edited summaries and further insights into summarization factual consistency thanks to its informational natural language feedback. We further demonstrate that fine-tuned language models can leverage our dataset to improve the summary factual consistency, while large language models lack the zero-shot learning ability in our proposed tasks that require controllable text generation.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago