simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions

August 27, 2018 · Entered Twilight · 🏛 Conference on Empirical Methods in Natural Language Processing

"Last commit was 7.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: KarpathySplit.py, README.md, build_vocab.py, data, data_loader.py, model.py, test.py, train.py

Authors: Fenglin Liu, Xuancheng Ren, Yuanxin Liu, Houfeng Wang, Xu Sun
arXiv ID: 1808.08732
Category: cs.CL: Computation & Language
Citations: 71
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/lancopku/simNet ⭐ 36
Last Checked: 2 months ago

Abstract

The encoder-decoder framework has shown recent success in image captioning. Visual attention, which is good at capturing detail, and semantic attention, which is good at comprehensiveness, have been separately proposed to ground the caption on the image. In this paper, we propose the Stepwise Image-Topic Merging Network (simNet), which makes use of the two kinds of attention at the same time. At each time step when generating the caption, the decoder adaptively merges the attentive information in the extracted topics and the image according to the generated context, so that the visual information and the semantic information can be effectively combined. The proposed approach is evaluated on two benchmark datasets and achieves state-of-the-art performance. (The code is available at https://github.com/lancopku/simNet)
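
To make the merging idea in the abstract concrete, here is a minimal sketch of one decoding step that attends separately over image regions and topic embeddings, then blends the two results with a context-dependent gate. It is an illustration under stated assumptions, not the paper's formulation or the repository's code: the function names (`attend`, `merge_step`), the dot-product scoring, and the scalar sigmoid gate are all hypothetical choices made to keep the example small and dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Soft attention (assumed dot-product scoring): weighted average of items."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    summary = [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]
    return summary, weights


def merge_step(context, regions, topics):
    """One hypothetical merging step for the current decoder context.

    context: vector summarizing the caption generated so far
    regions: list of image-region feature vectors (visual attention input)
    topics:  list of topic embedding vectors (semantic attention input)
    """
    visual, _ = attend(context, regions)
    semantic, _ = attend(context, topics)
    # Assumed gate: a scalar in (0, 1) that favors whichever summary
    # aligns better with the current context.
    gate = 1.0 / (1.0 + math.exp(-(dot(context, visual) - dot(context, semantic))))
    merged = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return merged, gate


if __name__ == "__main__":
    merged, gate = merge_step(
        context=[1.0, 0.0],
        regions=[[1.0, 0.0], [0.0, 1.0]],
        topics=[[0.0, 1.0], [0.0, 1.0]],
    )
    print(merged, gate)
```

In this toy setup the context aligns with the first image region, so the gate leans toward the visual summary. A learned model would replace the fixed dot-product scores and the hand-written gate with trained parameters.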