Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
September 11, 2019 · Entered Twilight · AAAI Conference on Artificial Intelligence
"Last commit was 6.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, data_provider.py, datasets, download, eval.py, method.png, model.py, opt.py, random_anchor.py, requirements.txt, results, scripts, test.py, train.py, util.py
Authors
Jingwen Wang, Lin Ma, Wenhao Jiang
arXiv ID
1909.05010
Category
cs.CV: Computer Vision
Citations
202
Venue
AAAI Conference on Artificial Intelligence
Repository
https://github.com/JaywongWang/CBP
⭐ 59
Last Checked
2 months ago
Abstract
The task of temporally grounding language queries in videos is to temporally localize the best-matched video segment corresponding to a given language query (sentence). It requires models to perform visual and linguistic understanding simultaneously. Previous work predominantly ignores the precision of segment localization. Sliding-window-based methods use predefined search window sizes and suffer from redundant computation, while existing anchor-based approaches fail to yield precise localization. We address this issue by proposing an end-to-end boundary-aware model, which uses a lightweight branch to predict semantic boundaries corresponding to the given linguistic information. To better detect semantic boundaries, we propose to aggregate contextual information by explicitly modeling the relationship between the current element and its neighbors. At the testing stage, the most confident segments are selected based on both anchor and boundary predictions. The proposed model, dubbed Contextual Boundary-aware Prediction (CBP), outperforms its competitors by a clear margin on three public datasets. All code is available at https://github.com/JaywongWang/CBP.
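The test-time selection described in the abstract, where segments are ranked using both anchor and boundary predictions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_anchor_and_boundary`, the additive fusion rule, the `boundary_weight` parameter, and the toy scores are all assumptions made for illustration; the repository's `model.py` and `eval.py` define the actual procedure.

```python
from typing import List, Tuple


def fuse_anchor_and_boundary(
    anchor_scores: List[List[float]],
    boundary_scores: List[float],
    anchor_lengths: List[int],
    boundary_weight: float = 1.0,
) -> List[Tuple[int, int, float]]:
    """Rank candidate segments by anchor confidence plus boundary confidence.

    anchor_scores[t][k]: confidence that the anchor of length
        anchor_lengths[k] ending at time step t matches the query.
    boundary_scores[t]: confidence that step t is a semantic boundary.
    The additive fusion and boundary_weight are illustrative assumptions,
    not the formula from the paper.
    """
    candidates = []
    for end, per_anchor in enumerate(anchor_scores):
        for k, score in enumerate(per_anchor):
            start = end - anchor_lengths[k] + 1
            if start < 0:
                continue  # anchor would begin before the video starts
            # Average the boundary confidence at the two segment endpoints.
            boundary = 0.5 * (boundary_scores[start] + boundary_scores[end])
            candidates.append((start, end, score + boundary_weight * boundary))
    # Highest fused score first.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Toy inputs (hypothetical): 4 time steps, anchors of length 1 and 3.
toy_anchor_scores = [[0.1, 0.0], [0.2, 0.3], [0.6, 0.7], [0.3, 0.2]]
toy_boundary_scores = [0.9, 0.1, 0.2, 0.8]
ranked = fuse_anchor_and_boundary(toy_anchor_scores, toy_boundary_scores, [1, 3])
best_start, best_end, _ = ranked[0]
print(best_start, best_end)
```

Setting `boundary_weight` to 0 reduces the ranking to anchor scores alone, which makes it easy to see how much the boundary term changes the order of candidates.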
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted