CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling
June 25, 2024 Β· Declared Dead Β· π Annual Meeting of the Association for Computational Linguistics
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao
arXiv ID
2406.17507
Category
cs.IR: Information Retrieval
Citations
1
Venue
Annual Meeting of the Association for Computational Linguistics
Last Checked
4 months ago
Abstract
Cross-modal retrieval aims to search for instances, which are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates, which is challenged by training cost and inference latency with large-scale data. Inspired by the remarkable performance and efficiency of generative models, we propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling, which assigns identifiers to each candidate and treats the generating identifier as the retrieval target. Specifically, we explore an effective coarse-to-fine scheme, combining K-Means and RQ-VAE to discretize multimodal data into token sequences that support autoregressive generation. Further, considering the lack of explicit interaction between queries and candidates, we propose a feature fusion strategy to align their semantics. Extensive experiments demonstrate the effectiveness of the strategies in the CART, achieving excellent results in both retrieval performance and efficiency.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Information Retrieval
R.I.P.
π»
Ghosted
π
π
Old Age
Neural Graph Collaborative Filtering
R.I.P.
π»
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
R.I.P.
π»
Ghosted
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
R.I.P.
π
404 Not Found
Graph Neural Networks for Social Recommendation
R.I.P.
π»
Ghosted
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted