R.I.P.
👻
Ghosted
MambaVC: Learned Visual Compression with Selective State Spaces
May 24, 2024 · Entered Twilight · arXiv.org
Repo contents: README.md, eval.py, models, pic, train.py
Authors
Shiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei Wang
arXiv ID
2405.15413
Category
eess.IV: Image & Video Processing
Cross-listed
cs.CV, cs.IT
Citations
25
Venue
arXiv.org
Repository
https://github.com/QinSY123/2024-MambaVC
⭐ 62
Last Checked
2 months ago
Abstract
Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity and efficiency. Inspired by this, we take the first step to explore SSMs for visual compression. We introduce MambaVC, a simple, strong and efficient compression network based on SSM. MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling, which helps to capture informative global contexts and enhances compression. On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads. Specifically, it outperforms CNN and Transformer variants by 9.3% and 15.6% on Kodak, respectively, while reducing computation by 42% and 24%, and saving 12% and 71% of memory. MambaVC shows even greater improvements with high-resolution images, highlighting its potential and scalability in real-world applications. We also provide a comprehensive comparison of different network designs, underscoring MambaVC's advantages. Code is available at https://github.com/QinSY123/2024-MambaVC.
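To make the 2D selective scanning idea in the abstract more concrete, the following is a minimal sketch of how a 2D feature map can be unrolled into several 1D scan orders, processed by a recurrence, and merged back. It rests on assumptions the abstract does not confirm: the four scan orders (row-major, column-major, and their reverses) follow the pattern common to 2D scanning modules, and the recurrence uses a fixed scalar `decay` rather than the input-dependent (selective) parameters of a real SSM. The names `scan_orders`, `linear_scan`, and `scan_2d` are hypothetical and are not taken from the MambaVC repository.

```python
from typing import List

Grid = List[List[float]]


def scan_orders(h: int, w: int) -> List[List[int]]:
    """Return four traversal orders over the flat indices of an h x w grid.

    Assumption: row-major, column-major, and their reverses.
    """
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def linear_scan(seq: List[float], decay: float = 0.5) -> List[float]:
    """Toy recurrence state_t = decay * state_{t-1} + x_t.

    A real selective SSM makes its parameters depend on the input;
    here decay is a fixed scalar purely for illustration.
    """
    out, state = [], 0.0
    for x in seq:
        state = decay * state + x
        out.append(state)
    return out


def scan_2d(grid: Grid, decay: float = 0.5) -> Grid:
    """Scan the grid along all four orders and average the results per position."""
    h, w = len(grid), len(grid[0])
    flat = [v for row in grid for v in row]
    merged = [0.0] * (h * w)
    orders = scan_orders(h, w)
    for order in orders:
        ys = linear_scan([flat[i] for i in order], decay)
        # Scatter each scan output back to its original grid position.
        for i, y in zip(order, ys):
            merged[i] += y / len(orders)
    return [merged[r * w:(r + 1) * w] for r in range(h)]


if __name__ == "__main__":
    # A single impulse in the top-left corner of a 2 x 2 map.
    print(scan_2d([[1.0, 0.0], [0.0, 0.0]]))
```

In this toy setting a single non-zero value in one corner produces a non-zero response at every output position, which loosely illustrates the global context that scanning in several directions is meant to capture.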
Similar Papers
In the same crypt: Image & Video Processing
R.I.P.
👻
Ghosted
Kvasir-SEG: A Segmented Polyp Dataset
R.I.P.
👻
Ghosted
Deep Learning for Hyperspectral Image Classification: An Overview
R.I.P.
👻
Ghosted
U-Net and its variants for medical image segmentation: theory and applications
R.I.P.
👻
Ghosted
Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing
R.I.P.
👻
Ghosted