M3-CVC: Controllable Video Compression with Multimodal Generative Models

November 24, 2024 · Declared Dead · 🏛 IEEE International Conference on Acoustics, Speech, and Signal Processing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Rui Wan, Qi Zheng, Yibo Fan
arXiv ID: 2411.15798
Category: eess.IV: Image & Video Processing
Cross-listed: cs.CV
Citations: 6
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing
Last Checked: 4 months ago

Abstract

Traditional and neural video codecs commonly encounter limitations in controllability and generality under ultra-low-bitrate coding scenarios. To overcome these challenges, we propose M3-CVC, a controllable video compression framework incorporating multimodal generative models. The framework uses a semantic-motion composite strategy for keyframe selection to retain critical information. For each keyframe and its corresponding video clip, a dialogue-based large multimodal model (LMM) approach extracts hierarchical spatiotemporal details, enabling both inter-frame and intra-frame representations that improve video fidelity while enhancing encoding interpretability. M3-CVC further employs a conditional diffusion-based, text-guided keyframe compression method, achieving high fidelity in frame reconstruction. During decoding, textual descriptions derived from LMMs guide the diffusion process to restore the original video's content accurately. Experimental results demonstrate that M3-CVC significantly outperforms the state-of-the-art VVC standard in ultra-low-bitrate scenarios, particularly in preserving semantic and perceptual fidelity.
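The abstract does not say how semantic and motion cues are combined when choosing keyframes. The following is a hypothetical sketch, not the authors' method, under these assumptions: each frame has a precomputed semantic embedding (for example from an image encoder) and a non-negative motion magnitude relative to the previous frame (for example mean optical-flow magnitude). A frame is promoted to a keyframe when a weighted sum of its semantic distance from the last keyframe and the motion accumulated since that keyframe crosses a threshold. The names `select_keyframes`, `alpha`, `threshold`, and `motion_scale` are illustrative and do not come from M3-CVC.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_keyframes(
    embeddings: Sequence[Sequence[float]],
    motion: Sequence[float],
    alpha: float = 0.5,         # hypothetical weight of the semantic term
    threshold: float = 0.3,     # hypothetical score needed to start a new clip
    motion_scale: float = 1.0,  # hypothetical motion level that saturates the motion term
) -> List[int]:
    """Hypothetical semantic-motion composite keyframe selection.

    embeddings[i]: semantic embedding of frame i (assumed precomputed).
    motion[i]: non-negative motion magnitude between frame i-1 and frame i
               (motion[0] is ignored).
    """
    if len(embeddings) != len(motion):
        raise ValueError("embeddings and motion must have the same length")
    if motion_scale <= 0:
        raise ValueError("motion_scale must be positive")
    if not embeddings:
        return []

    keyframes = [0]    # the first frame always starts a clip
    accumulated = 0.0  # motion accumulated since the last keyframe
    for i in range(1, len(embeddings)):
        accumulated += motion[i]
        # Semantic change is measured against the most recent keyframe.
        semantic = cosine_distance(embeddings[i], embeddings[keyframes[-1]])
        motion_term = min(1.0, accumulated / motion_scale)
        score = alpha * semantic + (1.0 - alpha) * motion_term
        if score >= threshold:
            keyframes.append(i)
            accumulated = 0.0  # restart motion accumulation at the new keyframe
    return keyframes
```

For example, `select_keyframes([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2, [0.0] * 5)` marks a keyframe at the scene cut (frame 3), while a static scene with steady motion such as `select_keyframes([[1.0, 0.0]] * 5, [0.0, 0.4, 0.4, 0.4, 0.4])` produces keyframes once enough motion has accumulated. Combining both terms in one score is what lets either a semantic change or sustained motion trigger a new keyframe.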



📜 Similar Papers

In the same crypt — Image & Video Processing

R.I.P. 👻 Ghosted

Variational image compression with a scale hyperprior

Johannes Ballé, David Minnen, ... (+3 more)

eess.IV 🏛 ICLR 📚 2.2K cites 8 years ago

📚 The Cartographer

Deep Learning for Hyperspectral Image Classification: An Overview

Shutao Li, Weiwei Song, ... (+4 more)

eess.IV 🏛 IEEE TGRS 📚 1.5K cites 6 years ago

R.I.P. 👻 Ghosted

U-Net and its variants for medical image segmentation: theory and applications

Nahian Siddique, Paheding Sidike, ... (+2 more)

eess.IV 🏛 IEEE Access 📚 1.4K cites 5 years ago

R.I.P. 👻 Ghosted

Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing

Vishal Monga, Yuelong Li, Yonina C. Eldar

eess.IV 🏛 IEEE Signal Processing Magazine 📚 1.3K cites 6 years ago

R.I.P. 💀 404 Not Found

Lightweight Image Super-Resolution with Information Multi-distillation Network

Zheng Hui, Xinbo Gao, ... (+2 more)

eess.IV 🏛 ACM MM 📚 1.1K cites 6 years ago

R.I.P. 👻 Ghosted

Deep Learning on Image Denoising: An overview

Chunwei Tian, Lunke Fei, ... (+4 more)

eess.IV 🏛 Neural Networks 📚 941 cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago