🏛️ The Sound Crypt
cs.SD: Where Sound papers rest without their code.
Total Papers: 2574
No Code: 1771
Twilight: 61
Has Code: 742
Survival Rate: 28.8%
R.I.P. 👻 Ghosted: Generative Semantic Communication for Text-to-Speech Synthesis
R.I.P. 👻 Ghosted: Content and Style Aware Audio-Driven Facial Animation
R.I.P. 👻 Ghosted: ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
R.I.P. 👻 Ghosted: Towards Generalized Source Tracing for Codec-Based Deepfake Speech
R.I.P. 👻 Ghosted: TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
R.I.P. 👻 Ghosted: Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation
R.I.P. 👻 Ghosted: FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders
R.I.P. 👻 Ghosted: StereoSync: Spatially-Aware Stereo Audio Generation from Video
R.I.P. 👻 Ghosted: Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
📚 The Cartographer: A Survey on Evaluation Metrics for Music Generation
R.I.P. 👻 Ghosted: MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
📚 The Cartographer: Manipulated Regions Localization For Partially Deepfake Audio: A Survey
R.I.P. 👻 Ghosted: Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
R.I.P. 👻 Ghosted: Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
R.I.P. 👻 Ghosted: Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
R.I.P. 👻 Ghosted: Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond
R.I.P. 👻 Ghosted: SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
R.I.P. 👻 Ghosted: Cross-Modal Learning for Music-to-Music-Video Description Generation
R.I.P. 👻 Ghosted: AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals
R.I.P. 👻 Ghosted: TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification
R.I.P. 👻 Ghosted: Stylus: Repurposing Stable Diffusion for Training-Free Music Style Transfer on Mel-Spectrograms
R.I.P. 👻 Ghosted: Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
R.I.P.
👻
Ghosted