LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model
May 06, 2024 · Entered Twilight · International Conference on Computer Graphics and Interactive Techniques
Repo contents: .gitignore, .gitmodules, .style.yapf, README.md, configs, evaluation, lgtm, playground.ipynb, prepare_data_models.sh, requirements.txt, third_packages
Authors
Haowen Sun, Ruikun Zheng, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu
arXiv ID
2405.03485
Category
cs.CV: Computer Vision
Cross-listed
cs.GR
Citations
23
Venue
International Conference on Computer Graphics and Interactive Techniques
Repository
https://github.com/L-Sun/LGTM
⭐ 56
Last Checked
2 months ago
Abstract
In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific motions to the correct body parts. To address this issue, we propose a two-stage pipeline: it first employs large language models (LLMs) to decompose global motion descriptions into part-specific narratives, which are then processed by independent body-part motion encoders to ensure precise local semantic alignment. Finally, an attention-based full-body optimizer refines the motion generation results and guarantees the overall coherence. Our experiments demonstrate that LGTM gains significant improvements in generating locally accurate, semantically aligned human motion, marking a notable advancement in text-to-motion applications. Code and data for this paper are available at https://github.com/L-Sun/LGTM.
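The local-to-global flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the body-part list, the function names (`decompose`, `encode_part`, `attend`, `local_to_global`), and the character-count embedding are all assumptions made for this example. The LLM call is replaced by a template, and the diffusion model itself is omitted entirely.

```python
import math

# Hypothetical body-part partition; the paper's actual split may differ.
BODY_PARTS = ["head", "left_arm", "right_arm", "torso", "left_leg", "right_leg"]


def decompose(text):
    """Stand-in for the LLM step: one global description -> per-part text.

    A real system would prompt an LLM; here each part gets a templated copy.
    """
    return {part: f"{part}: {text}" for part in BODY_PARTS}


def encode_part(description, dim=8):
    """Toy independent part encoder: a normalized character-count embedding."""
    vec = [0.0] * dim
    for i, ch in enumerate(description):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features):
    """Single-head self-attention across parts, with no learned weights.

    Each part's output is a softmax-weighted mix of all part features,
    which is the simplest form of cross-part (global) refinement.
    """
    names = list(features)
    dim = len(features[names[0]])
    fused = {}
    for query in names:
        scores = [
            sum(a * b for a, b in zip(features[query], features[key])) / math.sqrt(dim)
            for key in names
        ]
        weights = softmax(scores)
        fused[query] = [
            sum(w * features[key][d] for w, key in zip(weights, names))
            for d in range(dim)
        ]
    return fused


def local_to_global(text):
    """Local stage (per-part encoding) followed by global stage (attention)."""
    part_texts = decompose(text)
    local = {part: encode_part(desc) for part, desc in part_texts.items()}
    return attend(local)


fused = local_to_global("a person waves with the right hand while walking forward")
```

The point of the sketch is the ordering: each part is encoded in isolation first, and only afterwards do the parts exchange information through attention, which mirrors the "independent body-part encoders, then full-body optimizer" structure the abstract describes.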
Similar Papers
In the same crypt: Computer Vision
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
You Only Look Once: Unified, Real-Time Object Detection
SSD: Single Shot MultiBox Detector
Squeeze-and-Excitation Networks