Atoms as Language: VQ-Atom: Semantic Discretization for Molecular Representation Learning
May 16, 2026 · Grace Period · ICML 2026 Workshop
Authors
Takayuki Kimura
arXiv ID
2605.16823
Category
cs.LG: Machine Learning
Citations
0
Venue
ICML 2026 Workshop
Abstract
Molecular representation learning has become a central approach in AI-driven drug discovery, yet existing molecular tokenizations such as SMILES remain largely syntactic and do not naturally align with chemically meaningful substructures. In this work, we introduce VQ-Atom, a semantic discretization framework that converts continuous atom-level graph representations into discrete tokens corresponding to local chemical environments. Using graph neural network embeddings and vector quantization, atoms are assigned to codebook entries representing chemically meaningful atomic contexts. These discrete tokens define a molecular language suitable for Transformer-based pretraining. We evaluate VQ-Atom in protein-ligand interaction prediction under a protein-cold split setting without relying on 3D structural information. Experimental results show that VQ-Atom consistently improves predictive performance compared to conventional tokenization approaches, suggesting that semantically grounded discretization can substantially enhance molecular representation learning. Our findings indicate that token design itself plays a critical role in enabling effective language modeling for chemistry.
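The discretization step described in the abstract, assigning each atom embedding to a codebook entry, can be illustrated with a minimal nearest-neighbor sketch. This is not the authors' implementation: the 2-D embeddings, the three-entry codebook, and the function name `assign_tokens` are hypothetical stand-ins, and the paper's graph neural network encoder, codebook size, and training procedure are not specified on this page. The sketch shows only the generic vector quantization lookup, in which each continuous vector is mapped to the index of its closest codebook vector.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def assign_tokens(atom_embeddings, codebook):
    """Map each atom embedding to the index of its nearest codebook vector."""
    tokens = []
    for emb in atom_embeddings:
        # Pick the codebook index with the smallest Euclidean distance.
        nearest = min(range(len(codebook)), key=lambda k: dist(emb, codebook[k]))
        tokens.append(nearest)
    return tokens


# Hypothetical 2-D embeddings for four atoms; a real encoder would
# produce higher-dimensional vectors from the molecular graph.
atom_embeddings = [(0.9, 0.1), (0.8, 0.2), (0.1, 1.0), (0.5, 0.6)]

# Hypothetical codebook; each entry stands for one local chemical environment.
codebook = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]

tokens = assign_tokens(atom_embeddings, codebook)
print(tokens)  # [0, 0, 1, 2]
```

The resulting list of integer token IDs is the kind of discrete sequence that a Transformer could then be pretrained on, with atoms in similar local environments sharing a token.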