Representing Molecules with Algebraic Data Types: Beyond SMILES and SELFIES
January 23, 2025 Β· Declared Dead Β· + Add venue
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Oliver Goldstein, Samuel March
arXiv ID
2501.13633
Category
cs.PL: Programming Languages
Cross-listed
cs.LG
Citations
0
Last Checked
4 months ago
Abstract
Benchmarks of molecular machine learning models often treat the molecular representation as a neutral input format, yet the representation defines the syntax of validity, edit operations, and invariances that models implicitly learn. We propose MolADT, a typed intermediate representation (IR) for molecules expressed as a family of algebraic data types that separates (i) constitution via Dietz-style bonding systems, (ii) 3D geometry and stereochemistry, and (iii) optional electronic annotations. By shifting from string edits to operations over structured values, MolADT makes representational assumptions explicit, supports deterministic validation and localized transformations, and provides hooks for symmetry-aware and Bayesian workflows. We provide a reference implementation in Haskell (open-source, archived with DOI) and worked examples demonstrating delocalised/multicentre bonding, validation invariants, reaction extensions, and group actions relevant to geometric learning.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Programming Languages
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
R.I.P.
π»
Ghosted
Glow: Graph Lowering Compiler Techniques for Neural Networks
R.I.P.
π»
Ghosted
Learnable Programming: Blocks and Beyond
R.I.P.
π»
Ghosted
Scenic: A Language for Scenario Specification and Scene Generation
R.I.P.
π»
Ghosted
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted