Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics
December 13, 2024 · Declared Dead · Machine Learning: Science and Technology
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Oz Amram, Luca Anzalone, Joschka Birk, Darius A. Faroughy, Anna Hallin, Gregor Kasieczka, Michael Krämer, Ian Pang, Humberto Reyes-Gonzalez, David Shih
arXiv ID
2412.10504
Category
hep-ph
Cross-listed
cs.LG, hep-ex, stat.ML
Citations
13
Venue
Machine Learning: Science and Technology
Last Checked
3 months ago
Abstract
Foundation models are deep learning models pre-trained on large amounts of data which are capable of generalizing to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful in pre-training foundation models for HEP. Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 178M high $p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-$\alpha$ foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training of a jet-based foundation model on actual proton-proton collision data, we provide the ML-ready derived AspenOpenJets dataset for further public use.
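For readers outside collider physics, a "high $p_T$ jet" is a jet with large transverse momentum, i.e. momentum perpendicular to the beam axis. The sketch below computes a jet's $p_T$ as the vector sum of its constituents' $(p_x, p_y)$ components and applies a threshold cut. It is illustrative only: the `(px, py)` tuple layout, the helper names, and the 300 GeV cut are hypothetical and are not the selection or file format used to build AspenOpenJets.

```python
import math

def jet_pt(constituents):
    """Transverse momentum of a jet from constituent (px, py) pairs in GeV.

    The jet momentum is the vector sum of its constituents, so
    pT = sqrt((sum px)^2 + (sum py)^2).
    """
    px = sum(c[0] for c in constituents)
    py = sum(c[1] for c in constituents)
    return math.hypot(px, py)

# Hypothetical threshold for illustration; NOT the AspenOpenJets selection.
PT_MIN_GEV = 300.0

def select_high_pt(jets, pt_min=PT_MIN_GEV):
    """Keep only jets whose vector-summed pT is at least pt_min."""
    return [jet for jet in jets if jet_pt(jet) >= pt_min]

jets = [
    [(200.0, 50.0), (150.0, -20.0)],  # pT = hypot(350, 30), about 351.3 GeV
    [(40.0, 10.0), (-5.0, 30.0)],     # pT = hypot(35, 40), about 53.2 GeV
]
print([round(jet_pt(j), 1) for j in jets])
print(len(select_high_pt(jets)))
```

Summing components before taking the magnitude matters: constituents pointing in opposite transverse directions partially cancel, so a jet's $p_T$ is generally smaller than the scalar sum of its constituents' $p_T$ values.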
Similar Papers
In the same crypt: hep-ph
R.I.P. 👻 Ghosted
CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds
R.I.P. 👻 Ghosted
An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training
R.I.P. 👻 Ghosted
PELICAN: Permutation Equivariant and Lorentz Invariant or Covariant Aggregator Network for Particle Physics
R.I.P. 👻 Ghosted
Stacking machine learning classifiers to identify Higgs bosons at the LHC
R.I.P. 👻 Ghosted
The Power of Genetic Algorithms: what remains of the pMSSM?
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. 👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. 👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P. 👻 Ghosted