Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics

December 13, 2024 · Declared Dead · 🏛 Machine Learning: Science and Technology

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Oz Amram, Luca Anzalone, Joschka Birk, Darius A. Faroughy, Anna Hallin, Gregor Kasieczka, Michael Krämer, Ian Pang, Humberto Reyes-Gonzalez, David Shih
arXiv ID: 2412.10504
Category: hep-ph
Cross-listed: cs.LG, hep-ex, stat.ML
Citations: 13
Venue: Machine Learning: Science and Technology
Last Checked: 3 months ago

Abstract
Foundation models are deep learning models pre-trained on large amounts of data which are capable of generalizing to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful in pre-training foundation models for HEP. Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 178M high $p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-$\alpha$ foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training of a jet-based foundation model on actual proton-proton collision data, we provide the ML-ready derived AspenOpenJets dataset for further public use.