The training dataset comprises 350 independent simulated events (collisions). Each event contains approximately 3,000 labeled images of different particle trajectories passing through the many detectors traversed after the collision. The events were simulated with ACTS in the context of the TrackML challenge and were modified to target particle identification rather than particle tracking.
If you are curious about the original format of the dataset (which also includes geometry and cluster information), check out the dataset description and files here (sign-in required): https://competitions.codalab.org/competitions/20112#participate-get-data
This is a multiclass classification computer vision problem: each particle must be identified as one of five types, labeled as follows:
Fig 1 Transverse plane of the TrackML detector with the particle in red
Fig 2 Translated particle with RZ binning
Files available for download
Note that the training set is highly imbalanced, but the test set has been designed to be balanced.
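Because the training set is imbalanced while the test set is balanced, one common option is to reweight classes during training. The snippet below is a minimal sketch of inverse-frequency class weights, not a method prescribed by the dataset authors. The function name `inverse_frequency_weights` and the example labels are hypothetical; the real class distribution is not given here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * count).

    Rare classes receive larger weights, so every class contributes the
    same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Made-up labels for illustration only; NOT the real class distribution.
example_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(example_labels)
```

The resulting dictionary can typically be passed to a weighted loss or a sampler. Whether reweighting actually helps on this dataset would need to be checked on a balanced validation split.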