The training dataset comprises 350 independent simulated events (collisions). Each event contains approximately 3,000 labeled images of the trajectories of different particles produced by the collision as they pass through many detectors. The events were simulated with ACTS in the context of the TrackML challenge and were modified to target particle identification rather than particle tracking.
If you are curious about the original format of the dataset (which also includes geometry and cluster information), check out the dataset description and files here (sign-in required): https://competitions.codalab.org/competitions/20112#participate-get-data
This is a multiclass classification problem in computer vision: identify each particle as one of five types, labeled as follows:
- 11: "electron"
- 13: "muon"
- 211: "pion"
- 321: "kaon"
- 2212: "proton"
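Most classifiers expect contiguous class indices rather than labels such as 211 or 2212. The following is a minimal sketch of one way to encode the five labels listed above. The names `PARTICLE_NAMES`, `LABEL_TO_INDEX`, `INDEX_TO_LABEL`, and `encode_labels` are hypothetical helpers and are not part of the dataset; the ordering by label value is an arbitrary choice.

```python
# Integer labels and particle names, copied from the list above.
PARTICLE_NAMES = {
    11: "electron",
    13: "muon",
    211: "pion",
    321: "kaon",
    2212: "proton",
}

# Assumption: class indices 0..4 are assigned in ascending order of label value.
LABEL_TO_INDEX = {label: i for i, label in enumerate(sorted(PARTICLE_NAMES))}
INDEX_TO_LABEL = {i: label for label, i in LABEL_TO_INDEX.items()}


def encode_labels(labels):
    """Convert dataset labels (e.g. 211) to contiguous class indices (e.g. 2)."""
    return [LABEL_TO_INDEX[int(label)] for label in labels]
```

Predictions can be mapped back to the original labels with `INDEX_TO_LABEL`, for example before writing a submission.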
Fig. 1: Transverse plane of the TrackML detector, with the particle in red
Fig. 2: Translated particle with RZ binning
Files available for download
- The training data consists of 350 .pkl files, each representing a unique event (collision). Each .pkl file contains two columns: column 1 is a list of 10x10 images, and column 2 is the particle type (int) associated with each image. The training data can be downloaded at: https://cernbox.cern.ch/index.php/s/OH9tOo8VHYpHJDl. The data is open source.
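Reading the event files might look like the sketch below. It assumes that the unpickled object can be indexed with 0 and 1 to obtain the image column and the label column; the exact container type is not stated here, so check one file before relying on this. The function names `load_event` and `load_events` are hypothetical, and unpickling the real files may require the library that wrote them (for example NumPy or pandas) to be installed.

```python
import pickle
from pathlib import Path


def load_event(path):
    """Load one event file and return (images, labels).

    Assumption: event[0] holds the 10x10 images and event[1] the particle types.
    Only unpickle files from a source you trust.
    """
    with open(path, "rb") as f:
        event = pickle.load(f)
    images, labels = event[0], event[1]
    labels = [int(label) for label in labels]
    if len(images) != len(labels):
        raise ValueError(f"{path}: {len(images)} images but {len(labels)} labels")
    return images, labels


def load_events(directory):
    """Concatenate the images and labels of every .pkl file in a directory."""
    all_images, all_labels = [], []
    for path in sorted(Path(directory).glob("*.pkl")):
        images, labels = load_event(path)
        all_images.extend(images)
        all_labels.extend(labels)
    return all_images, all_labels
```

With 350 events of roughly 3,000 images each, loading everything at once gives on the order of a million 10x10 images, so loading a few events first is a reasonable way to explore the data.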
Note that the training set is highly imbalanced, but the test set has been designed to be balanced.
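Because the training and test class distributions differ, one common option (not something prescribed by the dataset) is to weight each class inversely to its frequency during training. The sketch below uses the formula n_samples / (n_classes * count), which is the same heuristic scikit-learn applies for `class_weight="balanced"`. The function name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} so that rarer classes receive larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy example: 6 pions and 2 electrons.
weights = inverse_frequency_weights([211] * 6 + [11] * 2)
print(weights[11])  # 2.0
```

The resulting dictionary can be passed to a loss function or sampler that accepts per-class weights; resampling the training set is an alternative with a similar effect.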