Public data is permanently available here.
Track A
Readme: Track A/README.md
Agent Tool Server: Track A/server.py
Example mock agent: Track A/main.py
Test dataset: Track A/data/Phase_1/test.json
Train dataset: Track A/data/Phase_1/train.json
Example reasoning trace: Track A/examples/traces.json
Track B
Readme: Track B/README.md
Agent Tool Server: Track B/server.py
Train dataset: Track B/data/Phase_1/train.json
Example reasoning trace: Track B/examples/traces.json
Example mock agent: Track B/agent/
Files required by the server (unzip before use): Track B/devices_outputs.zip
This challenge has two tracks, with questions released publicly in each of the first two phases. In each of these two phases, you submit one file containing your agents' responses to all questions for both tracks. Please read the instructions below carefully.
1. Submission Format
Your submission file (result.csv) must contain the following columns:
ID, Track A, Track B. Each row corresponds to a generated answer to a test question.
There are 550 questions in Phase 1 and 570 questions in Phase 2.
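As a rough illustration of the format described above, the following Python sketch writes a result.csv with the three listed columns. The function name write_submission and its parameters are hypothetical and not part of the challenge code. The identifier column is named ID in the column list but scenario_id in the example file, so it is left configurable here; check the sample submission file for the exact header.

```python
import csv


def write_submission(rows, path="result.csv", id_column="ID"):
    """Write (question_id, track_a_answer, track_b_answer) tuples to a CSV file.

    Assumption: the header is <id_column>, "Track A", "Track B".
    The exact name of the identifier column should be taken from the
    sample submission file.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, "Track A", "Track B"])
        for question_id, track_a, track_b in rows:
            # One row per test question.
            writer.writerow([question_id, track_a, track_b])
    return path
```

The csv module quotes any answer that contains a comma, so free-form answers stay in a single column.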
2. Two-Phase Question Release
3. Placeholder Values - What They Are & How to Use Them
To ensure fair and consistent evaluation, placeholder values are used for any track you are not participating in.
If you are competing in only one of the two tracks, you must keep the placeholder values exactly as they appear in the sample submission file for the track you are not competing in.
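One way to follow this rule is to start from the rows of the sample submission file and overwrite only the column of the track you compete in. The sketch below assumes the rows have already been loaded as dictionaries (for example with csv.DictReader). The helper name fill_track, the id_column default, and the placeholder string in the usage example are all hypothetical.

```python
def fill_track(sample_rows, answers, track, id_column="ID"):
    """Return copies of the sample rows with one track's answers filled in.

    sample_rows: list of dicts loaded from the sample submission file.
    answers:     dict mapping question id -> your answer for `track`.
    track:       "Track A" or "Track B"; the other column is left untouched,
                 so its placeholder values survive exactly as given.
    """
    if track not in ("Track A", "Track B"):
        raise ValueError(f"unknown track: {track!r}")
    filled = []
    for row in sample_rows:
        new_row = dict(row)  # copy so the sample rows are not modified
        question_id = row[id_column]
        if question_id in answers:
            new_row[track] = answers[question_id]
        filled.append(new_row)
    return filled


# Usage with made-up values; "PLACEHOLDER" stands in for whatever the
# sample submission file actually contains.
sample = [{"ID": "q1", "Track A": "PLACEHOLDER", "Track B": "PLACEHOLDER"}]
result = fill_track(sample, {"q1": "C4"}, "Track A")
```

Copying each row keeps the loaded sample intact, so the same sample can be reused for later submissions.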
Example of submission file:
scenario_id Track A Track B
80e3aa96-815d-4683-980c-16db42eab0ef C4
f55a819f-3fb9-4c8f-8859-a5b1649ff2d5 C7
...
535afb0d-fa81-419b-9bcc-b456d032df5d Gamma-Aegis-01(Eth1/...
8ec59f8b-1a5a-4fb3-80ad-f0e2aaf6a499 Beta-Node-03->Gamma-A...
4. Final submission
For the final submission in Phase C, a zip file is required, which must include: