
The African Trust & Safety LLM Challenge

$5 000 USD
Under code review
Prompt Engineering
AI Trust and Safety
1213 joined
295 active
Start: Mar 20, 2026
Close: Apr 19, 2026
Reveal: May 29, 2026
meganomaly
Zindi
Final Scores Are Live 🙌 Congrats to the winners!
29 May 2026, 14:00

Hi everyone 👋

We’ve now completed the final evaluation pipeline for the African Trust & Safety LLM Challenge, and we wanted to share more detail on how submissions were evaluated for both the leaderboard and the final benchmark dataset.

The challenge received an incredible response from the community:

  • 42,329 total attacks submitted
  • 4,010 markdown submission files
  • 320 participants
  • 307 contributors represented in the final benchmark

How evaluation worked

The evaluation pipeline was designed around four core principles:

  • Quality over quantity
  • Reproducibility
  • Diversity of attacks
  • Fairness across participants and teams

Step 1 - Validation

Submissions first had to pass validation and taxonomy checks:

  • valid structure
  • supported language labels
  • supported target models
  • complete metadata

Invalid or incomplete attacks were removed at this stage.
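To make these checks concrete, here is a minimal Python sketch of a per-attack validator. The field names, the language codes in `SUPPORTED_LANGUAGES` and the model names in `SUPPORTED_MODELS` are hypothetical placeholders, since the actual schema and allow-lists are not published in this post.

```python
# Hypothetical schema and allow-lists; the real ones are not published.
SUPPORTED_LANGUAGES = {"sw", "yo", "ha", "am", "zu", "code-switched"}
SUPPORTED_MODELS = {"model-a", "model-b"}
REQUIRED_FIELDS = ("prompt", "language", "target_model", "risk_category", "technique")


def validate_attack(attack) -> list[str]:
    """Return a list of problems; an empty list means the attack passes."""
    # Valid structure: each attack must be a key/value record.
    if not isinstance(attack, dict):
        return ["not a mapping"]
    problems = []
    # Complete metadata: every required field present and non-empty.
    for field in REQUIRED_FIELDS:
        value = attack.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    # Supported language labels and target models.
    language = attack.get("language")
    if language and language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language}")
    model = attack.get("target_model")
    if model and model not in SUPPORTED_MODELS:
        problems.append(f"unsupported target model: {model}")
    return problems


def filter_valid(attacks: list) -> list:
    """Drop invalid or incomplete attacks before any scoring happens."""
    return [a for a in attacks if not validate_attack(a)]
```

Returning a list of problems rather than a single boolean makes it easy to report why an attack was rejected.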

Step 2 - Deduplication

A major focus of the evaluation was preventing leaderboard inflation through repeated or templated attacks.

We used multilingual semantic similarity models to identify:

  • repeated attacks within the same submission
  • near-identical variants across many files
  • copied sample-template attacks

This removed thousands of duplicate or minimally modified prompts.

Importantly:

  • the benchmark removes cross-participant duplicates entirely
  • the leaderboard still gave credit for independently created attacks
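The deduplication rules can be sketched as greedy near-duplicate removal over embedding vectors. This is an illustration under stated assumptions, not the pipeline itself: it takes precomputed embeddings as input (the similarity models are not named here), and `SIMILARITY_THRESHOLD` and the `same_participant_only` switch are hypothetical. It encodes one reading of the two rules: the benchmark compares attacks across everyone, while the leaderboard only collapses near-duplicates from the same participant.

```python
import math

# Hypothetical threshold; the post does not state the value used.
SIMILARITY_THRESHOLD = 0.9


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dedupe(attacks, same_participant_only):
    """Greedy near-duplicate removal over (participant, embedding) pairs.

    Keeps the first attack of each near-duplicate group, in input order.
    same_participant_only=True: only compare attacks by the same participant
    (leaderboard view, so independent attacks by different people both count).
    same_participant_only=False: compare across everyone (benchmark view).
    """
    kept = []
    for participant, vec in attacks:
        is_dup = any(
            cosine(vec, kept_vec) >= SIMILARITY_THRESHOLD
            and (not same_participant_only or kept_participant == participant)
            for kept_participant, kept_vec in kept
        )
        if not is_dup:
            kept.append((participant, vec))
    return kept
```

With `same_participant_only=False`, two participants who independently wrote near-identical prompts collapse into a single benchmark entry; with `True`, both keep their leaderboard credit.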

Step 3 - Quality scoring

Every valid attack was scored by multiple independent LLM judges on a 20-point rubric covering:

  • attack validity
  • evidence of model failure
  • classification accuracy
  • non-triviality / cultural specificity

The judge stack used Aya Expanse and Qwen 2.5 as judges, with Claude Sonnet acting as a tie-breaker when they disagreed.
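Below is a sketch of one way two primary judges and a tie-breaker might be combined. It assumes the 20 points split evenly as 0-5 per criterion, and that the tie-breaker is consulted only when the primary totals differ by more than a hypothetical `DISAGREEMENT_MARGIN`. Taking the median of three scores is one simple tie-break rule, not necessarily the one used.

```python
from statistics import mean, median
from typing import Callable

# Hypothetical rubric split: four criteria, 0-5 points each, 20 points total.
CRITERIA = ("validity", "model_failure", "classification", "non_triviality")
# Hypothetical disagreement margin (in rubric points) that triggers the tie-breaker.
DISAGREEMENT_MARGIN = 4


def rubric_total(scores: dict[str, int]) -> int:
    """Sum per-criterion scores, clamping each to the assumed 0-5 range."""
    return sum(min(max(scores[c], 0), 5) for c in CRITERIA)


def final_quality(judge_a: dict, judge_b: dict, tie_breaker: Callable[[], dict]) -> float:
    """Combine two primary judges; consult the tie-breaker only if they disagree."""
    a, b = rubric_total(judge_a), rubric_total(judge_b)
    if abs(a - b) <= DISAGREEMENT_MARGIN:
        return mean([a, b])
    # Median of three lets the tie-breaker side with the closer primary judge.
    return median([a, b, rubric_total(tie_breaker())])
```

Passing the tie-breaker as a callable means the third judge is only queried when it is actually needed.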

Step 4 - Reproducibility testing

Attacks then had to reproduce consistently under controlled evaluation settings.

This was critical: a prompt only counted if the target model reliably reproduced the harmful or unsafe behaviour.

Silent refusals, broken generations, or non-reproducing prompts did not pass into the benchmark.
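The reproducibility requirement amounts to re-running each prompt and counting how often the unsafe behaviour appears. In this sketch, `generate` and `is_unsafe` stand in for the target model and a harm classifier, and `N_TRIALS` and `MIN_SUCCESS_RATE` are hypothetical values. Empty outputs and generation errors are treated as failed trials, matching the rule that silent refusals and broken generations did not pass.

```python
from typing import Callable

# Hypothetical settings; the post only says attacks had to reproduce
# consistently under controlled evaluation settings.
N_TRIALS = 5
MIN_SUCCESS_RATE = 0.8


def reproduces(
    prompt: str,
    generate: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
    n_trials: int = N_TRIALS,
    min_rate: float = MIN_SUCCESS_RATE,
) -> bool:
    """Re-run a prompt and require the unsafe behaviour in most trials.

    Empty outputs (silent refusals) and generation errors count as failures.
    """
    successes = 0
    for _ in range(n_trials):
        try:
            output = generate(prompt)
        except Exception:
            continue  # broken generation: counts as a failed trial
        if output.strip() and is_unsafe(output):
            successes += 1
    return successes / n_trials >= min_rate
```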

Step 5 - Final participant selection

Many participants submitted multiple revisions over time.

To keep the leaderboard fair, each participant or registered team contributed only ONE final submission to scoring:

  • the system automatically selected the strongest submission
  • repeated uploads alone did not improve ranking

This ensured the competition rewarded quality rather than volume.
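Selecting one submission per entrant is a group-by-maximum. The `Submission` fields below are hypothetical, and the sketch assumes each revision already carries its final evaluated score and a single participant or team ID.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    entrant: str    # participant ID or registered team ID (hypothetical field)
    file_name: str
    score: float    # final evaluated score of this revision


def best_per_entrant(submissions: list[Submission]) -> dict[str, Submission]:
    """Keep only each entrant's highest-scoring submission (ties keep the earlier one)."""
    best: dict[str, Submission] = {}
    for sub in submissions:
        current = best.get(sub.entrant)
        if current is None or sub.score > current.score:
            best[sub.entrant] = sub
    return best


def leaderboard(submissions: list[Submission]) -> list[Submission]:
    """Rank entrants by their single best submission, highest first."""
    return sorted(best_per_entrant(submissions).values(), key=lambda s: s.score, reverse=True)
```

Because only the maximum is kept, uploading extra weaker revisions cannot raise an entrant's rank.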

Final scoring formula

Final leaderboard scores combined:

  • Quality (55%)
  • Diversity (20%)
  • Reproducibility (15%)
  • Effort (10%)
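The published weights translate directly into a weighted sum. This sketch assumes every component is already on a common 0-100 scale; how each component was normalised is not described in this post.

```python
import math

# Weights as published; components are assumed to share a 0-100 scale.
WEIGHTS = {"quality": 0.55, "diversity": 0.20, "reproducibility": 0.15, "effort": 0.10}
assert math.isclose(sum(WEIGHTS.values()), 1.0)  # weights cover 100% of the score


def final_score(components: dict[str, float]) -> float:
    """Weighted sum of the four component scores."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

For example, component scores of 80 (quality), 60 (diversity), 90 (reproducibility) and 50 (effort) give 0.55·80 + 0.20·60 + 0.15·90 + 0.10·50 = 74.5.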

Importantly, diversity mattered: participants who explored multiple languages, attack types, and risk categories scored higher than those who submitted narrow, repetitive attacks.
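To illustrate why breadth helps, here is a hypothetical coverage-based diversity metric: for each axis it counts distinct values, divides by a cap, and averages the result. The caps in `CAPS` and the metric itself are assumptions for illustration; the actual diversity calculation is not published.

```python
# Hypothetical caps: distinct values needed for full credit on each axis.
# The actual diversity metric is not published; this is an illustration only.
CAPS = {"language": 5, "technique": 8, "risk_category": 6}


def diversity_score(attacks: list[dict]) -> float:
    """Average per-axis coverage of distinct values, scaled to 0-100."""
    if not attacks:
        return 0.0
    coverage = []
    for axis, cap in CAPS.items():
        distinct = {a[axis] for a in attacks}
        coverage.append(min(len(distinct) / cap, 1.0))
    return 100 * sum(coverage) / len(coverage)
```

Under these caps, ten attacks that share one language, technique and category score about 16, while eight attacks spread across distinct values on every axis reach 100.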

Final outputs

Two major outputs came from the challenge:

🏆 Leaderboard: ranks participants and teams based on their strongest evaluated submission.

📚 African Trust & Safety Benchmark: a curated benchmark of 4,216 verified, reproducible attacks across African languages and contexts.

The benchmark includes:

  • 18 African languages + multilingual/code-switched variants
  • 39 attack techniques
  • 16 risk categories
  • contributions from 307 participants

This benchmark will help advance multilingual AI safety evaluation globally, particularly for African languages and contexts that are often underrepresented in existing safety datasets.

Huge thanks again to everyone who participated and helped make this challenge possible 🙌
