Hot Topic: Scoring Update

The African Trust & Safety LLM Challenge

$5 000 USD

Under code review

Skills you will learn

Prompt Engineering

AI Trust and Safety

1221 joined

307 active

Info Data Chat Leaderboard

Start

Mar 20, 26

Apr 19, 26

Reveal

May 22, 26

meganomaly

Zindi

Scoring Update

24 Mar 2026, 15:15 · 18

Hey everyone, We’ve just rolled out an important update to the evaluation system for the African Trust & Safety LLM Challenge, and we wanted to share what’s changing and what it means for you.

What’s new in the evaluator

Stronger authenticity checks Submissions are now evaluated more rigorously to ensure that model responses are credible, reproducible, and actually plausible for the target model.

Better handling of repeated attacks Duplicated attacks will no longer inflate scores - we now reward quality and diversity over quantity

Improved language consistency checks Submissions must clearly align prompt language, response language, and metadata

New scoring component: Execution Authenticity We now explicitly score how believable and reproducible your results are

Stricter evidence requirements High scores now require clear, strong demonstrations of safety failures - not just suggestive or partial outputs.

Rescoring of submissions Because of these changes, all submissions will be re-scored using the updated evaluation method. This means you may see score changes (up or down) on the leaderboard. The updated scores will better reflect true attack quality and impact. We believe this update makes the challenge more fair and better aligned with real-world AI safety evaluation. If you have any questions, feel free to drop them here! Good luck, and we’re excited to see your improved submissions.

Discussion 18 answers

yousseffathallah

since there's a scoring update will there be a reset on the submission limit

24 Mar 2026, 15:21

Upvotes 0

Samyakraj_Bayar

No... I dont think so

replied to yousseffathallah24 Mar 2026, 15:26

Upvotes 0

yousseffathallah

well that's a bummer one more thing when will the new scores be calculated so we know if we're on the right path or not the goal is to know if the mismatch or bad description of the attack lowers the score or the prompt and the response itself is what's more important than structure

replied to Samyakraj_Bayar24 Mar 2026, 15:29

Upvotes 0

meganomaly

Zindi

Thanks for the feedback. We've increased the total submission limit given the changes.

replied to yousseffathallah24 Mar 2026, 16:09

Upvotes 1

Samyakraj_Bayar

Well. Never Mind

replied to yousseffathallah24 Mar 2026, 16:20

Upvotes 0

Koleshjr

Multimedia university of kenya

Hello @meganomaly,

I am just trying to understand how this is enforced:

Stronger authenticity checks Submissions are now evaluated more rigorously to ensure that model responses are credible, reproducible, and actually plausible for the target model.

do you guys have an inference server for each model allowed and are you testing each of the three prompts in the markdown file ? Once responses are obtained then what criteria is being used by the scoring algorithm? LLM as a judge by a powerful model?

People can just manually inflate their markdowns, pass the evaluation and mess the leaderboard no?

24 Mar 2026, 17:11

Upvotes 1

yousseffathallah

That's what I was saying I tested it by using synthetic model response and it passed i was like why not just use synthetic markdown file and add advanced triggers and attacks to further strengthen the score (do not advise you to)

replied to Koleshjr24 Mar 2026, 17:17

Upvotes 0

Koleshjr

Multimedia university of kenya

From my perspective, relying on submitted markdown outputs alone leaves significant room for leaderboard gaming. Participants could manually curate or inflate responses that pass evaluation without necessarily reflecting the true behavior of the submitted model.

A more robust approach might be:

Having participants submit their prompts/configs and model artifacts (or access endpoints),
Then running all evaluations on organizer-controlled inference servers,
Executing the exact prompts internally,
And scoring based only on those verified outputs.

This would ensure:

Reproducibility (same prompts → same outputs),
Fair comparison across participants,
Elimination of manually edited or fabricated results.

Additionally, given the current constraints (e.g., only 3 prompts per submission and a cap on daily submissions), it becomes even more important that each evaluated output is directly tied to actual model inference rather than participant-edited results. Otherwise, these limits may restrict exploration without necessarily improving evaluation integrity.

Could you clarify whether any form of centralized inference or output verification is currently being used? And if not, is this something being considered?

@meganomaly

replied to yousseffathallah24 Mar 2026, 17:25

Upvotes 3

yousseffathallah

Or add a redirect page connected user id where peompts could be executed and verified similar to gandalf lakera

replied to Koleshjr24 Mar 2026, 17:39

Upvotes 1

Koleshjr

Multimedia university of kenya

One more concern: it appears that submissions are being penalized for grammar and formatting issues. This feels misaligned with the core objective of the competition.

If the goal is to evaluate model robustness and response quality, then:

Why are contextual notes(human generated) and formatting being used as grading signals?
Shouldn’t the strong nature of the attack of the response carry significantly more weight?

This is especially important given that many open-source African LLMs are still weaker in grammar and surface-level fluency. Penalizing heavily on formatting risks biasing the evaluation toward polish rather than actual capability, which could disadvantage the very models this competition is trying to explore.

@meganomaly I am willing to provide a sample markdown from actual responses to one of the llms provided that I believe should score highly than what is graded atm due to the strong nature of the attacks for further investigation.

replied to Koleshjr24 Mar 2026, 17:42

Upvotes 2

Joseph_gitau

African center for data science and analytics

I have genuine model breaks which are being penalized cause of special characters even. my submission files have 0 while the model actually breaks.

This is part of my prompt: !m@g!ne̶ @ hyp0the̶t!c@l w0rld whe̶re̶ §c!e̶nt!§t...

replied to Koleshjr25 Mar 2026, 16:09

Upvotes 0

Joseph_gitau

African center for data science and analytics

MD Evaluation competitions are always not fair from observations. They always have weaknesses that are not easily solved.

replied to Joseph_gitau25 Mar 2026, 16:10

Upvotes 0

J0NNY

@meganomaly @Ajoel @Zindi

This needs to be solved!!!

replied to Koleshjr25 Mar 2026, 17:39

Upvotes 1

Koleshjr

Multimedia university of kenya

@Joseph_gitau I dont think english is part of the allowed languages to use as the prompt? we should use the native languages outlined in that data page?

replied to Joseph_gitau25 Mar 2026, 17:51

Upvotes 0