
TAHMO Incoming Solar Radiation Prediction Challenge

$10 000 USD
Under code review
Prediction
Geospatial Analysis
1525 joined
760 active
Start: Apr 01, 26
Enrolments close: May 24, 26
Close: May 24, 26
Reveal: May 24, 26
Is it really possible to achieve MBE<1 without multiple accounts?
24 May 2026, 18:00 · 13

Is it actually possible to achieve an MBE below 1 without using multiple accounts?

I ask because the answer is, quite simply, no.

Public leaderboards inevitably contain some degree of misalignment with respect to the underlying dataset. Lowering the MBE, starting from a model with a good RMSE, requires manually calibrating certain stations through post-processing based on the public leaderboard. This can be done to some extent, but how much can really be achieved with only 100 submissions available?

Can this kind of score genuinely be reached with a probabilistic model? Or is it the case that everyone with an MBE below 1, or with a leaderboard score above 0.6, has actually used multiple accounts and managed to cheat more effectively than others without getting banned?

I will let you reason on this and on the sense of this competition. It is frustrating to start working hard and realise at a certain point that cheating is mandatory because virtually all the accounts above you on the leaderboard are doing it. Of course nobody is going to admit it, but I know you know it.

Think people, think.

PS1: just to remind all of you who has not been banned

PS2: Oh, I am still among you <3

Discussion 13 answers
Brainiac

@Uiziwi Your premise isn't really holding up, and honestly it reads like it's coming from not enough experimentation rather than the math actually being against you.

A handful of submissions is genuinely all you need once your CV is doing the work. You sanity check the direction of the bias correction, confirm CV and LB are moving together, and you're done. That's it. And look at the board, plenty of people are already sitting comfortably under 1, so the "it's impossible" framing isn't really matching what's actually happening.

  • "Lowering MBE requires manually calibrating stations through the public leaderboard": The whole argument rests on this one hidden assumption, and it doesn't hold. You can estimate each station's bias from your own out-of-fold residuals. Unlimited samples, no submission budget, no probing involved.
  • "The public LB is misaligned with the underlying dataset": That one actually cuts against your argument, not for it. A noisy, misaligned board is a worse estimator of per-station bias than clean out-of-fold predictions are. It's the weaker thing to calibrate against, period.
  • "Only 100 submissions available, how much can really be achieved?": The ceiling only matters if you've made the board your tuning signal in the first place. If you haven't, it's basically irrelevant. You're using subs to confirm, not to search.
  • "Can this score genuinely be reached with a probabilistic model?": Yes it can. A model with solid RMSE can still carry small systematic per-station offsets from site, sensor, and altitude effects. RMSE barely reacts to a 1 or 2 W/m² mean offset, but MBE is dominated by it. Fix the offset where you can already see it, in CV, and the MBE drops without meaningfully hurting RMSE.
  • "Everyone with MBE below 1 has used multiple accounts and cheated": One legitimate path is enough to break that inference, and there is one. Concluding that everyone above you must be cheating isn't really reasoning, it's frustration talking.
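
For readers who want to try the out-of-fold idea from the bullets above, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the station count, the single feature `x`, the noise levels, the station-blind linear model, and the way MBE is aggregated (mean over stations of the absolute mean error). It is not the competition's data or official metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data; every number below is made up for illustration.
n_stations, n_rows = 20, 5000
true_offset = rng.normal(0.0, 5.0, size=n_stations)   # per-station offset, W/m^2

def make_split():
    station = np.repeat(np.arange(n_stations), n_rows)
    x = rng.uniform(0.0, 1.0, size=station.size)       # hypothetical feature
    y = 600.0 * x + true_offset[station] + rng.normal(0.0, 60.0, size=station.size)
    return station, x, y

def fit_predict(x_tr, y_tr, x_te):
    """Station-blind linear fit, so station offsets end up in the residuals."""
    A = np.column_stack([x_tr, np.ones_like(x_tr)])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return coef[0] * x_te + coef[1]

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def station_mbe(pred, obs, station):
    """Mean over stations of |mean(pred - obs)|; this aggregation is an assumption."""
    return float(np.mean([abs(np.mean((pred - obs)[station == s]))
                          for s in range(n_stations)]))

st_tr, x_tr, y_tr = make_split()
st_te, x_te, y_te = make_split()

# Out-of-fold predictions on the training set.
k = 5
fold = rng.integers(0, k, size=y_tr.size)
oof = np.empty_like(y_tr)
for f in range(k):
    tr, va = fold != f, fold == f
    oof[va] = fit_predict(x_tr[tr], y_tr[tr], x_tr[va])

# Per-station bias estimated from OOF residuals only (no test labels touched).
resid = oof - y_tr
bias_hat = np.array([resid[st_tr == s].mean() for s in range(n_stations)])

# Refit on all training rows, predict the held-out split, subtract the estimated bias.
pred_te = fit_predict(x_tr, y_tr, x_te)
pred_corrected = pred_te - bias_hat[st_te]

print("RMSE before/after:", rmse(pred_te, y_te), rmse(pred_corrected, y_te))
print("station MBE before/after:",
      station_mbe(pred_te, y_te, st_te), station_mbe(pred_corrected, y_te, st_te))
```

In a setup like this the correction should shrink the per-station bias while leaving RMSE almost unchanged, but it does not drive the bias to zero: what remains is the sampling noise in both the OOF estimate and the held-out split. Real station data is also autocorrelated and may shift over time, which this sketch ignores.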
24 May 2026, 18:38
Upvotes 4

You're being insincere here. You reached an almost perfect 0 MBE at some point. If you are willing to be honest and say "I probed the leaderboard with 40+ submissions", I wouldn't judge you. No multi-accounting needed for that.

But estimating each station's test-set biases from out-of-fold residuals is bullshit (getting 0 error for each station.. come on). As someone who probed the LB, I can confirm that roughly half the stations have flipped signs compared to out-of-fold estimates.

Here's another interesting point: even probing won't get you to a perfect 0 MBE score. Not enough numerical precision in the Zindi score feedback to be able to nail it down to 8e-9 precision with only 40 submissions.

So yeah.. given that the accusation came from a cheating account.. they are right to flag you. You're not being honest.

Incorrect, my cheater friend @Brainiac

OOF residuals can help reduce MBE. Nobody is denying that. But they do not explain an MBE of 8e-9 on the public LB.

OOF estimates train-distribution station bias. The LB measures hidden test-distribution station bias. Those are not identical quantities. Random split or not, there is finite-sample noise, temporal mix noise, sensor noise, and refit noise.

So yes, getting MBE below 1 can be legitimate, but without some external calibration you have to be "very lucky". Getting it to numerical zero is a completely different claim.

Let's continue, because I am enjoying this.

“Just use CV” is not an explanation. CV can give direction and scale. It cannot magically know the exact public station means to 8 decimal places. Also, saying “many people are under 1” proves nothing about how they got there. That is not evidence, it is leaderboard theology.
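
To put rough numbers on the finite-sample point, here is a back-of-the-envelope sketch. All inputs are assumptions picked for illustration (station count, a per-row residual standard deviation of 80 W/m², rows per station, small true offsets), and rows are treated as independent, which is optimistic for time series.

```python
import numpy as np

rng = np.random.default_rng(1)

n_stations = 40              # assumed count, for illustration only
sigma = 80.0                 # assumed per-row residual std, W/m^2
n_oof, n_test = 5000, 2000   # assumed rows per station in OOF and in the public test set
true_bias = rng.normal(0.0, 1.0, size=n_stations)   # assumed small true offsets

# A station's mean error over n i.i.d. rows is its true bias plus noise
# with standard deviation sigma / sqrt(n).
oof_est = true_bias + rng.normal(0.0, sigma / np.sqrt(n_oof), size=n_stations)
test_est = true_bias + rng.normal(0.0, sigma / np.sqrt(n_test), size=n_stations)

# How often the OOF estimate and the test-set bias disagree in sign.
sign_flip_rate = float(np.mean(np.sign(oof_est) != np.sign(test_est)))

# Bias left on the test set after subtracting the OOF estimate.
residual_bias = test_est - oof_est
mbe_after = float(np.mean(np.abs(residual_bias)))

# Rows per station needed for the standard error alone to reach 8e-9.
target = 8e-9
rows_needed = (sigma / target) ** 2

print(sign_flip_rate, mbe_after, rows_needed)
```

Under these assumptions the leftover bias stays orders of magnitude above 8e-9, and the number of rows needed to reach that precision by averaging alone is astronomically large. The smaller the true offsets are relative to the noise, the closer the sign-flip rate gets to one half.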

If this was really so easy, reproduce the 8e-9 MBE again with a fixed method, with no extreme LB probing and no leaked private labels, and show the code.

Now the fake accounts are talking 😂 Mind you, this account was created 5 hours ago.

Reveal yourself, who are you?

24 May 2026, 18:47
Upvotes 0
Brainiac

And they are bragging about it: "PS2: Oh, I am still among you <3"

At this point we can all guess who this is

LOL. It is clear

Dear multi-accounter, you kinda ruined a lot of people's fun here, so I will optimistically assume you're just young and wanted to prove a point. If you managed to set up this elaborate multi-accounting VPN + registration + submission pipeline (I really hope you were not insane and didn't do it manually), you're clearly smart enough to do better things in life. What's done is done, get it out of your system, but do better next time.

24 May 2026, 19:09
Upvotes 2
Semaka_Mathunyane
University of South Africa

It's a Sybil attack. I'm glad I didn't invest that much into this comp, unlike the frog challenge.

Unfortunately, fgbfgb, you are not talking to that guy. And I totally agree with you, but unfortunately I am not the one at that level... Here the ocean is so big and what happened is very extreme. You are pursuing a wrong assumption: that there is one single cheater...

OK, but just to remain educational: do you now get how ~41 submissions would have been enough to probe the LB? Not to 8e-9 precision, sure. But easily down to below 1 MBE.

I would say "maybe". Let me explain.

Assuming that you have not used tens of submissions for testing and validating on the LB:

- all the possible datasets and features engineering combinations

- many different families of models, as well as losses, metrics and OOF strategies

- the correct balance of MBE and RMSE on training data

- a strong anchor

- and so on, with their permutations

then yes, with 41 submissions you can get close to an MBE of 1 if you are skilled and lucky enough to probe the sign and magnitude of each station's offset in one shot.

Would this be true for all the people with MBE<1? Even for people with MBE close to 0.01? I don't know Rick...

You are overthinking this. Let's separate the MBE from the RMSE. Say you make a submission with all zeros and note down the MBE score. Then you pick station1 and set all its row values to 1350, with all other stations at 0. This gives you another MBE score. From the difference of the two submissions you get back the exact station1 MBE. Then repeat for all other stations => 41 submissions is all you need.

This doesn't say anything about RMSE, but it does recover the exact per-station anchors.
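
A toy simulation of that arithmetic, as a sketch only. It assumes the leaderboard MBE is the mean over stations of |mean(pred - obs)|, which is a guess and not the documented metric; the function `score`, the hidden labels, and the station and row counts are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_stations, n_rows = 6, 500
station = np.repeat(np.arange(n_stations), n_rows)
hidden = rng.uniform(0.0, 1000.0, size=station.size)   # synthetic hidden labels, W/m^2

def score(pred):
    """Hypothetical LB metric: mean over stations of |mean(pred - obs)|."""
    per_station = [abs(np.mean(pred[station == s] - hidden[station == s]))
                   for s in range(n_stations)]
    return float(np.mean(per_station))

C = 1350.0   # constant above every label, so the sign inside |.| is known
base = score(np.zeros(station.size))        # baseline: all zeros

recovered = np.empty(n_stations)
for s in range(n_stations):
    probe = np.zeros(station.size)
    probe[station == s] = C                 # one station at C, the rest at 0
    delta = score(probe) - base             # equals (C - 2 * mean_s) / n_stations
    recovered[s] = (C - n_stations * delta) / 2.0

true_means = np.array([hidden[station == s].mean() for s in range(n_stations)])
print(np.max(np.abs(recovered - true_means)))
```

Under this assumed metric, one baseline plus one probe per station recovers each station's hidden mean, which is where a count like 41 comes from. A real leaderboard reports rounded scores, so the recovered values would only be as precise as the displayed digits allow, and a differently aggregated metric would need different algebra.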

You could also use Hadamard matrices, but still 41 submissions needed.

And again, I don't think doing this is necessarily wrong or against the rules (I guess the organizers will decide that). But pretending no such method was applied by the top scorers on the leaderboard would be a lie. And if their submission history shows no traces of probing, then that's multi-accounting.

MICADEE
LAHASCOM (Freelance)

Hmmm....😊🤔

What worries me most is not even the cheating itself, but the normalization of leaderboard probing as if it were “machine learning skill.”

At some point, we need to be honest about the difference between:

  • building a robust model that generalizes, and
  • reverse-engineering the public leaderboard through adaptive submissions.

If the primary path to a top score becomes extracting hidden station offsets from LB feedback, then we are no longer evaluating modeling ability in the traditional ML sense. We are evaluating who can exploit the evaluation protocol more efficiently.

In real industry or production environments, this mindset becomes extremely dangerous.

Why?

Because production systems do not give you access to hidden target statistics. Real-world deployments require:

  • robustness under distribution shift,
  • reliable validation methodology,
  • uncertainty handling,
  • reproducibility,
  • and models that generalize beyond the validation environment.

A data scientist who overfits to leaderboard feedback is essentially training themselves to optimize for leakage instead of learning signal. That may work in a competition with weak evaluation protections, but it fails badly in production where future data distributions are unknown.

This is exactly why proper ML practice emphasizes:

  • strong cross-validation,
  • clean separation between train/validation/test,
  • prevention of feedback leakage,
  • and limited interaction with hidden evaluation data.

The moment leaderboard probing becomes the dominant strategy, the competition shifts away from science and toward exploitation mechanics.

And honestly, that hurts everyone:

  • newcomers lose motivation,
  • serious participants stop trusting the rankings,
  • and genuinely strong modeling work gets buried under submission engineering.

There is also a difference between:

  • using a few submissions to sanity-check generalization, and
  • systematically extracting hidden information from the public leaderboard.

Those are not the same thing.

The scary part is that some people start believing this is “normal ML.” It is not. In a proper ML pipeline, if your validation strategy cannot reproduce your leaderboard gains without adaptive probing, then your process is fundamentally fragile.

Competitions should reward:

  • feature engineering,
  • modeling creativity,
  • domain understanding,
  • calibration,
  • ensembling,
  • and robust validation.

Not who can turn the public LB into a side-channel oracle.

And beyond machine learning, there is also a life lesson here:

Not every win carries value, dignity, or prestige — especially when it comes through the back door. Not all money is valuable either, especially when it is dirty money.

Someone may see themselves as an achiever because of a ranking or prize, but outsiders often judge success differently. Real respect comes from integrity, skill, and fairness — not from exploiting loopholes while others are genuinely trying to learn and compete honestly.

That distinction matters.

Shalom !!!