
Ghana’s Indigenous Intel Challenge [BEGINNERS ONLY]

Helping Ghana, Algeria and 53 other countries
$2 500 USD
Challenge completed ~2 months ago
Prediction
910 joined
565 active
Start: Aug 14, 2025
Close: Oct 12, 2025
Reveal: Oct 12, 2025
IamIman
Tech4Dev
3rd Place – Zindi Challenge
Help · 5 Nov 2025, 12:56 · 6

Hi all and thank you to Zindi and everyone. My solution to the challenge was built on a simple but important insight: traditional rainfall prediction is not just about meteorology—it’s about people, patterns, and place. By treating the problem as one of behavioural modelling, I built a solution that didn’t just predict rainfall—it understood the cultural logic behind those predictions.

Approach

The dataset presented a severe class imbalance: 88% of all entries were “NORAIN”, with the remaining three categories making up just 12%. Rather than treating this imbalance as noise, I treated it as signal. It reflected real-world behaviour: farmers naturally make more “no rain” predictions. This shaped my modelling strategy from the outset.
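As an aside, here is a minimal sketch of how that distribution and a set of inverse-frequency class weights could be inspected, assuming a pandas DataFrame named train with the target in a column named rain_category (both the file and column names are assumptions, not the competition's actual schema):

import pandas as pd

# Assumed training file and column names, for illustration only
train = pd.read_csv("Train.csv")

# Class distribution (the post reports ~88% NORAIN)
print(train["rain_category"].value_counts(normalize=True))

# Inverse-frequency weights per class, one common way to quantify the imbalance
counts = train["rain_category"].value_counts()
class_weights = {cls: len(train) / (len(counts) * n) for cls, n in counts.items()}
print(class_weights)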

I focused on features that captured who was predicting, where, and when:

  • User behaviour: Some farmers were 10× more active than others.
  • Geographic specificity: Cleaned and standardized over 20 Ghanaian community names.
  • Temporal rhythms: Rainfall intensity peaked midweek, especially on Wednesdays.
  • Indigenous indicators: Lightning correlated with medium rain; sun/heat often preceded heavy rain.

This approach allowed the model to learn from behavioural and spatial patterns embedded in traditional forecasting practices.
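To make this concrete, here is a rough sketch of such behavioural, geographic and temporal features, assuming columns named user_id, community, prediction_time and indicator (all column names are assumptions):

import pandas as pd

def add_behavioural_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()

    # User behaviour: how active each forecaster is
    df["user_prediction_count"] = df.groupby("user_id")["user_id"].transform("count")

    # Geographic specificity: normalise community names before encoding
    df["community_clean"] = (
        df["community"].astype(str).str.strip().str.lower()
        .str.replace(r"\s+", " ", regex=True)
    )

    # Temporal rhythms: hour and day of week of the prediction (Wednesday = 2)
    ts = pd.to_datetime(df["prediction_time"], errors="coerce")
    df["hour"] = ts.dt.hour
    df["day_of_week"] = ts.dt.dayofweek

    # Indigenous indicators: keep the raw indicator as a categorical signal
    df["indicator"] = df["indicator"].fillna("unknown")

    return df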

Model / Code

I compared three tree-based ensemble classifiers:

  • XGBoost: Best balance between accuracy and generalisation using scale_pos_weight.
  • LightGBM: Fast and efficient, with native support for categorical features.
  • CatBoost: Ideal for categorical-heavy datasets like community and user ID.

All models were evaluated using stratified cross-validation and macro/weighted F1 scores. XGBoost emerged as the most stable and generalisable.
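A minimal sketch of that comparison, assuming encoded features X_features and integer labels y_full_numeric (prepared as in the final training setup below); the exact estimator settings here are illustrative, not the tuned configurations:

from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

candidates = {
    "XGBoost": XGBClassifier(n_estimators=200, max_depth=5, learning_rate=0.1,
                             eval_metric="mlogloss", random_state=42, n_jobs=-1),
    "LightGBM": LGBMClassifier(n_estimators=200, random_state=42, n_jobs=-1),
    "CatBoost": CatBoostClassifier(iterations=200, verbose=0, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in candidates.items():
    scores = cross_val_score(model, X_features, y_full_numeric,
                             cv=cv, scoring="f1_macro", n_jobs=-1)
    print(f"{name}: macro F1 = {scores.mean():.3f} ± {scores.std():.3f}")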

Final training setup:

from xgboost import XGBClassifier

# Encode target labels
y_full_numeric = label_encoder.transform(y)
X_features = X

# Final model configuration
final_model = XGBClassifier(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.1,
    random_state=42,
    n_jobs=-1,
    eval_metric='mlogloss',
    use_label_encoder=False,
    scale_pos_weight=scale_weights
)

# Train on full dataset
final_model.fit(X_features, y_full_numeric)
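For completeness, a short usage sketch of how predictions could then be produced for the test set and mapped back to the original labels; X_test, test_ids and the submission column names are assumptions:

import pandas as pd

# Predict encoded classes for the test features and decode them back to labels
test_pred_numeric = final_model.predict(X_test)
test_pred_labels = label_encoder.inverse_transform(test_pred_numeric)

# Hypothetical submission layout; the actual ID/target column names may differ
submission = pd.DataFrame({"ID": test_ids, "Target": test_pred_labels})
submission.to_csv("submission.csv", index=False)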

Evaluation

The model performed reliably in identifying non-rain events but struggled with light rain. A modest 2.6% gap between training and cross-validation F1 scores indicated strong generalisation and minimal overfitting.

Using SHAP and LIME, I analysed what the model had truly learned:

  • Global insights: User history and location were dominant predictors.
  • Feature effects: Time features (hour, day) and most communities increased rainfall predictions. User ID and forecast length tended to decrease them. Clouds had a positive influence, but geographic and temporal context dominated.

This confirmed the model wasn’t just forecasting rain—it was decoding the behavioural logic of Ghanaian farmers.
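For reference, a minimal sketch of the global part of such a SHAP analysis, reusing final_model and X_features from the training setup above (the plot details will of course differ from the original analysis):

import shap

# Tree-based explainer works directly with the fitted XGBoost model
explainer = shap.TreeExplainer(final_model)
shap_values = explainer.shap_values(X_features)

# Global view: which features dominate the predictions across classes
shap.summary_plot(shap_values, X_features)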

This was an initial model designed to establish a clean, interpretable baseline. I did not apply any model enhancements like hyperparameter tuning or model stacking.

Thanks again to Zindi and the community. Hope this helps.

Discussion · 6 answers
CodeJoe

Thank you for sharing and congratulations once again!

5 Nov 2025, 12:58
Upvotes 1
RareGem

Congratulations 🎊. Thank you for sharing. Please, how did you handle the missing values?

5 Nov 2025, 13:07
Upvotes 1
CodeJoe

No need to, if you are using gradient boosting.

IamIman
Tech4Dev

I dropped the 'time_observed' and 'indicator_description' columns because they had too many missing values. For the 'indicator' column, I filled missing values with a constant ('unknown'), considering it's a challenge to predict rain using traditional methods and knowing indicators like clouds are good predictors. Given that the target has more instances of no-rain events, filling with 'unknown' in a way supports the no-rain cases.
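In code, that handling might look roughly like this (the DataFrame name is an assumption; the column names are as described above):

# Drop sparsely populated columns, keep 'indicator' with a constant fill
train = train.drop(columns=["time_observed", "indicator_description"])
train["indicator"] = train["indicator"].fillna("unknown")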

Joseph_gitau
African center for data science and analytics

Thank you for sharing and congratulations.

5 Nov 2025, 13:20
Upvotes 0
MacGee

thumbs up

5 Nov 2025, 19:43
Upvotes 1