Hi all and thank you to Zindi and everyone. My solution to the challenge was built on a simple but important insight: traditional rainfall prediction is not just about meteorology—it’s about people, patterns, and place. By treating the problem as one of behavioural modelling, I built a solution that didn’t just predict rainfall—it understood the cultural logic behind those predictions.
Approach

The dataset presented a severe class imbalance: 88% of all entries were “NORAIN”, with the remaining three categories making up just 12%. Rather than treating this imbalance as noise, I treated it as signal: it reflected real-world behaviour, since farmers naturally make more “no rain” predictions. This shaped my modelling strategy from the outset.
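One common way to turn that imbalance into a modelling signal is inverse-frequency class weighting. A minimal sketch (the label distribution below is illustrative, mirroring the ~88% “NORAIN” split; the resulting per-sample weights could be passed to a classifier's `fit(..., sample_weight=...)`):

```python
import numpy as np

# Hypothetical label distribution mirroring the challenge: ~88% "NORAIN"
y = np.array(["NORAIN"] * 88 + ["LIGHT"] * 6 + ["MODERATE"] * 4 + ["HEAVY"] * 2)

# Inverse-frequency weights: rarer classes receive proportionally larger weights
classes, counts = np.unique(y, return_counts=True)
class_weights = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

# Per-sample weights, one entry per training row
sample_weights = np.array([class_weights[label] for label in y])
```

With this scheme the weighted contribution of every class to the loss is equalised, so the minority rain classes are not drowned out by “NORAIN”.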
I focused on features that captured who was predicting, where, and when.
This approach allowed the model to learn from behavioural and spatial patterns embedded in traditional forecasting practices.
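As a sketch of what “who, where, and when” features can look like in practice (the column names here are hypothetical, not the actual dataset schema), date fields yield seasonal signals and categorical identifiers can be integer-encoded for tree models:

```python
import pandas as pd

# Hypothetical slice of the data; column names are illustrative only
df = pd.DataFrame({
    "farmer_id": ["f1", "f2", "f1", "f3"],
    "region": ["Ashanti", "Volta", "Ashanti", "Northern"],
    "date": pd.to_datetime(["2023-04-01", "2023-04-02",
                            "2023-07-15", "2023-11-30"]),
})

# "When": seasonal signals derived from the prediction date
df["month"] = df["date"].dt.month
df["day_of_year"] = df["date"].dt.dayofyear

# "Who" / "where": integer-encode categorical identifiers for tree models
for col in ["farmer_id", "region"]:
    df[col + "_enc"] = df[col].astype("category").cat.codes
```

Tree ensembles split directly on these integer codes, so no one-hot expansion is needed.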
Model / Code

I compared three tree-based ensemble classifiers.
All models were evaluated using stratified cross-validation and macro/weighted F1 scores. XGBoost emerged as the most stable and generalisable.
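The evaluation protocol can be sketched as follows. This is a minimal stand-in, not the competition pipeline: it uses scikit-learn's `RandomForestClassifier` on synthetic imbalanced data, but the stratified-fold and macro-F1 machinery is the same:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced stand-in for the rainfall labels (~85/10/5 split)
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           weights=[0.85, 0.10, 0.05], random_state=42)

# Stratified folds preserve the class ratios in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Macro F1 averages per-class F1 equally, so minority classes count fully
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=cv, scoring="f1_macro")
```

Swapping `scoring="f1_weighted"` gives the frequency-weighted variant mentioned above.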
Final training setup:
```python
from xgboost import XGBClassifier

# Encode target labels
y_full_numeric = label_encoder.transform(y)
X_features = X

# Final model configuration
final_model = XGBClassifier(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.1,
    random_state=42,
    n_jobs=-1,
    eval_metric='mlogloss',
    use_label_encoder=False,
    scale_pos_weight=scale_weights,
)

# Train on full dataset
final_model.fit(X_features, y_full_numeric)
```
Evaluation

The model performed reliably in identifying non-rain events but struggled with light rain. A modest 2.6% gap between training and cross-validation F1 scores indicated strong generalisation and minimal overfitting.
Using SHAP and LIME, I analysed what the model had truly learned.
This confirmed the model wasn’t just forecasting rain—it was decoding the behavioural logic of Ghanaian farmers.
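SHAP itself requires the `shap` package (typically `shap.TreeExplainer(final_model)` for tree ensembles). As a dependency-light sketch of the same idea, namely attributing predictions to features, here is permutation importance on a synthetic gradient-boosting model; note this is a stand-in technique, not the SHAP/LIME analysis from the writeup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for the trained model and feature matrix
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
model = GradientBoostingClassifier(random_state=42).fit(X, y)

# Shuffle each feature in turn and measure the drop in score:
# large drops mean the model genuinely relies on that feature
result = permutation_importance(model, X, y, n_repeats=5, random_state=42)
ranking = np.argsort(result.importances_mean)[::-1]
```

Inspecting the top-ranked features this way gives a quick sanity check before reaching for SHAP's per-prediction attributions.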
This was an initial model designed to establish a clean, interpretable baseline. I did not apply any enhancements such as hyperparameter tuning or model stacking.
Thanks again to Zindi and the community. Hope this helps.
Thank you for sharing and congratulations once again!
Congratulations 🎊. Thank you for sharing. Please, how did you handle the missing values?
No need to, if you are using gradient boosting.
I dropped the 'time_observed' and 'indicator_description' columns because they had too many missing values. For the 'indicator' column, I filled missing values with a constant ('unknown'): since the challenge is to predict rain using traditional methods, and indicators like clouds are strong predictors of rain, an 'unknown' fill leans towards the majority no-rain cases.
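The cleaning steps described above can be sketched with pandas (the sample values here are made up; only the column names come from the discussion):

```python
import pandas as pd

# Hypothetical slice of the data; values are illustrative
df = pd.DataFrame({
    "time_observed": [None, None, "06:00", None],
    "indicator_description": [None, "dark clouds", None, None],
    "indicator": ["clouds", None, "wind", None],
})

# Drop the columns dominated by missing values
df = df.drop(columns=["time_observed", "indicator_description"])

# Fill the remaining categorical gaps with a constant marker
df["indicator"] = df["indicator"].fillna("unknown")
```

The 'unknown' marker then behaves as its own category when the column is encoded for the tree model.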
Thank you for sharing and congratulations.
thumbs up