Thanks to Zindi and GeoAI for organizing this competition!
I joined during the last 14 days of the competition, and it was a great experience.
Preprocessing
1. Feature Engineering:
- Created features based on longitude and latitude.
- Engineered some combination features to capture interactions between variables.
2. Data Cleaning:
- Removed three towns where the mean monthly NO₂ emissions were significantly different from the others. These anomalies negatively impacted the model's performance.
3. Imputation:
- Used forward fill (`ffill`) and mean imputation to handle missing data.
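For readers who want to see the imputation step concretely, here is a minimal sketch. It assumes the forward fill is done per town in date order and that any gaps still left (for example at the start of a town's series) get the column mean. The column names `town`, `date` and `lst` are hypothetical, not the competition's actual names.

```python
import numpy as np
import pandas as pd

def impute(df, feature_cols, group_col="town", date_col="date"):
    """Forward fill within each town, then fill what is left with the column mean."""
    out = df.sort_values([group_col, date_col]).copy()
    # Forward fill inside each town so values never carry over between towns.
    out[feature_cols] = out.groupby(group_col)[feature_cols].ffill()
    # Remaining gaps (e.g. a town whose first rows are missing) get the column mean.
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].mean())
    return out

# Toy data with hypothetical column names.
df = pd.DataFrame({
    "town": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"] * 2),
    "lst": [np.nan, 10.0, np.nan, 20.0, np.nan, 30.0],
})
filled = impute(df, ["lst"])
print(filled)
```

Grouping before the forward fill is a design choice in this sketch: without it, the last value of one town would be copied into the first missing rows of the next.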
Training
1. Model Ensembling:
- Used an ensemble of two tree-based models for predictions, with a custom cross-validation method to avoid leakage.
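A rough sketch of what such a setup can look like. The post does not say which models or which split were used, so everything here is an assumption: the leak is taken to come from the same town appearing in both the training and validation folds, scikit-learn's `GroupKFold` stands in for the custom split, and `RandomForestRegressor` plus `ExtraTreesRegressor` stand in for the two tree-based models. RMSE is used only as an illustrative metric.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import GroupKFold

def grouped_cv_ensemble(X, y, groups, n_splits=5, seed=0):
    """Out-of-fold predictions from an average of two tree models.

    Folds are split by group (assumed here to be the town), so a town
    never appears in both the training and the validation part of a fold.
    """
    oof = np.zeros(len(y), dtype=float)
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, valid_idx in splitter.split(X, y, groups):
        # Stand-in models; the original post does not name the two models.
        models = [
            RandomForestRegressor(n_estimators=50, random_state=seed),
            ExtraTreesRegressor(n_estimators=50, random_state=seed),
        ]
        fold_preds = []
        for model in models:
            model.fit(X[train_idx], y[train_idx])
            fold_preds.append(model.predict(X[valid_idx]))
        # Simple unweighted average of the two models.
        oof[valid_idx] = np.mean(fold_preds, axis=0)
    rmse = float(np.sqrt(np.mean((oof - y) ** 2)))
    return oof, rmse

# Synthetic data: 10 hypothetical towns with 20 rows each.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(10), 20)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)
oof, rmse = grouped_cv_ensemble(X, y, groups)
print(f"grouped CV RMSE: {rmse:.3f}")
```

Because the test towns are unseen locations, a grouped split like this tends to give a more honest estimate than a random row-wise split.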
Postprocessing
1. Adjustment by Coefficients:
- Adjusted results for each town using specific coefficients.
- Applied different coefficients for predictions before and after 01-01-2020.
2. Handling Similar Towns:
- For towns with very close counterparts in the training set, assigned values directly based on those counterparts.
- Alternatively, created a model trained only on the specific town's data (date and target). This overfit model was used for predictions on similar nearby towns.
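To make the postprocessing easier to picture, here is a small sketch of the coefficient adjustment and the direct assignment from counterpart towns. It is not the notebook code. It assumes the coefficients are multiplicative, that "very close" means a small Euclidean distance in latitude/longitude degrees, and that a counterpart's value is copied for the same date. The names `apply_coefficients`, `nearest_train_town`, `assign_from_counterparts`, `max_dist` and all column names are hypothetical. The alternative of a per-town model trained on date and target is not shown.

```python
import numpy as np
import pandas as pd

def apply_coefficients(pred_df, coefs, cutoff="2020-01-01"):
    """Scale predictions by a coefficient per (town, period).

    coefs maps (town, "before" | "after") to a multiplier; missing keys use 1.0.
    Multiplicative scaling is an assumption of this sketch.
    """
    period = np.where(pred_df["date"] < pd.Timestamp(cutoff), "before", "after")
    factors = [coefs.get((t, p), 1.0) for t, p in zip(pred_df["town"], period)]
    out = pred_df.copy()
    out["pred"] = out["pred"] * np.asarray(factors)
    return out

def nearest_train_town(test_coords, train_coords, max_dist=0.05):
    """Map each test town to its nearest training town if within max_dist degrees."""
    mapping = {}
    for town, (lat, lon) in test_coords.items():
        best, best_d = None, np.inf
        for tr_town, (tlat, tlon) in train_coords.items():
            d = np.hypot(lat - tlat, lon - tlon)
            if d < best_d:
                best, best_d = tr_town, d
        if best_d <= max_dist:
            mapping[town] = best
    return mapping

def assign_from_counterparts(pred_df, train_df, mapping):
    """Overwrite predictions with the counterpart town's target on the same date."""
    lookup = {(t, d): v for t, d, v in
              zip(train_df["town"], train_df["date"], train_df["target"])}
    twins = pred_df["town"].map(mapping)  # NaN where no counterpart exists
    out = pred_df.copy()
    out["pred"] = [lookup.get((tw, d), p)
                   for tw, d, p in zip(twins, out["date"], out["pred"])]
    return out

# Toy example: P1 sits next to training town T1, P2 has no close counterpart.
dates = pd.to_datetime(["2019-12-31", "2020-01-01"])
train_df = pd.DataFrame({"town": ["T1", "T1"], "date": dates, "target": [10.0, 12.0]})
pred_df = pd.DataFrame({"town": ["P1", "P1", "P2", "P2"],
                        "date": list(dates) * 2,
                        "pred": [8.0, 8.0, 20.0, 20.0]})
coefs = {("P2", "before"): 1.5, ("P2", "after"): 0.5}  # made-up values
mapping = nearest_train_town({"P1": (0.0, 0.0), "P2": (5.0, 5.0)},
                             {"T1": (0.001, 0.0)})
final = assign_from_counterparts(apply_coefficients(pred_df, coefs), train_df, mapping)
print(final)
```

In this sketch no second model is needed for the adjustment: the coefficients are plain numbers per town and period, and how they were chosen is not specified in the post.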
This was our approach in brief.
Anyway, this competition taught me an important rule: "No additional members may be added to teams within the final 5 days of the competition or the last hour of a hackathon."
In the last 7 days of the competition, I collaborated with a friend to create a better solution, and we decided to team up on the final day. However, we later realized that this violated the rule, and we were disqualified.
This was entirely our mistake, and we take responsibility for it.
I still wish to see the first-place solution, as it must be incredible!
Congratulations. What would you say was the key thing that got you to a score of 6?
The two main factors that created the gap between solutions were the adjustment by coefficients and the handling of similar towns. The adjustment by coefficients was the technique that made the CV score align with the leaderboard and allowed me to achieve a score of 6. Additionally, improving how I handled similar towns further boosted my solution, raising my score to 6.4 on the public leaderboard.
First of all, thank you for the explanation. I learned a lot.
But can you share your notebook, please? I get most of what you explained, but, for example, I can't visualize how you adjusted the results.
Did you use another model to do so?