
CGIAR Root Volume Estimation Challenge

Helping Africa
$15 000 USD
Completed (~1 year ago)
Computer Vision
Prediction
1063 joined
257 active
Start: Jan 24, 25
Close: Mar 09, 25
Reveal: Mar 10, 25
wizzard
2nd Place Solution: Keep it Simple
Notebooks · 11 Mar 2025, 19:06 · 17

Hi everyone, I'm sharing my solution with you.

For this challenge I worked locally and didn’t use any GPU. The whole code (train and inference) runs in one minute.

Given the small size of the dataset, I decided to use a very light model. The only features I used (5 in total) were:

  • The aggregated target mean over each of the categorical variables (Plant Number, Side, Genotype and Stage)
  • The width, which is the difference between the proposed end and start layers.

The model is a simple XGBoost regressor trained using stratified folds on Genotype.

The environment is Python 3.10.13.

The code snippet is provided below:

import time

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold

start = time.time()

CATCOLS = ["Stage", "Genotype", "Side", "PlantNumber"]
NUMCOLS = ["Start", "End", "Delta"]
ID = "ID"
TGT = "RootVolume"
PATH = "Your Path Here..."

tr = pd.read_csv(f"{PATH}/Train.csv")
te = pd.read_csv(f"{PATH}/Test.csv")
tr["train"] = 1
te["train"] = 0

# Width of the proposed depth window
tr["Delta"] = tr["End"] - tr["Start"]
te["Delta"] = te["End"] - te["Start"]

data = pd.concat([tr, te])

# Aggregated target means over the categorical variables
data["x_geno"] = data.groupby("Genotype")[TGT].transform("mean")
data["x_plant"] = data.groupby("PlantNumber")[TGT].transform("mean")
data["x_stage"] = data.groupby("Stage")[TGT].transform("mean")
data["x_side"] = data.groupby("Side")[TGT].transform("mean")
data["x_bl"] = 0.8 * data["x_geno"] + 0.2 * data["x_plant"]  # blend (not in the final feature set)

tr = data.loc[data["train"] == 1].copy().reset_index()
te = data.loc[data["train"] == 0].copy().reset_index()

FE = ["Delta", "x_geno", "x_plant", "x_stage", "x_side"]
X = tr[FE].values
Xe = te[FE].values
y = tr[TGT].values
grp = tr["Genotype"].values

NFOLDS = 10
skf = StratifiedKFold(n_splits=NFOLDS)
FOLDS = list(skf.split(X, grp))

oof = np.zeros(y.shape)
pe = 0.0
for idx in range(NFOLDS):
    tr_idx, val_idx = FOLDS[idx]
    clf = xgb.XGBRegressor(max_depth=4, n_estimators=80, learning_rate=0.025)
    clf.fit(X[tr_idx], y[tr_idx])
    oof[val_idx] = clf.predict(X[val_idx])
    pe += clf.predict(Xe) / NFOLDS
    print("FOLD:", idx)

oof = np.round(oof, 2)
CV = mean_squared_error(y, oof, squared=False)  # RMSE
print("CV:", CV)

sub = te[[ID]].copy()
sub[TGT] = pe
sub.to_csv(f"{PATH}/submission.csv", index=False)

end = time.time()
print(f"Elapsed Time: {end - start} seconds")
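One portability note on the snippet above: if I remember correctly, the `squared=False` argument of `mean_squared_error` was deprecated in scikit-learn 1.4 (in favour of `root_mean_squared_error`) and removed in later releases. A version-agnostic sketch for computing the RMSE, with made-up numbers for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([52.0, 61.0, 47.0])  # made-up root volumes
y_pred = np.array([50.0, 64.0, 47.0])

# Taking the square root ourselves works on both old and new scikit-learn
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```

Errors are 2, 3 and 0 here, so the RMSE is sqrt(13/3).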
Discussion 17 answers

Wow!!! First of all, thanks for sharing your approach, but I'm just wondering whether this kind of solution, which doesn't use the images at all, is still valuable to the client. This is mainly a computer vision challenge: "Can you estimate cassava root volume from underground scanning images?"

Anyway just wanted to share my opinion on this .

11 Mar 2025, 19:20
Upvotes 4
wizzard

I think there is signal in the images, because the Delta feature indirectly uses them. A bigger dataset would have allowed us to really assess the importance of the images.

MICADEE
LAHASCOM

@PUBG You took the words right out of my mouth. Anyway, let's see what the clients consider their preferred solution. The onus is on them.

offei_lad
University of mines and technology

This is also a concern of mine. However, so far it's been shown that predictions made using the images are inferior to those made without them (most likely due to the small sample size), so any model that incorporates the images would be less valuable to the client. Also, as @wizzard stated, the tabular data contains some features linked to the images, and those seem to yield better results. In the end, though, the decision is left to Zindi and the clients.

Ecole nationale supérieure d'ingénieurs de Tunis

Thanks for sharing, and nice work keeping it as simple as possible. Maybe I misunderstood the challenge, since it appeared to be about computer vision or image processing, but you didn't use the images and your model still succeeded, so very cool work.

11 Mar 2025, 19:23
Upvotes 0

@wizzard well played.

11 Mar 2025, 19:31
Upvotes 0
nymfree

Amazing. One question: did you have strong conviction in the cross-validation scores you got with this approach? My teammate had such a model initially, but in the end it was difficult to justify selecting it given the CV of models with image features.

11 Mar 2025, 19:49
Upvotes 1
wizzard

Not really, but I chose it thinking it might be successful on the private data. The size of the dataset didn't reward big models that used the images.

nymfree

Your intuition paid off. Congrats!

3B

Congratulations on your victory! Your solution is well deserved. However, it seems the host would be happier if an image-based solution ranked high 😂

12 Mar 2025, 04:30
Upvotes 3
AJoel
Zindi

Hi @Wizzard,

First of all, congratulations on taking part in the challenge. How does your solution take the provided images into account? Recall that the objective of the challenge was primarily to use computer vision techniques to estimate the volume from the images, so it is a two-step process.

12 Mar 2025, 06:59
Upvotes 4
wizzard

Hello @AJoel. One of the challenges was to choose the appropriate depth range for the images. I tried many, but finally I stuck with the proposed start and end layers in the metadata provided. My idea is simple: the higher the delta (end - start), the higher the volume should be, since the plant is assumed to take up more space. So I only used that feature. We all experienced that models which used extensive image features (object detection, feature maps from conv nets, etc.) didn't generalize well on the private leaderboard. Though it is legitimate to encourage the use of images, I will always stick with the more efficient model. In my case, I think it uses the images indirectly. To me, this is a predictive challenge; we should not forget that one of the primary goals is to generalize on unseen data.
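To illustrate the intuition above with a minimal sketch: on synthetic data (not the competition data) where volume grows with the scan window, the delta alone already carries most of the signal.

```python
import numpy as np

# Synthetic illustration only: assume volume roughly scales with the
# depth window Delta = End - Start, plus noise.
rng = np.random.default_rng(0)
delta = rng.uniform(5, 50, size=200)                    # hypothetical window widths
volume = 3.0 * delta + rng.normal(0, 10, size=200)      # hypothetical volumes

# A single scalar feature can correlate strongly with the target
corr = np.corrcoef(delta, volume)[0, 1]
```

Under this (assumed) monotonic relationship, even a one-feature model captures the trend.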

offei_lad
University of mines and technology

I think the flaw is in the structure of the competition: the size of the dataset meant that trying to increase your private score with computer vision techniques was effectively overfitting. Also, there's no way a pure computer vision model would beat the baseline, which would be predicting the average volume per genotype. The only way it would do so is if the images were used together with the tabular data, and since that gives worse scores than going without them, the image features were effectively noise. The question is: should the winning solution go to the team who incorporated noise and were the best at ignoring it? Plus, we were able to beat the baseline score because we added features that were linked to the images, so the images were used, although indirectly.
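The genotype-mean baseline mentioned above can be sketched in a few lines (column names assumed to match the competition's Train.csv; the data here is made up):

```python
import pandas as pd

# Hypothetical toy data standing in for Train.csv / Test.csv
train = pd.DataFrame({
    "Genotype": ["A", "A", "B", "B"],
    "RootVolume": [10.0, 14.0, 20.0, 24.0],
})
test = pd.DataFrame({"Genotype": ["A", "B"]})

# Baseline: predict the mean training volume of each genotype
geno_mean = train.groupby("Genotype")["RootVolume"].mean()
test["RootVolume"] = test["Genotype"].map(geno_mean)
```

Any image-based model would first have to beat this trivial predictor.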

Hi @AJoel,

My solution is based purely on the radar data from the left- and right-side images of the plant, with a CV of 0.90 MAE (I used the MAE score for generalization purposes, since the dataset is small), and I got 1.40042721 on the LB (182nd rank).

Maybe the problem is the evaluation metric: root mean squared error is sensitive to outliers.
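A quick illustration of why RMSE is more sensitive to outliers than MAE (toy numbers, not competition scores):

```python
import numpy as np

y_true = np.array([10.0, 10.0, 10.0, 10.0])
clean = np.array([9.0, 11.0, 9.0, 11.0])    # all errors equal to 1
outlier = np.array([9.0, 11.0, 9.0, 30.0])  # one large miss of 20

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mae(a, b):
    return float(np.mean(np.abs(a - b)))
```

With uniform errors the two metrics agree, but the single large miss inflates RMSE far more than MAE, which is the complaint above.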

wizzard

I agree, it should have been MAE from the beginning.

My model is very image-based, LOL.

That's an outside-the-box solution, appreciate you sharing it!

16 Mar 2025, 00:47
Upvotes 0