Can someone be kind enough to suggest a successful strategy for dealing with the class imbalance? I have tried various strategies, upsampling, downsampling, augmentations but still not seeing any significant improvement.
import numpy as np

from ultralytics.data.build import YOLODataset
import ultralytics.data.build as build


class YOLOWeightedDataset(YOLODataset):
    def __init__(self, *args, mode="train", **kwargs):
        """
        Initialize the WeightedDataset.

        Args:
            class_weights (list or numpy array): A list or array of weights corresponding to each class.
        """
        super(YOLOWeightedDataset, self).__init__(*args, **kwargs)

        self.train_mode = "train" in self.prefix

        # You can also specify weights manually instead
        self.count_instances()
        class_weights = np.sum(self.counts) / self.counts

        # Aggregation function
        self.agg_func = np.mean

        self.class_weights = np.array(class_weights)
        self.weights = self.calculate_weights()
        self.probabilities = self.calculate_probabilities()

    def count_instances(self):
        """
        Count the number of instances per class.

        Returns:
            dict: A dict containing the counts for each class.
        """
        self.counts = [0 for i in range(len(self.data["names"]))]
        for label in self.labels:
            cls = label["cls"].reshape(-1).astype(int)
            for id in cls:
                self.counts[id] += 1

        self.counts = np.array(self.counts)
        # Avoid division by zero for classes with no instances
        self.counts = np.where(self.counts == 0, 1, self.counts)

    def calculate_weights(self):
        """
        Calculate the aggregated weight for each label based on class weights.

        Returns:
            list: A list of aggregated weights corresponding to each label.
        """
        weights = []
        for label in self.labels:
            cls = label["cls"].reshape(-1).astype(int)

            # Give a default weight to background images (no objects)
            if cls.size == 0:
                weights.append(1)
                continue

            # Take the mean of the class weights; you can change this
            # aggregation function to aggregate weights differently
            weight = self.agg_func(self.class_weights[cls])
            weights.append(weight)
        return weights

    def calculate_probabilities(self):
        """
        Calculate and store the sampling probabilities based on the weights.

        Returns:
            list: A list of sampling probabilities corresponding to each label.
        """
        total_weight = sum(self.weights)
        probabilities = [w / total_weight for w in self.weights]
        return probabilities

    def __getitem__(self, index):
        """Return transformed label information based on the sampled index."""
        # Don't use weighted sampling for validation
        if not self.train_mode:
            return self.transforms(self.get_image_and_label(index))
        else:
            index = np.random.choice(len(self.labels), p=self.probabilities)
            return self.transforms(self.get_image_and_label(index))


build.YOLODataset = YOLOWeightedDataset
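The sampling logic above reduces to inverse-frequency class weights, averaged per image and normalized into probabilities. A toy numpy sketch with made-up counts (the numbers are illustrative, not from the competition data):

```python
import numpy as np

# Made-up instance counts for three classes (purely illustrative).
counts = np.array([900, 60, 40])

# Inverse-frequency class weights, as computed in __init__ above.
class_weights = counts.sum() / counts

# Hypothetical per-image class arrays and their aggregated (mean) weights.
image_cls = [np.array([0]), np.array([0, 1]), np.array([2])]
weights = [class_weights[c].mean() for c in image_cls]

# Normalize the per-image weights into sampling probabilities.
probs = np.array(weights) / np.sum(weights)
```

An image containing only the rarest class ends up with a much higher sampling probability than one containing only the common class.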
I hope this helps.
Thank you, you are really helping. I tried balancing by upsampling and augmenting anthracnose and cssvd; that got me to 0.74, and I can't seem to break past it.
Did upsampling and augmentation improve your CV?
There was marginal improvement. I am yet to try your version. Did it improve your score?
It's a yes-and-no answer: it helped with some models and not with others. To be more specific, it helped on some kinds of splits and not on others. I think I focused more on the model than the data. Will try that out. Thanks @Bone
Before that, is your CV correlating with the LB?
They have never correlated in all my experiments.
Same problem here. I suggest you focus more on the model.
Sure. Hyperparameter tuning isn't helping much either.
Focus on inference too: tune the IoU threshold, use a lower confidence, and set augment=True.
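For concreteness, a minimal sketch of that inference tuning. The exact values are assumptions to sweep against your CV, not known-good settings; the keys match the Ultralytics predict() arguments:

```python
# Hypothetical inference-time settings; tune these against your CV.
tta_settings = {
    "conf": 0.15,     # lower confidence threshold keeps more borderline boxes
    "iou": 0.6,       # NMS IoU threshold
    "augment": True,  # enable test-time augmentation
}

# Usage (requires ultralytics and trained weights, e.g. "best.pt"):
# from ultralytics import YOLO
# results = YOLO("best.pt").predict("images/", **tta_settings)
```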
That was helpful. Thanks @CodeJoe
Nice, you've pushed. Well done.
Thanks, buddy. I am now ensembling the models.
Have you done an ensemble yet?
Yes, using WBF. It moved my score up by at least 1.
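For readers: WBF (weighted boxes fusion) averages the coordinates of overlapping boxes from different models, weighted by confidence, instead of discarding them like NMS does. A minimal single-class sketch of the idea (not the ensemble_boxes library implementation; the score handling is simplified):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def simple_wbf(boxes, scores, iou_thr=0.55):
    """Fuse overlapping boxes via score-weighted coordinate averaging."""
    order = np.argsort(scores)[::-1]  # process highest-confidence boxes first
    clusters = []
    for i in order:
        b, s = boxes[i], scores[i]
        for c in clusters:
            if iou(c["fused"], b) > iou_thr:
                # Add to the cluster and recompute the score-weighted average box.
                c["boxes"].append(b)
                c["scores"].append(s)
                w = np.array(c["scores"])
                c["fused"] = (np.array(c["boxes"]) * w[:, None]).sum(0) / w.sum()
                break
        else:
            clusters.append({"boxes": [b], "scores": [s], "fused": np.array(b, float)})
    fused = np.array([c["fused"] for c in clusters])
    conf = np.array([np.mean(c["scores"]) for c in clusters])
    return fused, conf

# Two models detect roughly the same object, plus one distinct detection.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
fused, conf = simple_wbf(boxes, scores)
```

The two overlapping boxes fuse into one averaged box while the distant one survives unchanged; the ensemble_boxes package does this per class with extra score normalization.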