
Amini Cocoa Contamination Challenge

Helping Ghana
$7 000 USD
Completed (11 months ago)
Computer Vision
Object Detection
928 joined
255 active
Start: Feb 14, 25
Close: May 11, 25
Reveal: May 12, 25
Class Imbalance Struggle
Help · 21 Mar 2025, 16:10 · 16

Can someone kindly suggest a successful strategy for dealing with the class imbalance? I have tried various strategies (upsampling, downsampling, augmentations) but am still not seeing any significant improvement.

Discussion · 16 answers
CodeJoe
import numpy as np

from ultralytics.data.build import YOLODataset
import ultralytics.data.build as build
class YOLOWeightedDataset(YOLODataset):
    def __init__(self, *args, mode="train", **kwargs):
        """
        Initialize the weighted dataset.

        Class weights are derived from the per-class instance counts
        (inverse frequency) and turned into per-image sampling probabilities.
        """

        super().__init__(*args, **kwargs)

        self.train_mode = "train" in self.prefix

        # You can also specify weights manually instead
        self.count_instances()
        class_weights = np.sum(self.counts) / self.counts

        # Aggregation function
        self.agg_func = np.mean

        self.class_weights = np.array(class_weights)
        self.weights = self.calculate_weights()
        self.probabilities = self.calculate_probabilities()

    def count_instances(self):
        """
        Count the number of instances per class

        Returns:
            dict: A dict containing the counts for each class.
        """
        self.counts = [0 for i in range(len(self.data["names"]))]
        for label in self.labels:
            cls = label['cls'].reshape(-1).astype(int)
            for c in cls:
                self.counts[c] += 1

        self.counts = np.array(self.counts)
        self.counts = np.where(self.counts == 0, 1, self.counts)

    def calculate_weights(self):
        """
        Calculate the aggregated weight for each label based on class weights.

        Returns:
            list: A list of aggregated weights corresponding to each label.
        """
        weights = []
        for label in self.labels:
            cls = label['cls'].reshape(-1).astype(int)

            # Give a default weight to background class
            if cls.size == 0:
                weights.append(1)
                continue

            # Take mean of weights
            # You can change this weight aggregation function to aggregate weights differently
            weight = self.agg_func(self.class_weights[cls])
            weights.append(weight)
        return weights

    def calculate_probabilities(self):
        """
        Calculate and store the sampling probabilities based on the weights.

        Returns:
            list: A list of sampling probabilities corresponding to each label.
        """
        total_weight = sum(self.weights)
        probabilities = [w / total_weight for w in self.weights]
        return probabilities

    def __getitem__(self, index):
        """
        Return transformed label information based on the sampled index.
        """
        # Don't use for validation
        if not self.train_mode:
            return self.transforms(self.get_image_and_label(index))
        else:
            index = np.random.choice(len(self.labels), p=self.probabilities)
            return self.transforms(self.get_image_and_label(index))

# Patch the dataset class so ultralytics builds the weighted sampler during training
build.YOLODataset = YOLOWeightedDataset
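To see what the sampler is doing, here is the same weight arithmetic on toy numbers (the three class counts and image contents below are invented for illustration, not the real competition counts):

```python
import numpy as np

# Hypothetical instance counts for three classes, e.g. healthy / anthracnose / cssvd
counts = np.array([900, 120, 60])
class_weights = counts.sum() / counts        # inverse-frequency class weights

# Per-image weight: mean weight of the classes present (agg_func = np.mean)
image_classes = [np.array([0]), np.array([1, 2]), np.array([0, 2])]
weights = np.array([class_weights[c].mean() for c in image_classes])

# Normalize to sampling probabilities, as calculate_probabilities() does
probs = weights / weights.sum()

# An image containing only rare classes is drawn more often than an
# image containing only the majority class
assert probs[1] > probs[0]
```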
2 May 2025, 09:31
Upvotes 0
CodeJoe

I hope this helps.

Thank you, you are really helping. I tried balancing by upsampling and augmenting anthracnose and cssvd; that got me to 0.74. I can't seem to break that.

CodeJoe

Did upsampling and augmentation improve your CV?

There was marginal improvement. I am yet to try your version. Did it improve your score?

CodeJoe

It's a yes and no answer. It helped with some models and didn't help with others. To be more specific, it helped on some kinds of splits and not on other splits.

I think I focused more on the model than the data. Will try that out. Thanks @Bone

CodeJoe

Before that, is your CV correlating with the LB?

They have never correlated in all my experiments.

CodeJoe

Same problem here. I suggest you focus more on the model.

Sure. Hyperparameter tuning isn't helping much either.

CodeJoe

Focus on inference too. Tune the IoU threshold, use a lower confidence threshold, and set augment=True for test-time augmentation.
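Lowering the confidence threshold tends to help mAP because the metric sweeps over score thresholds, so low-scoring true positives still contribute. A tiny sketch of the effect (the scores are invented):

```python
import numpy as np

scores = np.array([0.90, 0.40, 0.15, 0.05])   # hypothetical detection scores

def kept(scores, conf):
    """Indices of detections that survive a confidence cutoff."""
    return np.nonzero(scores >= conf)[0]

# A strict cutoff silently drops boxes that might be true positives,
# and mAP can never recover them
assert len(kept(scores, 0.10)) == 3
assert len(kept(scores, 0.50)) == 1
```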

That was helpful. Thanks @CodeJoe

CodeJoe

Nice that you've pushed. Well done.

Thanks, buddy. I am now ensembling the models.

CodeJoe

Have you done an ensemble yet?

Yes, using WBF. It moved my score up by at least 1.
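For reference, here is a minimal pure-NumPy sketch of the idea behind weighted boxes fusion: overlapping boxes from different models are averaged, weighted by their scores. In practice one would use the `ensemble-boxes` package; the boxes, scores, and threshold below are invented, and real WBF also handles per-model weights and class labels.

```python
import numpy as np

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(boxes, scores, thr=0.55):
    """Greedy score-weighted fusion of overlapping boxes (simplified WBF)."""
    order = np.argsort(scores)[::-1]
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in order:
        if used[i]:
            continue
        # Cluster every remaining box that overlaps the current top-scoring one
        cluster = [j for j in order if not used[j] and iou(boxes[i], boxes[j]) >= thr]
        for j in cluster:
            used[j] = True
        w = scores[cluster]
        # Average the cluster's coordinates, weighted by confidence
        fused.append((np.average(boxes[cluster], axis=0, weights=w), float(w.mean())))
    return fused

# Two models agree on one object; a third box is a separate detection
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.6, 0.8])
fused = fuse_boxes(boxes, scores)
assert len(fused) == 2   # the two overlapping boxes collapse into one
```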