Can someone be kind enough to suggest a successful strategy for dealing with the class imbalance? I have tried various strategies, upsampling, downsampling, augmentations but still not seeing any significant improvement.
import numpy as np

from ultralytics.data.build import YOLODataset
import ultralytics.data.build as build


class YOLOWeightedDataset(YOLODataset):
    def __init__(self, *args, mode="train", **kwargs):
        """
        Initialize the WeightedDataset.

        Args:
            class_weights (list or numpy array): A list or array of weights corresponding to each class.
        """
        super(YOLOWeightedDataset, self).__init__(*args, **kwargs)

        self.train_mode = "train" in self.prefix

        # You can also specify weights manually instead
        self.count_instances()
        class_weights = np.sum(self.counts) / self.counts

        # Aggregation function
        self.agg_func = np.mean

        self.class_weights = np.array(class_weights)
        self.weights = self.calculate_weights()
        self.probabilities = self.calculate_probabilities()

    def count_instances(self):
        """
        Count the number of instances per class.

        Returns:
            dict: A dict containing the counts for each class.
        """
        self.counts = [0 for i in range(len(self.data["names"]))]
        for label in self.labels:
            cls = label["cls"].reshape(-1).astype(int)
            for id in cls:
                self.counts[id] += 1

        self.counts = np.array(self.counts)
        # Avoid division by zero for classes with no instances
        self.counts = np.where(self.counts == 0, 1, self.counts)

    def calculate_weights(self):
        """
        Calculate the aggregated weight for each label based on class weights.

        Returns:
            list: A list of aggregated weights corresponding to each label.
        """
        weights = []
        for label in self.labels:
            cls = label["cls"].reshape(-1).astype(int)

            # Give a default weight to background images (no objects)
            if cls.size == 0:
                weights.append(1)
                continue

            # Take the mean of the class weights; you can change this
            # aggregation function to aggregate weights differently
            weight = self.agg_func(self.class_weights[cls])
            weights.append(weight)
        return weights

    def calculate_probabilities(self):
        """
        Calculate and store the sampling probabilities based on the weights.

        Returns:
            list: A list of sampling probabilities corresponding to each label.
        """
        total_weight = sum(self.weights)
        probabilities = [w / total_weight for w in self.weights]
        return probabilities

    def __getitem__(self, index):
        """Return transformed label information based on the sampled index."""
        # Don't use weighted sampling for validation
        if not self.train_mode:
            return self.transforms(self.get_image_and_label(index))
        else:
            index = np.random.choice(len(self.labels), p=self.probabilities)
            return self.transforms(self.get_image_and_label(index))


build.YOLODataset = YOLOWeightedDataset
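The sampling logic above reduces to inverse-frequency class weights, averaged per image and normalized into probabilities. A toy numpy sketch with made-up counts (the numbers are illustrative, not from the competition data):

```python
import numpy as np

# Made-up instance counts for three classes (purely illustrative).
counts = np.array([900, 60, 40])

# Inverse-frequency class weights, as computed in __init__ above.
class_weights = counts.sum() / counts

# Hypothetical per-image class arrays and their aggregated (mean) weights.
image_cls = [np.array([0]), np.array([0, 1]), np.array([2])]
weights = [class_weights[c].mean() for c in image_cls]

# Normalize the per-image weights into sampling probabilities.
probs = np.array(weights) / np.sum(weights)
```

An image containing only the rarest class ends up with a much higher sampling probability than one containing only the common class.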
I hope this helps.
Thank you, you are really helping. I tried balancing by upsampling and augmenting anthracnose and cssvd; that got me to 0.74, and I can't seem to break past it.
Did upsampling and augmentation improve your CV?
There was marginal improvement. I am yet to try your version. Did it improve your score?
It's a yes-and-no answer: it helped with some models and not with others. To be more specific, it helped on some kinds of splits and not on others. I think I focused more on the model than the data. Will try that out. Thanks @Bone
Before that, is your CV correlating with the LB?
They have never correlated in all my experiments.
Same problem here. I suggest you focus more on the model.
Sure. Hyperparameter tuning isn't helping much either.
Focus on inference too: tune the IoU threshold, use a lower confidence, and set augment=True.
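For concreteness, a minimal sketch of that inference tuning. The exact values are assumptions to sweep against your CV, not known-good settings; the keys match the Ultralytics predict() arguments:

```python
# Hypothetical inference-time settings; tune these against your CV.
tta_settings = {
    "conf": 0.15,     # lower confidence threshold keeps more borderline boxes
    "iou": 0.6,       # NMS IoU threshold
    "augment": True,  # enable test-time augmentation
}

# Usage (requires ultralytics and trained weights, e.g. "best.pt"):
# from ultralytics import YOLO
# results = YOLO("best.pt").predict("images/", **tta_settings)
```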
That was helpful. Thanks @CodeJoe
Nice, you've pushed. Well done.
Thanks, buddy. I am now ensembling the models.
Have you done an ensemble yet?
Yes, using WBF. It moved my score up by at least 1.
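For readers: WBF (weighted boxes fusion) averages the coordinates of overlapping boxes from different models, weighted by confidence, instead of discarding them like NMS does. A minimal single-class sketch of the idea (not the ensemble_boxes library implementation; the score handling is simplified):

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def simple_wbf(boxes, scores, iou_thr=0.55):
    """Fuse overlapping boxes via score-weighted coordinate averaging."""
    order = np.argsort(scores)[::-1]  # process highest-confidence boxes first
    clusters = []
    for i in order:
        b, s = boxes[i], scores[i]
        for c in clusters:
            if iou(c["fused"], b) > iou_thr:
                # Add to the cluster and recompute the score-weighted average box.
                c["boxes"].append(b)
                c["scores"].append(s)
                w = np.array(c["scores"])
                c["fused"] = (np.array(c["boxes"]) * w[:, None]).sum(0) / w.sum()
                break
        else:
            clusters.append({"boxes": [b], "scores": [s], "fused": np.array(b, float)})
    fused = np.array([c["fused"] for c in clusters])
    conf = np.array([np.mean(c["scores"]) for c in clusters])
    return fused, conf

# Two models detect roughly the same object, plus one distinct detection.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
fused, conf = simple_wbf(boxes, scores)
```

The two overlapping boxes fuse into one averaged box while the distant one survives unchanged; the ensemble_boxes package does this per class with extra score normalization.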