This repository contains two training notebooks and notes showing how to pretrain, from scratch or from ImageNet weights, a self-supervised encoder that can be finetuned to solve this task. I also experimented with progressive resizing and have released a pretrained encoder called mwalimu-128 for the 128px image size, which achieves a validation loss of 4.98 for xresnet34 and 4.35 for res2next50. Link is in the README:
Amazing work. Just a question: how do you do this supervised transform in ImageDataLoaders.from_lists in fastai?
Thank you @Koleshjr. I'm not sure I understand the question, though. Which supervised transform are you referring to?