I think that with such imbalanced data and a small dataset, accuracy is a misleading metric for model performance. As we can all see, the scores move in fixed steps, which suggests a model could get a good accuracy by luck but then perform poorly on the final data.
That prevents potentially good models from entering or updating the LB, because it only keeps the misleading "highest score". Am I getting that right? :)
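To make the "fixed steps" point concrete, here's a minimal sketch. The 50-sample, 90/10 class split is made up for illustration and isn't from the actual competition data:

```python
import numpy as np

# Hypothetical small public test set: 50 samples, 45 negatives and 5 positives
y_true = np.array([0] * 45 + [1] * 5)

# A trivial "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
print(f"majority-class accuracy: {accuracy:.2f}")  # 0.90 without learning anything

# With n = 50 samples, accuracy can only take the values k/50, so scores
# move in fixed steps of 1/50 = 0.02: one lucky prediction shifts the
# score by a whole step, which is why a high public accuracy can be noise.
print(f"score step size: {1 / len(y_true):.2f}")
```

The smaller the test set, the coarser the steps, so the easier it is for a weak model to land on a "good" score by chance.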
It doesn't matter. When the private dataset is scored, the high scores will reshuffle, and the award goes to the participants with the best private scores. In the end, the organizers will ask the top ten participants for their code, rerun it on their own machines, and then decide the top three. It's a fairly foolproof method. Just focus on your CV and LB scores; if they agree, your model should hold up.
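On checking that CV and LB agree: a common approach is stratified cross-validation, which keeps the class ratio in every fold and matters on imbalanced data. A minimal sketch with scikit-learn, using synthetic data as a stand-in for the competition set (the model and sizes here are assumptions, not from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced training data (90/10 split) standing in for the real set
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=42)

model = RandomForestClassifier(random_state=42)

# Stratified folds preserve the class ratio in each split; compare the
# mean CV score against your public LB score to judge whether they track
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the mean CV score and the public LB score stay close across submissions, the private reshuffle is much less likely to hurt you.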