Hi everyone, after some exploratory data analysis, I found the following:
- The "District" column in the test set contains 'Kumurambheem Asifabad' and 'Sangareddy', which are not present in the train set.
- The "Sub-District" column in the test set contains the following values, which are not present in the train set:
'Adavidevulapally', 'Addakal', 'Adilabad Urban', 'Bibipet', 'Chinnagudur', 'Doulthabad', 'Gandeed', 'Gundlapally', 'Gundmal', 'Jadcherla', 'Jainoor', 'Kerameri', 'Kesamudram', 'Kethepally', 'Kuntala', 'Maddur', 'Mahabubabad', 'Malegaon', 'Marriguda', 'Munugode', 'Nakrekal', 'Nampally', 'Narnoor', 'Neredugommu', 'Nirmal Rural', 'Nizampet', 'Papannapet', 'Parvathagiri', 'Ponkal', 'Sarangapur', 'Sathnala', 'Talamadugu', 'Tekmal', 'Toopran', 'Vatpally'
- The "CNext" column in the test set contains "cotton", which is not present in the train set. This column is probably related to the "CLast" column, though, so I'm not sure whether this counts.
Do you think this might affect the results? Thank you.
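For anyone who wants to reproduce this kind of check, here is a minimal sketch of a train/test category comparison. The helper name `unseen_categories` and the toy rows are assumptions made up for illustration (only 'Sangareddy' and "cotton" come from the findings above); the real rows would come from the competition files, for example via `csv.DictReader`.

```python
def unseen_categories(train_rows, test_rows, columns):
    """For each column, return the sorted values found in test_rows but not in train_rows."""
    result = {}
    for col in columns:
        train_values = {row[col] for row in train_rows}
        test_values = {row[col] for row in test_rows}
        result[col] = sorted(test_values - train_values)
    return result


# Toy rows, invented for illustration only (placeholders, not real data).
train_rows = [
    {"District": "district_a", "CNext": "crop_a"},
    {"District": "district_b", "CNext": "crop_b"},
]
test_rows = [
    {"District": "Sangareddy", "CNext": "cotton"},
    {"District": "district_b", "CNext": "crop_a"},
]

report = unseen_categories(train_rows, test_rows, ["District", "CNext"])
print(report)  # {'District': ['Sangareddy'], 'CNext': ['cotton']}
```

With pandas DataFrames, the same idea would be `set(test[col]) - set(train[col])` for each column.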
@Zindi Can you please take a look? Thank you.
It's intentional. Your CV should reflect that.
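One possible way to make the CV reflect this is to split folds by group, so that every validation fold contains only districts the model did not see during training. The sketch below is a hand-rolled group split, not the organizers' method; `group_folds` and the toy district labels are hypothetical. scikit-learn's `GroupKFold` provides the same kind of split if you prefer a library implementation.

```python
def group_folds(groups, n_splits=3):
    """Yield (train_idx, val_idx) pairs where no group value appears on both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Assign each distinct group to one fold, round-robin.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        val_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, val_idx


# Toy district label per training row, invented for illustration only.
districts = ["A", "A", "B", "B", "C", "C", "D"]
splits = list(group_folds(districts, n_splits=3))

for train_idx, val_idx in splits:
    train_groups = {districts[i] for i in train_idx}
    val_groups = {districts[i] for i in val_idx}
    # Validation districts are always unseen in the training part of the fold.
    print(sorted(val_groups), "held out;", sorted(train_groups), "used for training")
```

Passing the "Sub-District" values as `groups` instead would simulate unseen sub-districts in the same way.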