The dataset is a bit tricky at first, but fairly straightforward once you look at it. The challenge revolves around weather forecasting. The train dataset has up to 30 columns because those columns are there to draw insights from after proper data wrangling. They are not present in the test dataset because the idea is that, based on the knowledge and inferences drawn from the train set, we only need the month, year and gender to make an accurate prediction. So I suggest we focus more on the data processing steps. For example, one could combine all the similar features in the train set and divide by the number of months to get something like an average_per_month selling figure, and so on.
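To make the average_per_month idea concrete, here is a rough sketch in pandas. I don't have the real column names in front of me, so everything here (`feat_a_1`, `feat_a_2`, `feat_a_total`, the toy values, and the choice to group by gender) is made up for illustration. It only shows one possible way to do the wrangling, not the intended solution.

```python
import pandas as pd

# Toy stand-in for the train set. All column names and values are hypothetical.
train = pd.DataFrame({
    "year":     [2020, 2020, 2020, 2020],
    "month":    [1, 1, 2, 2],
    "gender":   ["F", "M", "F", "M"],
    "feat_a_1": [10.0, 20.0, 30.0, 40.0],
    "feat_a_2": [30.0, 40.0, 50.0, 60.0],
})

# Combine the "similar" feature columns into a single total per row.
similar_cols = ["feat_a_1", "feat_a_2"]
train["feat_a_total"] = train[similar_cols].sum(axis=1)

# Build a year-month key so the same month in different years counts separately.
train["year_month"] = train["year"] * 100 + train["month"]

# Per gender: overall total divided by the number of distinct months.
per_gender = (
    train.groupby("gender")
    .agg(total=("feat_a_total", "sum"), n_months=("year_month", "nunique"))
    .reset_index()
)
per_gender["average_per_month"] = per_gender["total"] / per_gender["n_months"]

# The test set only carries month, year and gender, so the aggregate learned
# from train is attached to it through the shared gender key.
test = pd.DataFrame({
    "year":   [2021, 2021],
    "month":  [1, 1],
    "gender": ["F", "M"],
})
test_features = test.merge(
    per_gender[["gender", "average_per_month"]], on="gender", how="left"
)
print(test_features)
```

The same pattern should work with month or year as the grouping key instead of gender. Which grouping is actually useful depends on what the real columns look like.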