Digital Green Crop Yield Estimate Challenge


ysnreddy

One Hot Encoding of Categorical Data

Help · 27 Nov 2023, 19:04 · 2

I am very new to data science, so please bear with this beginner question. I applied one-hot encoding to the categorical variables. For the variable "LandPreparationMethod", the number of unique values in the training data is 43, which means I get 43 extra features. When I apply the same method to the test data, the number of unique values is 30, so I get 30 extra features. When I try to predict on the test data, the error says the number of columns in the train and test data doesn't match (basically, the model is expecting 13 more features). How do I deal with this?

Discussion 2 answers

yanteixeira

When training a model on a dataset, it's essential to ensure that the unseen data (test data) will have the same columns as the training data. In your case, the categorical variables in your training and test datasets have different unique values.

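You need to decide on a method to resolve this mismatch. One approach is to add the missing features to the test dataset, filling with zeros any one-hot columns that are present in the training dataset but absent from the test dataset. The reverse case, a category that appears only in the test dataset, needs the same treatment on the training side before the model is fitted, so that both datasets end up with identical columns in the same order.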
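A minimal sketch of that alignment with pandas, assuming toy data: only the column name "LandPreparationMethod" comes from the question, and the category values are made up. Columns missing from the test set are added as zeros. Rather than adding test-only columns to the training side, this sketch drops them, which is the simpler option when the model has already been fitted.

```python
import pandas as pd

# Hypothetical toy data; the category values are invented for illustration.
train = pd.DataFrame(
    {"LandPreparationMethod": ["Tractor", "Bullock", "Manual", "Tractor"]}
)
test = pd.DataFrame({"LandPreparationMethod": ["Tractor", "Rotavator"]})

# One-hot encode each dataset separately, as in the question.
train_enc = pd.get_dummies(train, columns=["LandPreparationMethod"], dtype=int)
test_enc = pd.get_dummies(test, columns=["LandPreparationMethod"], dtype=int)

# Force the test frame onto the training columns:
# columns missing from test are created and filled with 0,
# columns that exist only in test are dropped,
# and the column order matches the training frame.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(test_aligned)
```

After this step the test frame has exactly the columns the model was trained on, so the "expecting 13 more features" kind of error should no longer appear.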

27 Nov 2023, 19:24

Upvotes 0

GeorgeZindi

IN CONTEXT OF ONE-HOT ENCODING

Make sure the columns you have selected in the train dataset meet these conditions:

1. Each has the same data type throughout the whole dataset.

2. The Train dataset has "n" columns and the Test dataset has "n-1" columns.
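These two checks can be sketched in pandas before encoding. This reads the "n" versus "n-1" rule as the training set carrying one extra target column, which is an assumption, and the frames and the column names "Acre" and "Yield" are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical toy frames; "Yield" stands in for the target column.
train = pd.DataFrame(
    {
        "Acre": [0.5, 1.0],
        "LandPreparationMethod": ["Tractor", "Manual"],
        "Yield": [300.0, 550.0],
    }
)
test = pd.DataFrame(
    {
        "Acre": ["0.75", "1.25"],  # stored as text on purpose
        "LandPreparationMethod": ["Tractor", "Bullock"],
    }
)

target = "Yield"
features = [c for c in train.columns if c != target]

# Check 2: apart from the target, both frames should have the same columns.
missing_in_test = [c for c in features if c not in test.columns]
extra_in_test = [c for c in test.columns if c not in features]

# Check 1: each shared column should have the same dtype in both frames.
dtype_mismatches = {
    c: (train[c].dtype, test[c].dtype)
    for c in features
    if c in test.columns and train[c].dtype != test[c].dtype
}

print("missing in test:", missing_in_test)
print("extra in test:", extra_in_test)
print("dtype mismatches:", dtype_mismatches)
```

Any column reported in `dtype_mismatches` is worth converting to a common type before one-hot encoding, since a numeric column read as text would otherwise be encoded as a category.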

27 Nov 2023, 19:28

Upvotes 0
