🌱 This Week on Zindi: test data

Wazihub Soil Moisture Prediction Challenge

Helping Senegal

$8 000 USD

Completed (over 6 years ago)


Start

Jul 29, 19

Oct 20, 19

Reveal

Oct 21, 19

Tarek_hamdi

test data

Data · 20 Oct 2019, 05:07 · 6

Hi,

I can't separate the test data from the train data, and when I apply set() to the timestamps in the submission file, I find many repeated timestamps. Is there any code that can help me?

Thanks

Discussion 6 answers

Olayinka_Fadahunsi

Hello Hamdi,

The test set is at the tail end of every field. Carefully reading the details in the info section and inspecting the train data will point you in the right direction.

However, see the starter code in R below. Hope it helps.

# Convert the timestamp column in train to a date-time; the result is the timestamp3 column below

Wazi <- cbind(Wazi,timestamp3 = strptime(Wazi$timestamp , format = "%Y-%m-%d %H:%M:%S", tz = "GMT"))

# You can separate fields 1 to 4 into four data frames (Field1 to Field4) by selecting the required columns

# Once that step is done, apply the code below to get train and test for fields 1 to 4

# Get train and test for each field

#Field1

Field1 <- Field1[with(Field1, order(timestamp3)), ]

Field1_train <- Field1[Field1$timestamp3 >= "2019-02-23 00:00:00" & Field1$timestamp3 <= "2019-03-25 23:45:00",]

Field1_test <- Field1[Field1$timestamp3 > "2019-03-25 23:45:00" & Field1$timestamp3 <= "2019-03-29 23:50:00",]

#Field2

Field2 <- Field2[with(Field2, order(timestamp3)), ]

Field2_train <- Field2[Field2$timestamp3 >= "2019-02-23 00:00:00" & Field2$timestamp3 <= "2019-05-25 08:40:00",]

Field2_test <- Field2[Field2$timestamp3 > "2019-05-25 08:40:00" & Field2$timestamp3 <= "2019-05-31 10:15:00",]

#Field3

Field3 <- Field3[with(Field3, order(timestamp3)), ]

Field3_train <- Field3[Field3$timestamp3 >= "2019-02-23 00:00:00" & Field3$timestamp3 <= "2019-04-19 21:10:00",]

Field3_test <- Field3[Field3$timestamp3 > "2019-04-19 21:10:00" & Field3$timestamp3 <= "2019-04-23 21:15:00",]

#Field4

Field4 <- Field4[with(Field4, order(timestamp3)), ]

Field4_train <- Field4[Field4$timestamp3 >= "2019-02-23 00:00:00" & Field4$timestamp3 <= "2019-05-25 08:40:00",]

Field4_test <- Field4[Field4$timestamp3 > "2019-05-25 08:40:00" & Field4$timestamp3 <= "2019-05-31 08:45:00",]
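The four blocks above repeat one pattern, so they could be folded into a single helper. The following is a minimal sketch, not the poster's code: the function name split_field, the humidity column, and the toy data are made up for illustration. It also builds the bounds with as.POSIXct(..., tz = "GMT"), because comparing a date-time column against a plain string parses that string in the session's local time zone, which may not match the GMT used for timestamp3.

```r
# Hypothetical helper: split one field's data frame into train and test
# by timestamp. Assumes a date-time column named timestamp3 (as created above).
split_field <- function(df, train_start, train_end, test_end) {
  # Parse the bounds explicitly in GMT so they match timestamp3.
  to_gmt <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
  t0 <- to_gmt(train_start)
  t1 <- to_gmt(train_end)
  t2 <- to_gmt(test_end)

  df <- df[order(df$timestamp3), , drop = FALSE]
  list(
    train = df[df$timestamp3 >= t0 & df$timestamp3 <= t1, , drop = FALSE],
    test  = df[df$timestamp3 >  t1 & df$timestamp3 <= t2, , drop = FALSE]
  )
}

# Toy data for illustration only: eight readings, five minutes apart.
toy <- data.frame(
  timestamp3 = seq(as.POSIXct("2019-03-25 23:30:00", tz = "GMT"),
                   by = "5 min", length.out = 8),
  humidity = c(40, 40, 41, 41, NA, NA, NA, NA)  # hypothetical column name
)

# Field1 bounds copied from the starter code above.
parts <- split_field(toy,
                     train_start = "2019-02-23 00:00:00",
                     train_end   = "2019-03-25 23:45:00",
                     test_end    = "2019-03-29 23:50:00")
nrow(parts$train)
nrow(parts$test)
```

Calling split_field once per field with that field's three bounds gives the same train and test frames as the blocks above, provided the session time zone is GMT in the original version.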

20 Oct 2019, 07:58 (edited 1 minute later)

Upvotes 0

Adrianteri

I have noticed irregularities in my data splits. I use rigid boundaries, but some entries inside the test range still have values for soil humidity.

Field1 Test timestamp boundaries: START 2019-03-25 22:50:00, STOP 2019-03-29 22:50:00

e.g. in Field1 Test:

2019-03-26 07:35:00 has a value 41

2019-03-27 13:40:00 has a value 42

2019-03-29 19:50:00 has a value 46

The data ought to have been split when provided, IMO.

replied to Olayinka_Fadahunsi · 20 Oct 2019, 12:50 (edited 1 minute later)

Upvotes 0

Olayinka_Fadahunsi

The test entries with values are the peak soil humidities as described in the info section.
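To check which rows inside a test window still carry a value, you can filter on non-missing humidity. This is a small sketch on made-up data: the column name humidity is an assumption, the three filled rows use the values quoted in the post above, and the empty row is invented for contrast.

```r
# Made-up slice of the Field1 test window.
field1_test <- data.frame(
  timestamp3 = as.POSIXct(c("2019-03-26 07:35:00", "2019-03-26 07:40:00",
                            "2019-03-27 13:40:00", "2019-03-29 19:50:00"),
                          tz = "GMT"),
  humidity = c(41, NA, 42, 46)  # hypothetical column name
)

# Rows in the test window that still have a value.
with_value <- field1_test[!is.na(field1_test$humidity), ]
nrow(with_value)

# Rows in the test window with no value.
without_value <- field1_test[is.na(field1_test$humidity), ]
nrow(without_value)
```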

replied to Adrianteri · 20 Oct 2019, 12:55

Upvotes 0

Tarek_hamdi

Thanks DrFad. By applying those boundaries I find that the total length of the test data is 5775, so there are 7 missing rows? Also, how do I prepare the submission file for the 4 fields? Any code, please?

20 Oct 2019, 16:24

Upvotes 0

Olayinka_Fadahunsi

You are welcome. You will have to add them manually by creating new rows. Please look at the end of each field in the submission file and add the missing rows there.
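One way to locate the missing rows, rather than scanning the file by eye, is to compare the IDs in the submission file with the IDs built from your own test splits. The sketch below runs on made-up data; the column names ID and prediction and the ID format are assumptions, so adapt them to the actual submission file.

```r
# Made-up submission IDs and test predictions; the real ID format may differ.
submission <- data.frame(
  ID = c("2019-03-29 23:40:00 x Field1",
         "2019-03-29 23:45:00 x Field1",
         "2019-03-29 23:50:00 x Field1"),
  stringsAsFactors = FALSE
)
my_test <- data.frame(
  ID = c("2019-03-29 23:40:00 x Field1",
         "2019-03-29 23:45:00 x Field1"),
  prediction = c(40.5, 40.7),
  stringsAsFactors = FALSE
)

# IDs expected by the submission file but absent from the test split.
missing_ids <- setdiff(submission$ID, my_test$ID)
missing_ids

# Left join: every submission row is kept, and rows without a match
# get NA in the prediction column, ready to be filled in.
out <- merge(submission, my_test, by = "ID", all.x = TRUE)
sum(is.na(out$prediction))
```

The NA rows in out are the ones that need to be created and filled before submitting.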

replied to Tarek_hamdi · 20 Oct 2019, 16:29

Upvotes 0

Tarek_hamdi

The times for train seem wrong: they gave me 23000 rows in the second field. Any help, please? What are the correct training boundaries for each field? Thanks.

replied to Olayinka_Fadahunsi · 20 Oct 2019, 22:44

Upvotes 0
