Wazihub Soil Moisture Prediction Challenge
$8,000 USD
Predict soil humidity using sensor data from low-cost DIY Internet of Things in Senegal
29 July–20 October 2019 23:59
696 data scientists enrolled, 96 on the leaderboard
test data
published 20 Oct 2019, 05:07

Hi,

I can't extract the test data from the train file, and when I apply set to the timestamps in the submission file, I find many repeated timestamps. Is there any code that can help me?

Thanks

edited 1 minute later

Hello Hamdi,

The test set is at the tail end of every field. Carefully observing the details in the info section and train data will put you in the right direction.

However, see the starter code below, in R. Hope it helps.

#Convert the timestamp column in train to a date-time column, i.e. col timestamp3 below

Wazi$timestamp3 <- as.POSIXct(Wazi$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

#You can separate fields 1 to 4 into four different dataframes (Field1 to Field4) by selecting the required columns

#Once you complete the step described in the comment above, apply the code below to get train and test for fields 1 to 4

#Get train and test for each field

#Field1

Field1 <- Field1[with(Field1, order(timestamp3)), ]

Field1_train <- Field1[Field1$timestamp3 >= "2019-02-23 00:00:00" & Field1$timestamp3 <= "2019-03-25 23:45:00",]

Field1_test <- Field1[Field1$timestamp3 > "2019-03-25 23:45:00" & Field1$timestamp3 <= "2019-03-29 23:50:00",]

#Field2

Field2 <- Field2[with(Field2, order(timestamp3)), ]

Field2_train <- Field2[Field2$timestamp3 >= "2019-02-23 00:00:00" & Field2$timestamp3 <= "2019-05-25 08:40:00",]

Field2_test <- Field2[Field2$timestamp3 > "2019-05-25 08:40:00" & Field2$timestamp3 <= "2019-05-31 10:15:00",]

#Field3

Field3 <- Field3[with(Field3, order(timestamp3)), ]

Field3_train <- Field3[Field3$timestamp3 >= "2019-02-23 00:00:00" & Field3$timestamp3 <= "2019-04-19 21:10:00",]

Field3_test <- Field3[Field3$timestamp3 > "2019-04-19 21:10:00" & Field3$timestamp3 <= "2019-04-23 21:15:00",]

#Field4

Field4 <- Field4[with(Field4, order(timestamp3)), ]

Field4_train <- Field4[Field4$timestamp3 >= "2019-02-23 00:00:00" & Field4$timestamp3 <= "2019-05-25 08:40:00",]

Field4_test <- Field4[Field4$timestamp3 > "2019-05-25 08:40:00" & Field4$timestamp3 <= "2019-05-31 08:45:00",]
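The four per-field blocks above repeat the same pattern, so they could be folded into one helper. This is only a minimal sketch, assuming each field dataframe has the date-time column timestamp3 built earlier. The function name split_field and the toy data are hypothetical. The boundaries are parsed explicitly in GMT so the comparison does not depend on the time zone of the R session.

```r
# Hypothetical helper: split one field into train and test by time boundaries.
split_field <- function(df, train_start, train_end, test_end, tz = "GMT") {
  # Parse the boundaries in the same time zone as timestamp3.
  train_start <- as.POSIXct(train_start, tz = tz)
  train_end   <- as.POSIXct(train_end, tz = tz)
  test_end    <- as.POSIXct(test_end, tz = tz)

  # Sort by time, then cut at the boundaries.
  df <- df[order(df$timestamp3), , drop = FALSE]
  list(
    train = df[df$timestamp3 >= train_start & df$timestamp3 <= train_end, , drop = FALSE],
    test  = df[df$timestamp3 >  train_end   & df$timestamp3 <= test_end, , drop = FALSE]
  )
}

# Toy data standing in for Field1 (values are made up for illustration).
toy <- data.frame(
  timestamp3 = as.POSIXct(c("2019-03-25 23:40:00", "2019-03-25 23:45:00",
                            "2019-03-25 23:50:00", "2019-03-29 23:50:00"),
                          tz = "GMT"),
  value = c(1, 2, 3, 4)
)
parts <- split_field(toy, "2019-02-23 00:00:00", "2019-03-25 23:45:00",
                     "2019-03-29 23:50:00")
```

With the real data, the call for each field would take that field's dataframe and the boundaries listed above, e.g. split_field(Field1, "2019-02-23 00:00:00", "2019-03-25 23:45:00", "2019-03-29 23:50:00").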

replying to DrFad
edited 1 minute later

I have noticed irregularities in my data splits. I used fixed boundaries, but some test entries already have values for soil humidity.

Field1 test timestamp boundaries: START 2019-03-25 22:50:00, STOP 2019-03-29 22:50:00

e.g. in the Field1 test set:

2019-03-26 07:35:00 has a value 41

2019-03-27 13:40:00 has a value 42

2019-03-29 19:50:00 has a value 46

The data ought to have been split when provided, IMO.

The test entries with values are the peak soil humidities as described in the info section.
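To see which rows inside a test window already carry a reading, a check along these lines may help. It is a sketch only: the column name soil_humidity and the toy values are assumptions for illustration, so substitute the actual target column of the field.

```r
# Toy test window; NA marks rows with no soil humidity reading.
field1_test <- data.frame(
  timestamp3 = as.POSIXct(c("2019-03-26 07:30:00", "2019-03-26 07:35:00",
                            "2019-03-26 07:40:00"), tz = "GMT"),
  soil_humidity = c(NA, 41, NA)  # hypothetical column name
)

# Rows that already have a value.
known <- field1_test[!is.na(field1_test$soil_humidity), ]

# Rows without a value.
unknown <- field1_test[is.na(field1_test$soil_humidity), ]
```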

Thanks DrFad. By applying those boundaries, I find that the total length of the test data is 5775, so there are 7 missing rows? Also, how do I prepare the submission file for the 4 fields? Any code, please?

You are welcome. You will have to add the missing rows manually by creating new rows. Please look at the end of each field in the submission file and add the rows that are missing.
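One way to find which rows to add is to compare the timestamps the submission file expects with those in the split test set. A minimal sketch, assuming both are available as character vectors for one field: the names submission_times and test_times and the toy values are hypothetical, and the real submission file may encode its row identifiers differently.

```r
# Hypothetical timestamps taken from the submission file for one field.
submission_times <- c("2019-03-29 23:40:00", "2019-03-29 23:45:00",
                      "2019-03-29 23:50:00", "2019-03-29 23:55:00")

# Hypothetical timestamps present in the split test set for the same field.
test_times <- c("2019-03-29 23:40:00", "2019-03-29 23:45:00",
                "2019-03-29 23:50:00")

# Timestamps the submission expects but the test set lacks.
missing_times <- setdiff(submission_times, test_times)

# Build the extra rows and append them to the test set.
test_df <- data.frame(timestamp = test_times, stringsAsFactors = FALSE)
extra   <- data.frame(timestamp = missing_times, stringsAsFactors = FALSE)
test_df <- rbind(test_df, extra)
```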

The times given for train seem wrong: they give me 23000 rows in the second field. Any help, please? What are the correct boundaries for training in each field? Thanks.