I can't work out how to get the test data from the train file, and when I apply set to the timestamps in the submission file I find many repeated timestamps. Is there any code that can help me?
Hello Hamdi,
The test set is at the tail end of every field. Carefully observing the details in the info section and train data will put you in the right direction.
However, see starter code below in R. Hope it helps
#Convert timestamp to date and time in train, i.e. col timestamp3 below
Wazi <- cbind(Wazi,timestamp3 = strptime(Wazi$timestamp , format = "%Y-%m-%d %H:%M:%S", tz = "GMT"))
#You can separate fields 1 to 4 into four different dataframes by selecting the required columns
#Once you have completed the step above, apply the code below to get train and test for fields 1 to 4
#Get train and test for each field
#Field1
Field1 <- Field1[with(Field1, order(timestamp3)), ]
Field1_train <- Field1[Field1$timestamp3 >= "2019-02-23 00:00:00" & Field1$timestamp3 <= "2019-03-25 23:45:00",]
Field1_test <- Field1[Field1$timestamp3 > "2019-03-25 23:45:00" & Field1$timestamp3 <= "2019-03-29 23:50:00",]
#Field2
Field2 <- Field2[with(Field2, order(timestamp3)), ]
Field2_train <- Field2[Field2$timestamp3 >= "2019-02-23 00:00:00" & Field2$timestamp3 <= "2019-05-25 08:40:00",]
Field2_test <- Field2[Field2$timestamp3 > "2019-05-25 08:40:00" & Field2$timestamp3 <= "2019-05-31 10:15:00",]
#Field3
Field3 <- Field3[with(Field3, order(timestamp3)), ]
Field3_train <- Field3[Field3$timestamp3 >= "2019-02-23 00:00:00" & Field3$timestamp3 <= "2019-04-19 21:10:00",]
Field3_test <- Field3[Field3$timestamp3 > "2019-04-19 21:10:00" & Field3$timestamp3 <= "2019-04-23 21:15:00",]
#Field4
Field4 <- Field4[with(Field4, order(timestamp3)), ]
Field4_train <- Field4[Field4$timestamp3 >= "2019-02-23 00:00:00" & Field4$timestamp3 <= "2019-05-25 08:40:00",]
Field4_test <- Field4[Field4$timestamp3 > "2019-05-25 08:40:00" & Field4$timestamp3 <= "2019-05-31 08:45:00",]
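The four blocks above repeat the same pattern, so they could be folded into one helper. This is only a minimal sketch: `split_field` is a made-up name, the toy data frame is invented for illustration, and the boundaries are passed in as arguments rather than confirmed values. It also parses the boundaries with an explicit `tz = "GMT"`, since, as far as I know, a bare string in a comparison may be parsed in the session's local time zone instead.

```r
# Hypothetical helper (not part of the starter code): split one field's rows
# into train and test using the same kind of timestamp boundaries as above.
split_field <- function(df, train_end, test_end,
                        train_start = "2019-02-23 00:00:00", tz = "GMT") {
  fmt <- "%Y-%m-%d %H:%M:%S"
  # Parse every boundary in the same time zone as timestamp3
  train_start <- as.POSIXct(train_start, format = fmt, tz = tz)
  train_end   <- as.POSIXct(train_end,   format = fmt, tz = tz)
  test_end    <- as.POSIXct(test_end,    format = fmt, tz = tz)
  # Order by time, then cut at the boundaries
  df <- df[order(df$timestamp3), , drop = FALSE]
  list(
    train = df[df$timestamp3 >= train_start & df$timestamp3 <= train_end, , drop = FALSE],
    test  = df[df$timestamp3 >  train_end   & df$timestamp3 <= test_end,  , drop = FALSE]
  )
}

# Toy data at a 5-minute step (invented, not the competition data)
toy <- data.frame(
  timestamp3 = seq(as.POSIXct("2019-03-25 23:30:00", tz = "GMT"),
                   by = "5 min", length.out = 8)
)
parts <- split_field(toy, "2019-03-25 23:45:00", "2019-03-26 00:00:00")
nrow(parts$train)
nrow(parts$test)
```

With the real data the call would look like `split_field(Field1, "2019-03-25 23:45:00", "2019-03-29 23:50:00")`, assuming `Field1` already has the `timestamp3` column.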
I have noticed irregularities in my data splits. I used rigid boundaries, but some entries inside the test window still have values for soil humidity.
Field1 Test timestamp boundaries START 2019-03-25 22:50:00 STOP 2019-03-29 22:50:00
e.g in Field1 Test
2019-03-26 07:35:00 has a value 41
2019-03-27 13:40:00 has a value 42
2019-03-29 19:50:00 has a value 46
The data ought to have been split when provided IMO....
The test entries with values are the peak soil humidities as described in the info section.
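To list those entries yourself, something like the following might help. It is a rough sketch on invented toy rows: `humidity` is a hypothetical column name, so swap in whichever soil humidity column your field dataframe actually has.

```r
# Toy test split (invented rows); `humidity` is a hypothetical column name
Field1_test <- data.frame(
  timestamp3 = as.POSIXct(c("2019-03-26 07:30:00",
                            "2019-03-26 07:35:00",
                            "2019-03-26 07:40:00"), tz = "GMT"),
  humidity = c(NA, 41, NA)
)

# Rows inside the test window that still carry a value
known <- Field1_test[!is.na(Field1_test$humidity), ]
known
```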
Thanks DrFad. By applying those boundaries I find that the length of all the test data is 5775, so there are 7 missing rows? Also, how do I prepare the submission file for the 4 fields? Any code, please?
You are welcome. You will have to add them manually by creating new rows. Please look at the end of each field in the submission file and add the missing rows by hand.
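One way to see which rows need adding is to compare the IDs in the submission file with the IDs you have predictions for. This is a sketch on made-up data: the column names `ID` and `Values` and the ID strings are assumptions, so take the real ones from the submission file itself.

```r
# Hypothetical IDs and column names, invented for illustration
submission <- data.frame(ID = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
predicted  <- data.frame(ID = c("a", "b", "d"),
                         Values = c(0.1, 0.2, 0.4),
                         stringsAsFactors = FALSE)

# IDs the submission file expects but the predictions do not cover
missing_ids <- setdiff(submission$ID, predicted$ID)

# Keep every submission row; IDs without a prediction get NA in Values
full <- merge(submission, predicted, by = "ID", all.x = TRUE)
full
```

The rows where `Values` is `NA` are the ones still to be filled in.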
The times for train seem wrong: they gave me 23000 rows in the second field. Any help, please? What are the correct boundaries for training in each field? Thanks.