Hello, please, is there a way I can load datasets directly from Zindi into Google Colab without downloading them to my computer? There is a competition I would like to work on, but the dataset is too large for me to download. Thank you
%%time
import pandas as pd
import numpy as np
import requests

myobj = {'auth_token': 'xxxxxxxxxxxx'}  # copy and paste your auth_token here

data_list = ['Train.csv', 'Test.csv', 'SampleSubmission.csv', 'VariableDescription.csv']
target_dir = ''
base_path = 'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/'

def load_zindi_data(data_list, base_path, target_dir):
    for data in data_list:
        target_path = target_dir + data
        data_path = base_path + data
        # Stream the response so large files never have to fit in memory at once
        x = requests.post(data_path, data=myobj, stream=True)
        with open(target_path, "wb") as handle:
            for chunk in x.iter_content(chunk_size=512):
                if chunk:  # filter out keep-alive chunks
                    handle.write(chunk)

load_zindi_data(data_list, base_path, target_dir)
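Once the download finishes, the files sit in the Colab working directory and can be read with pandas as usual. A minimal sketch (`read_if_present` is a hypothetical helper, assuming the default `target_dir` of the current directory):

```python
import os
import pandas as pd

def read_if_present(name):
    """Read a downloaded CSV into a DataFrame, or return None if it is missing."""
    return pd.read_csv(name) if os.path.exists(name) else None

train = read_if_present('Train.csv')
if train is not None:
    print(train.shape)
else:
    print('Train.csv not found - check your auth_token and base_path')
```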
#############################################################
First of all, copy and paste the code above into your Colab notebook, then follow the steps below:
Step 1: Make sure every dataset name inside "data_list" above is spelled exactly as it appears on the competition page, and add any other dataset names that aren't already in "data_list".
Step 2: Go back to the Zindi page of the competition. Right-click on any of the competition datasets provided and click "Inspect"; the developer tools will open where you can see the file's URL. For example, if you right-click on the dataset called Train.csv and click Inspect, you will see something like this:
'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/Train.csv'
This is where you will copy your datasets' "base_path" for the code above, i.e. your base_path will be 'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/' only, without Train.csv at the end.
Step 3: Finally, copy your "auth_token" from the same Inspect panel where you copied your "base_path" in Step 2, and paste it into the code.
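A quick way to get base_path right is to split the copied file URL on its last slash — plain string handling, nothing Zindi-specific:

```python
# The full URL copied from the Inspect panel for Train.csv
file_url = ('https://api.zindi.africa/v1/competitions/'
            'datadrive2030-early-learning-predictors-challenge/files/Train.csv')

# base_path is everything up to and including the final '/'
base_path = file_url.rsplit('/', 1)[0] + '/'
print(base_path)
```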
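Before re-running the download, it's worth checking that the placeholder token was actually replaced; a tiny guard like this (purely illustrative, not part of the snippet above) catches the common mistake of leaving the 'xxxx' value in place:

```python
myobj = {'auth_token': 'xxxxxxxxxxxx'}  # replace with your real token

def token_looks_set(obj):
    """Return False while the placeholder 'xxxx...' token is still in place."""
    token = obj.get('auth_token', '')
    return bool(token) and not set(token) <= {'x'}

print(token_looks_set(myobj))  # False until you paste your real token
```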
Cheers !!!
Thank you. I'll try this out