Toluhenok
Load Datasets without downloading
Data · 31 Mar 2023, 13:01 · 2

Hello, is there a way to load datasets directly from Zindi into Google Colab without first downloading them to my computer? There is a competition I would like to work on, but the dataset is too large for me to download. Thank you.

Discussion 2 answers
MICADEE
LAHASCOM

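%%time
import pandas as pd
import numpy as np
import requests

# Copy and paste your own auth_token here (see Step 3 below)
myobj = {'auth_token': 'xxxxxxxxxxxx'}

# File names exactly as they appear on the competition's data page
data_list = ['Train.csv', 'Test.csv', 'SampleSubmission.csv', 'VariableDescription.csv']

# Folder to save into; '' means the current working directory
target_dir = ''

base_path = 'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/'

def load_zindi_data(data_list, base_path, target_dir):
    for data in data_list:
        target_path = target_dir + data
        data_path = base_path + data
        # Stream the response so a large file is never held in memory at once
        x = requests.post(data_path, data=myobj, stream=True)
        # Stop on an HTTP error instead of saving the error body as a .csv
        x.raise_for_status()
        # The with-block closes the file even if the download fails midway
        with open(target_path, "wb") as handle:
            for chunk in x.iter_content(chunk_size=512):
                if chunk:  # filter out keep-alive new chunks
                    handle.write(chunk)

load_zindi_data(data_list, base_path, target_dir)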

#############################################################
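After the cell above has run, a quick look at each saved file can help confirm that real data arrived rather than an error message (for example, if the auth_token was wrong). This is only a rough sketch using the Python standard library. preview_file is a hypothetical helper name, and the file names are assumed to be the ones from data_list above.

```python
import os

def preview_file(path, n_lines=3):
    """Return (size in bytes, first n_lines of text) for a saved file."""
    size = os.path.getsize(path)
    lines = []
    with open(path, 'r', encoding='utf-8', errors='replace') as f:
        for _ in range(n_lines):
            line = f.readline()
            if not line:  # reached the end of the file
                break
            lines.append(line.rstrip('\n'))
    return size, lines

# Assumed file names, taken from data_list in the download code
for name in ['Train.csv', 'Test.csv']:
    if os.path.exists(name):
        size, head = preview_file(name)
        print(name, size, 'bytes')
        for line in head:
            print('   ', line)
    else:
        print(name, 'not found')
```

If the first lines look like comma-separated column names and values, the file should be ready for pd.read_csv.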

First, copy and paste the download code above into your Colab notebook, then follow the steps below:

Step 1: Make sure every dataset name inside "data_list" is spelled exactly as it appears on the competition page, and add any other dataset names that are not yet in "data_list".

Step 2: Go back to the competition's page on Zindi. Right-click on any of the provided datasets and click "Inspect". For example, if you right-click on the dataset called Train.csv and click "Inspect", you will see something like this:

'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/Train.csv'

This is where you copy the "base_path" for the code above, i.e. your base_path is only 'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/', without Train.csv at the end.

Step 3 (the final step): Copy your "auth_token" from the same place where you inspected and copied your "base_path" in Step 2, and paste it into "myobj" in the code above.
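The split described in Step 2 (everything up to and including "files/" is the base_path, the rest is the file name) can also be done in code. A minimal sketch, assuming the link you copied ends in the file name. split_file_url is a hypothetical helper name and not part of any Zindi API.

```python
def split_file_url(file_url):
    """Split a copied file link into (base_path, file_name)."""
    # rpartition splits at the LAST '/', so only the file name is cut off
    base, _, name = file_url.rpartition('/')
    return base + '/', name

# The example link from Step 2
file_url = 'https://api.zindi.africa/v1/competitions/datadrive2030-early-learning-predictors-challenge/files/Train.csv'

base_path, file_name = split_file_url(file_url)
print(base_path)
print(file_name)
```

The resulting base_path can be pasted into the download code, and file_name should match one of the entries in data_list.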

Cheers !!!

1 Apr 2023, 23:46
Toluhenok

Thank you. I'll try this out.

11 Apr 2023, 16:21