18 Feb 2021, 12:10

Getting started using Google Colab with Zindi

Struggling with computing power for data science? Giving up on competitions because your system drags? We have good news for you, and it starts with C: Colab. Join Zindi winner Eniola Olaleye for a simple walkthrough of using Colab for Zindi projects.

Google Colab (short for Colaboratory) is an online platform for running data science experiments. It provides an interactive programming environment much like Jupyter Notebooks, plus free access to computing resources (GPUs and TPUs). This speeds up computation and lets data scientists work on machine learning models even when their local computer is not up to the task.

Watch the full version of this tutorial with Eniola Olaleye on our YouTube channel.

Pre-installed libraries

Colab comes with the core libraries you would get from an Anaconda installation, such as pandas, NumPy, scikit-learn and Matplotlib, and also pre-installs deep learning libraries such as Keras, TensorFlow and PyTorch.
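To see which of these libraries are available in your runtime, and at which versions, you can run a quick check like the one below. This is a minimal sketch using only the standard library's `importlib`; the names in `LIBRARIES` are an assumption you should adapt to your project, and the versions printed will depend on the current Colab image.

```python
import importlib
import importlib.util

# Import names (not pip package names) of the libraries mentioned above;
# scikit-learn is imported as "sklearn" and PyTorch as "torch"
LIBRARIES = ["pandas", "numpy", "sklearn", "matplotlib", "keras", "tensorflow", "torch"]


def library_report(names):
    """Map each import name to its version string, or None if it is not installed."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not installed in this runtime
            continue
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # some packages fail on import, e.g. missing system libraries
            report[name] = f"import failed: {exc}"
            continue
        # Most scientific libraries expose __version__; fall back if one does not
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in library_report(LIBRARIES).items():
        print(f"{name:<12} {version or 'not installed'}")
```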

If a library you need is not pre-installed in Colab, you can install it from a notebook cell with this line of code:

!pip install library_name

For example, this is how you would install CatBoost:

!pip install catboost
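If you rerun a notebook often, you may prefer to install a package only when it is missing. The sketch below is one way to do that from Python rather than with the `!pip` shell syntax; `ensure_installed` is a hypothetical helper, not part of Colab, and it assumes the import name matches the pip package name unless you pass `package_name`.

```python
import importlib
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_installed(module_name: str, package_name: Optional[str] = None) -> bool:
    """pip-install package_name only if module_name cannot be imported.

    Returns True if an install was run, False if the module was already available.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already importable, skip the download
    # sys.executable is the interpreter running this notebook, so the package
    # is installed into the same environment the notebook imports from
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", package_name or module_name]
    )
    importlib.invalidate_caches()  # let the running interpreter see the new package
    return True


# Usage in a Colab cell (import and pip names differ for some packages):
# ensure_installed("catboost")
# ensure_installed("sklearn", "scikit-learn")
```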

Ways to upload your dataset to Colab

There are three ways to get your datasets into Colab:

1. Manual data upload

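In Colab's left sidebar, open the Files pane (the folder icon) and click the upload icon, a page with an upward-pointing arrow, to upload your data directly. Files uploaded this way live in the runtime's session storage and are deleted when the runtime is recycled.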
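Colab also has a programmatic counterpart to the upload button: `files.upload()` from the `google.colab` module, which returns a dict mapping each uploaded file name to its raw bytes. The sketch below assumes your uploads are CSV files; `frames_from_upload` is a hypothetical helper that turns that dict into pandas DataFrames, and the upload call itself only runs inside a Colab runtime.

```python
import io

import pandas as pd


def frames_from_upload(uploaded):
    """Convert the {filename: bytes} dict from files.upload() into DataFrames.

    Only files ending in .csv are parsed; other uploads are skipped.
    """
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.lower().endswith(".csv")
    }


try:
    from google.colab import files  # only importable inside a Colab runtime
except ImportError:
    files = None

if files is not None and __name__ == "__main__":
    # Shows a "Choose Files" button under the cell and waits for the upload
    dataframes = frames_from_upload(files.upload())
```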

2. Get data from Google Drive by mounting your Drive in Colab

First, make sure the datasets you want to read are in your Google Drive. Then run this code in Colab:

# Code to mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Running the code produces a link that asks you to choose the Google account whose Drive holds the files you want to read, and then gives you an authorisation code to paste back into the notebook.

After that, your Drive is mounted at /content/drive and you can browse it from the Files pane.

Then you can read in your data from any directory in your Drive just like this:

import pandas as pd
test = pd.read_csv("/content/drive/MyDrive/zindi with colab/Test.csv")
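When a competition ships several files (for example `Train.csv` and `Test.csv`), you can read them all from the mounted folder in one go. This is a minimal sketch assuming Drive is mounted at `/content/drive` and the data sits in the folder from the example path; `read_all_csvs` is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd

# Folder from the example path above; replace it with the folder holding your data
DATA_DIR = Path("/content/drive/MyDrive/zindi with colab")


def read_all_csvs(folder):
    """Read every .csv in folder into a dict keyed by file name without extension."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob("*.csv"))}


if DATA_DIR.exists():  # True only after Drive has been mounted in Colab
    data = read_all_csvs(DATA_DIR)
    # e.g. data["Train"] and data["Test"], if those files are in the folder
```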

The drawback to this method is that you have to mount your Drive again every time the runtime restarts.

3. Using a function to read your data directly from your Drive

By using a function to read directly from your Google Drive, you skip the authorisation step every time you restart Colab; the function downloads your datasets for you.

Before building the function, copy the shareable link of each dataset, with sharing set to "Anyone with the link" so the file can be downloaded without signing in. This is the link the function uses to read the data from your Google Drive.

Here is the code to read directly from your Google Drive without mounting:

# Importing libraries
import requests
import pandas as pd
from io import StringIO

# Google Drive link to a shared dataset
train = 'https://drive.google.com/file/d/1PfZGNRlrLOX_JaXAjuzqt1laiH6vRFva/view?usp=sharing'

# Creating a function to read a CSV file shared via Google Drive
def read_csv(url):
    # The file ID is the second-to-last segment of a .../file/d/<ID>/view link
    url = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
    response = requests.get(url)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    csv = StringIO(response.text)
    df = pd.read_csv(csv)
    return df

# Reading the data
train = read_csv(train)
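The `read_csv` function above takes the file ID from the second-to-last path segment, which only works for links of the form `.../file/d/<ID>/view?usp=sharing`. A slightly more defensive sketch, assuming you may also paste links in the older `open?id=<ID>` form, extracts the ID with a regular expression and lets you check whether Drive returned an HTML page instead of CSV data. That can happen when a file is not shared publicly or is large enough to trigger Drive's virus-scan warning page. The helper names here are hypothetical.

```python
import re
from typing import Optional

# Matches the file ID in share links such as
#   https://drive.google.com/file/d/<ID>/view?usp=sharing
#   https://drive.google.com/open?id=<ID>
_DRIVE_ID = re.compile(r"(?:/file/d/|[?&]id=)([A-Za-z0-9_-]+)")


def drive_file_id(share_link: str) -> Optional[str]:
    """Return the file ID in a Google Drive share link, or None if there is none."""
    match = _DRIVE_ID.search(share_link)
    return match.group(1) if match else None


def direct_download_url(share_link: str) -> str:
    """Build the same uc?export=download URL that read_csv constructs."""
    file_id = drive_file_id(share_link)
    if file_id is None:
        raise ValueError(f"No Google Drive file ID found in {share_link!r}")
    return "https://drive.google.com/uc?export=download&id=" + file_id


def looks_like_html(text: str) -> bool:
    """True if Drive sent back a web page (e.g. a sign-in or warning page) instead of CSV."""
    head = text.lstrip()[:100].lower()
    return head.startswith(("<!doctype html", "<html"))
```

Inside `read_csv`, you could build the URL with `direct_download_url(url)` and raise an error when `looks_like_html(response.text)` is true, before handing the text to pandas.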

Downloading outputs

Once you have saved your results (for example, a submission CSV), downloading them is easy: right-click the file in the Files pane and choose Download.
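On Zindi you usually submit a CSV of predictions, so it helps to write that file to a known location before downloading it. The sketch below assumes a two-column submission; `save_submission` is a hypothetical helper, and the column names `ID` and `target` are placeholders to be matched to the competition's sample submission. `files.download` from `google.colab` is Colab's way to trigger the browser download from code, as an alternative to right-clicking in the Files pane.

```python
import pandas as pd


def save_submission(ids, predictions, path="submission.csv", id_col="ID", target_col="target"):
    """Write a two-column submission CSV and return its path.

    id_col and target_col are placeholder names; match them to the
    competition's sample submission file.
    """
    # list() drops any pandas index so the two columns line up row by row
    submission = pd.DataFrame({id_col: list(ids), target_col: list(predictions)})
    submission.to_csv(path, index=False)  # index=False keeps the row index out of the file
    return path


# Inside Colab, with your own test IDs and model predictions:
# from google.colab import files
# path = save_submission(test["ID"], preds)
# files.download(path)  # starts a browser download of the file
```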

Using TPUs and GPUs

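GPUs and TPUs can be selected under Runtime > Change runtime type > Hardware accelerator. Choose whichever suits your project. As a rough guide, TPUs suit large TensorFlow or JAX models written to use them, while GPUs are the more general-purpose choice for deep learning (including PyTorch) and for GPU-enabled libraries such as CatBoost.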
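After switching the runtime type, it is worth confirming from code that the accelerator is actually visible. This best-effort sketch only checks for a GPU, via PyTorch or TensorFlow if either is installed, falling back to looking for the `nvidia-smi` tool; TPU detection depends on the framework and runtime version, so it is left out. `gpu_available` is a hypothetical helper name.

```python
import importlib.util
import shutil


def gpu_available() -> bool:
    """Best-effort check for a CUDA GPU in the current runtime."""
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so this works where PyTorch is absent
        return torch.cuda.is_available()
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        return len(tf.config.list_physical_devices("GPU")) > 0
    # Last resort: the NVIDIA driver ships the nvidia-smi tool
    return shutil.which("nvidia-smi") is not None


device = "cuda" if gpu_available() else "cpu"
print(f"Running on: {device}")
```

The `device` string can then be passed to PyTorch calls such as `model.to(device)`.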

I hope that this tutorial and video guide have given you the confidence to use Google Colab in your next Zindi project. Now you’re ready to tackle any challenge with the power of Colab!


About the author

Eniola Olaleye is a machine learning engineer and data scientist at Benshi.ai who loves building machine learning models for educational as well as social purposes. He participates actively in data science competitions to solve real-world problems and expand his skill set, and is currently ranked 6th on the Zindi leaderboards. You can reach him on LinkedIn or GitHub.