Struggling to find enough computing power for data science? Giving up on competitions because your system drags? We have good news for you, and it starts with C: Colab. Join Zindi winner Eniola Olaleye for a simple walkthrough of using Colab for Zindi projects.
Google Colab (short for Colaboratory) is an online platform for running your data science experiments. It provides an interactive programming environment just like Jupyter Notebooks, along with access to free computing resources (GPUs and TPUs). This speeds up computation and lets data scientists work on machine learning models even if their local computer is not up to the task.
In case the library you need is not in Colab, you can install your required library easily using this line of code:
!pip install library_name
For example, to install catboost, run !pip install catboost in a notebook cell.
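If you are not sure whether a library is already in your Colab runtime, you can check before installing. The snippet below is a minimal sketch using Python's standard importlib module; the helper name is_installed is made up for this illustration. Keep in mind that the name you import is not always the name you install with pip (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in the current runtime."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# Check a few libraries before deciding whether a `!pip install` cell is needed.
for name in ("json", "catboost"):
    status = "already available" if is_installed(name) else "needs installing"
    print(f"{name}: {status}")
```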
There are three different ways in which you can read your datasets in Colab:
In the top left corner of Colab you can upload your data directly by clicking on the page icon with the arrow sign facing upward.
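Once the upload finishes, the file sits in the Colab file panel and can be read by its path. The sketch below assumes uploads land in /content, the default working folder of a Colab runtime; the helper list_uploaded_csvs is a hypothetical name used only for this illustration.

```python
from pathlib import Path


def list_uploaded_csvs(folder: str = "/content") -> list[str]:
    """Return the names of CSV files in `folder`, sorted alphabetically."""
    directory = Path(folder)
    if not directory.is_dir():
        # Outside Colab the folder may not exist, so return an empty list.
        return []
    return sorted(p.name for p in directory.glob("*.csv"))


# Assumption: files uploaded through the file panel are stored in /content.
print(list_uploaded_csvs())
```

If Test.csv shows up in the list, it can then be read with pd.read_csv("/content/Test.csv"). Files uploaded this way belong to the current runtime, so expect to upload them again in a new session.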
First, the datasets to be read must be in your Google Drive. Then, run this code in Colab:
# Code to mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
Running this code produces a link that asks you to choose the Google account where the files are stored and then gives you a code for authorisation.
After doing that, you should automatically see your Drive mounted to your Colab notebook.
Then you can read in your data from any directory in your Drive just like this:
import pandas as pd
test = pd.read_csv("/content/drive/MyDrive/zindi with colab/Test.csv")
The drawback to this method is that you have to mount your Drive every time you use Colab.
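Because the Drive has to be mounted in every session, a small check can save you from a confusing "file not found" error. This is a minimal sketch, assuming the /content/drive/MyDrive mount point from the example above; the helper drive_file is a hypothetical name, not part of Colab.

```python
import os

DRIVE_ROOT = "/content/drive/MyDrive"  # same Drive folder as in the path above


def drive_file(relative_path: str, root: str = DRIVE_ROOT) -> str:
    """Build a full path inside Drive and check that the file exists."""
    if not os.path.isdir(root):
        # The Drive folder only appears after drive.mount(...) has run.
        raise FileNotFoundError(f"{root} not found: mount Google Drive first")
    full_path = os.path.join(root, relative_path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"No such file in Drive: {full_path}")
    return full_path


# Example use inside Colab, after mounting:
# test = pd.read_csv(drive_file("zindi with colab/Test.csv"))
```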
By using a function to read directly from your Google Drive, you don’t have to go through the process of getting a code for authorisation every time you restart Colab; it automatically reads in your datasets for you.
Before building the function, copy the shareable link of each dataset; this is the link we will use to read the data from Google Drive.
Here is the code to read directly from your Google Drive without mounting:
# Importing libraries
import pandas as pd
import requests
from io import StringIO
# Google Drive link to the shared dataset
train_url = 'https://drive.google.com/file/d/1PfZGNRlrLOX_JaXAjuzqt1laiH6vRFva/view?usp=sharing'
# Creating a function to read a CSV file shared via Google Drive
def read_csv(url):
    # Turn the share link into a direct-download link using the file ID
    url = 'https://drive.google.com/uc?export=download&id=' + url.split('/')[-2]
    csv_raw = requests.get(url).text
    csv = StringIO(csv_raw)
    df = pd.read_csv(csv)
    return df
# Reading the data
train = read_csv(train_url)
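The key step in the code above is turning the share link into a direct-download link. The sketch below isolates that step so you can test it without downloading anything. The helper to_direct_download_url is a hypothetical name, and the regular expressions are an assumption about the two common shapes of Drive links (a /file/d/<id>/ path or an id= parameter).

```python
import re


def to_direct_download_url(share_link: str) -> str:
    """Convert a Google Drive share link into a direct-download link."""
    # Look for the file ID in either ".../file/d/<id>/..." or "...?id=<id>".
    match = re.search(r"/file/d/([\w-]+)", share_link) or re.search(
        r"[?&]id=([\w-]+)", share_link
    )
    if match is None:
        raise ValueError(f"Could not find a file ID in: {share_link}")
    return "https://drive.google.com/uc?export=download&id=" + match.group(1)


share_link = "https://drive.google.com/file/d/1PfZGNRlrLOX_JaXAjuzqt1laiH6vRFva/view?usp=sharing"
print(to_direct_download_url(share_link))
# https://drive.google.com/uc?export=download&id=1PfZGNRlrLOX_JaXAjuzqt1laiH6vRFva
```

For the download itself to work without signing in, the file generally needs to be shared so that anyone with the link can view it; otherwise the request is likely to return a web page instead of your CSV.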
Once you have your results, you can download them very easily with Colab, in any format you need: a simple right click on the result gives you the option to download.
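If you prefer to do this from code rather than with a right click, the sketch below writes a submission file and then asks Colab to download it. The column names ID and target, the file name submission.csv, and both helper names are placeholders for this illustration, so match them to the sample submission of your competition. The files.download call comes from the google.colab package and is only available inside Colab.

```python
import csv


def save_submission(rows, path="submission.csv", header=("ID", "target")):
    """Write (id, prediction) pairs to a CSV file and return its path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def download_in_colab(path):
    """Start a browser download inside Colab; do nothing elsewhere."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True


# Example (hypothetical IDs and predictions):
# path = save_submission([("ID_001", 0.12), ("ID_002", 0.87)])
# download_in_colab(path)
```

If your predictions are already in a pandas DataFrame, calling its to_csv method with index=False produces the same kind of file.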
GPUs and TPUs can be easily selected in the Runtime menu. You can choose whichever suits your project. As a rough guide, TPUs are built for large neural network training jobs, while GPUs are the more general-purpose choice.
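After changing the runtime type, it is worth confirming that the accelerator is actually attached. The sketch below is one way to do that for GPUs, assuming the nvidia-smi command-line tool is present on GPU runtimes; the helper gpu_summary is a hypothetical name, and this check does not detect TPUs.

```python
import shutil
import subprocess


def gpu_summary():
    """Return nvidia-smi's GPU listing, or None if no NVIDIA GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # the tool is missing, so no NVIDIA driver is installed
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    listing = result.stdout.strip()
    # A non-zero exit code or empty output means no usable GPU was found.
    return listing if result.returncode == 0 and listing else None


print(gpu_summary() or "No NVIDIA GPU visible in this runtime")
```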
I hope that this tutorial and video guide has provided you with the confidence to use Google Colab in your next Zindi project. Now you’re ready to tackle any challenge with the power of Colab!
Eniola Olaleye is a machine learning engineer and data scientist at Benshi.ai who loves building machine learning models for educational as well as social purposes. He participates actively in data science competitions (Kaggle, Zindi, etc.) to solve real-world problems and expand his skill set, and is currently ranked 6th on the Zindi leaderboards. You can reach him on LinkedIn or GitHub.