The past two years have been heavy for most of us due to the Covid-19 pandemic. With the introduction of vaccines, proper sanitisation and growing awareness of the infectious virus, casualties are decreasing and the world is returning to something resembling normality. Still, we have to keep up good practices, since the virus keeps mutating.
Wearing masks is an essential way to stop the spread of Covid-19, and most countries around the world have laws requiring people to wear them some or all of the time. In public, you're likely to see many people wearing masks, others not wearing them at all, and still others wearing them incorrectly. Hence the need for effort to ensure that mask mandates and policies are obeyed.
In this article, we will go through a Knowledge competition hosted on Zindi where we are asked to build a machine learning model that classifies whether a person is wearing a mask or not - better known as an image classifier. We will also do a quick dive into computer vision for classification.
As defined in Grokking Deep Learning for Computer Vision:
📌 An image classifier is an algorithm that takes in an image as input and outputs a label or “class” that identifies that image.
In this article, we will mostly use Python and the Keras framework.
The challenge is hosted on Zindi, and you can find the complete description and join the challenge here.
This is a classification problem where we are dealing with image data, which is an application of Computer Vision.
According to the O’Reilly book Programming Computer Vision with Python:
📌 Computer vision is the automated extraction of information from images. Information can mean anything from 3D models, camera position, object detection and recognition to grouping and searching image content.
IBM also gives a good definition of the related subject:
📌 Computer vision is an artificial intelligence field that enables systems and computers to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information.
Images are broken down into pixels, which are the smallest units of information that make up a picture. To extract information from an image, you have to understand the bigger picture of how pixels combine and what they represent.
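To make this concrete, here is a minimal sketch (using a hypothetical file name, example.png) of what an image looks like to a program:
import cv2

# Load an image as a NumPy array (OpenCV uses BGR channel order)
img = cv2.imread("example.png")  # hypothetical file name
print(img.shape)  # (height, width, 3): one value per colour channel per pixel
print(img[0, 0])  # the three channel intensities of the top-left pixel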
Computer vision technology thrives in real-world applications such as facial recognition, augmented reality, self-driving cars, visual search and optical character recognition. The technology has been growing rapidly due to advancements in AI and deep learning, which has resulted in smarter applications across the industry.
Python supports computer vision tasks well, providing excellent libraries for building these models. Some of the libraries that should cross your mind when working on computer vision tasks are OpenCV, Pillow, scikit-image, and deep learning frameworks such as TensorFlow/Keras and PyTorch.
There are two approaches that computer vision systems can take to analyze and model image data.
Machine Learning Flow
The first step is always to get the relevant data - in this case, we specifically need image data. The data is then preprocessed, and features are manually extracted (feature engineering) before being passed to a learning algorithm.
Good examples of this approach are the support vector machine and random forest algorithms.
Deep Learning Flow
Deep learning algorithms have simplified the traditional machine learning flow. Deep learning is a special class of algorithms based on artificial neural networks: all deep learning algorithms are machine learning algorithms, but not all machine learning algorithms are deep learning algorithms.
Deep learning automatically extracts features (feature engineering) while learning the importance of those features by applying weights to its outputs (feature importance).
As much as traditional ML algorithms perform remarkably and give satisfactory results, for tasks like computer vision, deep learning algorithms are often preferred due to their extraordinary performance.
A good example of this application is in Convolutional Neural Networks (CNNs).
What are Convolutional Neural Networks?
CNNs (also known as convnets) are a type of deep learning model used in computer vision applications. Before we get into the challenge, it is important for us to understand how CNNs process input data to give us the final model. CNNs shine in processing and classifying images.
Layers in a CNN include convolutional layers, which extract features from the input; pooling layers, which downsample the resulting feature maps; and fully connected (dense) layers, which perform the final classification. Dropout layers are often added to reduce overfitting. A toy example of how these layers stack is sketched below.
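As a minimal sketch for illustration only (not the model we build later):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A toy CNN: convolution -> pooling -> flatten -> fully connected layers
toy_cnn = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax'),  # two classes: mask / unmask
])
toy_cnn.summary()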
Popular activation functions used are ReLU (Rectified Linear Units) and Softmax.
📌 An activation function is a function used in artificial neural networks which outputs a small value for small inputs, and a larger value if its inputs exceed a threshold. If the inputs are large enough, the activation function "fires", otherwise it does nothing. In other words, an activation function is like a gate that checks that an incoming value is greater than a critical number. (Definition from deepai.org)
The softmax function maps outputs to the [0, 1] range and ensures that they sum to 1, so they can be interpreted as class probabilities.
ReLU is the default activation in CNNs. It introduces non-linearity by returning x for values x > 0 and 0 otherwise. ReLU helps models learn faster and perform better.
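As a quick numerical sketch of these two functions using NumPy:
import numpy as np

def relu(x):
    # Returns x for positive values and 0 otherwise
    return np.maximum(0, x)

def softmax(x):
    # Subtracting the max first keeps the exponentials numerically stable
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
scores = softmax(np.array([2.0, 1.0]))
print(scores, scores.sum())               # values in [0, 1] that sum to 1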
Now that we have reviewed some of the theory needed for this challenge, let’s get into the challenge itself. Zindi has provided us with different files to help us achieve our goal: a train_labels.csv file with image names and labels, an images.zip archive containing the pictures, and a sample submission file.
First we need to import libraries:
# Importing the relevant libraries
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import random
from IPython.display import Image as ShowImage

# Keras libraries
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout, Flatten,
                                     Dense, Activation, BatchNormalization,
                                     GlobalMaxPooling2D)
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers
Keras is a deep learning framework built for Python which provides methods to train deep learning models. Keras has built-in support for CNNs (for computer vision) and has a user-friendly API that makes it easy to quickly prototype deep learning models.
Importing Sequential gives us a model in which all the layers are arranged in sequence. The importance of ImageDataGenerator is to generate batches of tensor image data with real-time data augmentation; it has many useful options such as rescaling, rotating and zooming.
The keras.layers import gives us access to the layers we need to build our CNN, which are described above. Layers are the building blocks of neural networks.
Optimizers are necessary for improving your model's speed and performance. They shape the model into its most accurate form by adjusting its weights.
VGG16 is a convolutional neural network architecture with 16 weight layers, pretrained on the ImageNet dataset. We use it here for transfer learning: the art of reusing a model trained on one task by repurposing it for another.
The pandas library provides built-in methods for data manipulation.
Matplotlib is used for creating graphs where necessary when building our model.
IPython.display provides a method for viewing images within the notebook.
Now, let’s have a glimpse of our data! 😊
Train Data
The train data has two columns, the image name and the target label, and contains 1308 different records!
# Reading the data
train_labels = pd.read_csv("train_labels.csv")
# Show the first 5 rows
train_labels.head()
Let’s create a simple bar plot to understand how the train data is distributed between the two classes:
train_labels['target'].value_counts().plot.bar()
Image Data
This data comes in a zip file. For this we can use the following command to unzip all the files:
# Extracts all the files
!unzip -q images.zip
We can view images using the ShowImage() function:
# We can view the images with the ShowImage functionality
ShowImage("/content/images/aadawlxbmapqrblgxyzarhjasgiobu.png")
If you found this fun, you can also use the OpenCV Library to view images!
import cv2
import random
import os
# This stores the location of the data source
data = os.listdir("images")
# Picking random sample from data list
sample = random.choice(data)
# The imread method loads an image from the specified file
img = cv2.imread("images/"+sample)
# OpenCV loads images in BGR order, so convert to RGB for correct colours
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
Sample_submission.csv
This is the file that is used for making submissions after you have created your model.
We should map the target values (0, 1) to category labels (unmask, mask) using the replace method.
train_labels["target"] = train_labels["target"].replace({0: 'unmask', 1: 'mask'})
First, we need to declare how the image data will be passed to the input layer.
# Defining how data is passed to the input layer
image_size = 224
input_shape = (image_size, image_size, 3)
batch_size = 16
As mentioned earlier, convnets take input tensors of shape (image height, image width, image channels). From the code above, the images’ input shape is (224, 224, 3).
# Load VGG16 pretrained on ImageNet, without its top (classification) layers
pre_trained_model = VGG16(input_shape=input_shape, include_top=False, weights="imagenet")

# Freeze the first 15 layers and fine-tune the rest
for layer in pre_trained_model.layers[:15]:
    layer.trainable = False
for layer in pre_trained_model.layers[15:]:
    layer.trainable = True

# Attach a new classification head to the last pooling layer
last_layer = pre_trained_model.get_layer('block5_pool')
last_output = last_layer.output
x = GlobalMaxPooling2D()(last_output)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(2, activation='softmax')(x)

model = Model(pre_trained_model.input, x)
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              metrics=['accuracy'])
model.summary()
Here we are passing our parameters to the VGG16 model.
The include_top parameter states whether to include the network's original output (classification) layers. Since we are fitting the model to our own problem, we don't need them, so we set it to False and attach our own head.
The weights parameter specifies what weights to load.
GlobalMaxPooling2D is another pooling type where the pooling size is set equal to the input size, so that the maximum of the entire input is connected as the output value.
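As a tiny illustrative sketch, for a single 2x2, one-channel feature map GlobalMaxPooling2D keeps only the largest value:
import numpy as np
from tensorflow.keras.layers import GlobalMaxPooling2D

# One sample: a 2x2 feature map with a single channel, shape (1, 2, 2, 1)
feature_map = np.array([[[[1.0], [5.0]],
                         [[3.0], [2.0]]]])
pooled = GlobalMaxPooling2D()(feature_map)
print(pooled.numpy())  # [[5.]] - the maximum over the whole 2x2 map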
The Dense layers implement fully connected operations, here using the ReLU and softmax activations respectively.
The model compile method configures the model for training by specifying the optimiser, loss and metrics.
The loss parameter is set to 'binary_crossentropy'. The loss function measures the network's performance on the training data and thereby steers it in the right direction.
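As a rough numerical sketch, this loss penalises confident wrong predictions far more heavily than confident correct ones:
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # Average of -[y*log(p) + (1-y)*log(1-p)] over the outputs
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])  # one-hot target: the true class is 'mask'
print(binary_crossentropy(y_true, np.array([0.9, 0.1])))  # ~0.105 - confident and correct
print(binary_crossentropy(y_true, np.array([0.1, 0.9])))  # ~2.303 - confident and wrong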
An optimiser is the mechanism through which the network updates itself based on the data it sees and its loss function; here it is set to the Stochastic Gradient Descent (SGD) optimiser. The metrics parameter is set to 'accuracy', since here we only care about how accurately the model classifies.
The model summary method is used to see all the parameters and shapes in each layer of our model, which will give you the following result:
The total parameters are 14,978,370
The trainable parameters are 7,343,106
The non-trainable parameters are 7,635,264
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

earlystop = EarlyStopping(patience=10)
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=2,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)
callbacks = [earlystop, learning_rate_reduction]
EarlyStopping is a technique used to reduce overfitting without compromising model accuracy; too many epochs can lead to overfitting, hence the need to apply it.
An epoch means training the network with all the training data for one cycle, where a forward pass and a backward pass together count as one pass.
ReduceLROnPlateau is a callback that reduces the learning rate when a metric has stopped improving.
Patience is the number of epochs with no improvement after which the learning rate is reduced; with factor=0.5 the rate halves each time (1e-4 → 5e-5 → 2.5e-5 …) until it reaches min_lr.
from sklearn.model_selection import train_test_split
train_df, validate_df = train_test_split(train_labels, test_size=0.2, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)
Here, we are importing the train_test_split method from sklearn, which splits the training data into two separate dataframes - the training and the validation data. The random_state parameter seeds the random generator so that your train-test splits are always deterministic.
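As a quick sanity check, with test_size=0.2 and our 1308 records we expect roughly 1046 training and 262 validation rows:
# Verify the 80/20 split
print(train_df.shape, validate_df.shape)  # expected: (1046, 2) (262, 2)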
Next, we can view the final format of the split data:
train_df
validate_df
Next, we import the utilities for augmenting the image data and for one-hot encoding the categorical labels:
# Generate batches of tensor image data with real-time data augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img
# to_categorical one-hot encodes categorical variables
from tensorflow.keras.utils import to_categorical
Here we are rotating and rescaling the images, applying shear and zoom within small ranges, and flipping images horizontally:
# Here we are formatting the training data
train_datagen = ImageDataGenerator(rotation_range=15,
                                   rescale=1./255,
                                   shear_range=0.1,
                                   zoom_range=0.2,  # zoom range (1-0.2 to 1+0.2)
                                   horizontal_flip=True,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1)

train_generator = train_datagen.flow_from_dataframe(dataframe=train_df,
                                                    directory="images/",
                                                    x_col="image",
                                                    y_col="target",
                                                    target_size=(image_size, image_size),
                                                    class_mode='categorical',
                                                    batch_size=batch_size)
# Here we are formatting images on the validation data
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(validate_df,
                                                              directory="images/",
                                                              x_col="image",
                                                              y_col="target",
                                                              target_size=(image_size, image_size),
                                                              class_mode='categorical',
                                                              batch_size=batch_size)
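Before training, it can be worth pulling a single batch from the generator as a sanity check; a minimal sketch:
# Grab one augmented batch and check its shape
x_batch, y_batch = next(train_generator)
print(x_batch.shape)  # (batch_size, 224, 224, 3) - rescaled, augmented images
print(y_batch.shape)  # (batch_size, 2) - one-hot encoded mask/unmask labels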
Next, we fit the model to the training data, running it for up to 100 epochs (the early stopping callback may end training sooner):
epochs = 100
total_validate = validate_df.shape[0]
total_train = train_df.shape[0]

history = model.fit(train_generator,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=total_validate//batch_size,
                    steps_per_epoch=total_train//batch_size,
                    callbacks=callbacks)
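Once training finishes, it is often helpful to plot the learning curves from the returned history object; a minimal sketch, assuming TensorFlow 2 metric names:
# Plot training vs validation accuracy across epochs
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()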
# Here we are creating a list of test images: every file in the images folder
# that does not appear in the training labels
train_images = set(train_labels["image"])
target = [i for i in data if i not in train_images]
Then we are creating a test data set using the image data:
# Creating a test dataframe with the images; the target is 'unmask' as a placeholder for all images
test = pd.DataFrame({
'image': target,
'target':"unmask"
})
test.head()
Lastly, we need to pass our test data to an image data generator so that we can run the model on it:
test_gen = ImageDataGenerator(rescale=1./255)
test_generator = test_gen.flow_from_dataframe(test,
                                              directory="images/",
                                              x_col="image",
                                              y_col="target",
                                              target_size=(image_size, image_size),
                                              class_mode='categorical',
                                              batch_size=batch_size,
                                              shuffle=False)
nb_samples = test.shape[0]
predict = model.predict(test_generator, steps=int(np.ceil(nb_samples/batch_size)))
The model.predict call makes predictions on the new image data.
The np.ceil method rounds the number of steps up so that the final, partial batch is included.
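One caveat: the column order of predict follows the generator's class indices, which flow_from_dataframe assigns alphabetically. It is worth verifying the mapping before converting predictions to labels:
# Check which prediction column corresponds to which class
print(train_generator.class_indices)  # expected: {'mask': 0, 'unmask': 1}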
# Map each row of predictions to its class label, then back to the original
# 0/1 encoding (0 = unmask, 1 = mask)
idx_to_label = {v: k for k, v in train_generator.class_indices.items()}
test["target"] = [idx_to_label[i] for i in np.argmax(predict, axis=1)]
test["target"] = test["target"].replace({'unmask': 0, 'mask': 1})
# Here we are converting the dataframe to a csv file
test.to_csv("submission.csv", index=False)
Lastly, we save our predictions in the submission file. When submitting the final result, always ensure that it's in CSV format.
Congratulations! You have understood the basic concepts of CNNs and built a model with a Keras implementation. With the availability of awesome libraries, there are many ways to approach this problem - other frameworks that can solve the same problem include PyTorch and fastai.
Traditional machine learning algorithms such as Support Vector Machines and Random Forests can also be applied to the same problem.
Joy Wawira is a software developer, technical writer, and data scientist. Follow her on Twitter @jlcodes.