20 May 2019, 07:28

Introduction to Anomaly Detection using Machine Learning with a Case Study

A common need when you are analyzing real-world datasets is determining which data points stand out as being different from all the others. Such data points are known as anomalies.

This article was originally published on Medium by Davis David.

Twitter: @Davis_McDavid


In this article, you will learn several machine learning-based approaches for anomaly detection. Part two will show how to apply one of these approaches to a specific use case (credit card fraud detection).

A common need when you are analyzing real-world datasets is determining which data points stand out as being different from all the others. Such data points are known as anomalies, and the goal of anomaly detection (also known as outlier detection) is to find all such data points in a data-driven fashion. Anomalies can be caused by errors in the data, but sometimes they indicate a new, previously unknown, underlying process.

What is Anomaly Detection?

Anomaly detection is a technique used to identify unusual patterns that do not conform to expected behavior. It can be considered the thoughtful process of determining what is normal and what is not. Anomalies are also referred to as outliers, novelties, noise, exceptions and deviations.

Let’s take an example from the classroom to understand what anomalies or outliers really mean. Students always perform differently: some outperform their classmates, while others fail to reach even the minimum passing mark. Suppose most students score between 40% and 80%; we can say these marks are roughly normally distributed. If a student gets a score that is extremely high (99%) or extremely low (5%), then in statistics and related areas such as machine learning these values are referred to as anomalies or outliers.

Simply put, anomaly detection is the task of defining a boundary around normal data points so that they can be distinguished from outliers.
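To make the idea of a boundary concrete, here is a minimal sketch using the interquartile range (IQR) rule on the student marks example. The marks list is made up for illustration, and the 1.5 × IQR multiplier is a common convention rather than something prescribed by this article.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr      # the "boundary" around normal marks
    return [v for v in values if v < lower or v > upper]

# Hypothetical marks (in %): most students are mid-range, two are extreme.
marks = [5, 52, 55, 57, 58, 60, 61, 62, 64, 65, 67, 70, 99]

print(iqr_outliers(marks))  # [5, 99]
```

For this data the quartiles are 56 and 66, so the boundary runs from 41 to 81, and only the marks of 5 and 99 fall outside it.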

How do we identify whether data points are normal or anomalous? In some simple cases, as in the example figure below, data visualization can give us valuable information.

The figure above shows an outlier within a set of closely related data points. From the figure you can tell which data points are normal and which are not. The red data point is the anomaly, or outlier, because it deviates hugely from the normal data points (shown in blue).

Applications of Anomaly Detection in the Business World

Anomaly detection is applicable in a variety of domains, such as:

  • Intrusion detection, for example identifying strange patterns in network traffic (that could signal a hack).
  • Health monitoring systems in hospitals.
  • Fraud detection in credit card transactions in banks.
  • Fault detection in operating environments.
  • Detection of fake news and misinformation on the Internet.
  • Industry damage detection.
  • Security and surveillance.

“Outliers are not necessarily a bad thing. These are just observations that are not following the same pattern as the other ones. But it can be the case that an outlier is very interesting. For example, if in a biological experiment, a rat is not dead whereas all others are, then it would be very interesting to understand why. This could lead to new scientific discoveries. So, it is important to detect outliers.”

– Pierre Lafaye de Micheaux, Author and Statistician

Anomalies can be broadly categorized as:

a) Point Anomalies:

A single instance of data is anomalous if it deviates greatly from the rest of the data points. An example is detecting credit card fraud based on “amount spent.”

b) Contextual Anomalies:

The abnormality is context specific: whether a data point is an anomaly depends on contextual information. This type of anomaly is common in time-series data. For example, people spend a lot of money during the holidays, so a large amount is normal then, but the same amount at another time of year may be unusual.

c) Collective Anomalies:

A collection of related data instances is anomalous with respect to the entire dataset, even though the individual values are not. For example, someone is unexpectedly trying to copy data from a remote machine to a local host (a potential cyber attack).
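The contextual case is the easiest one to get wrong, so here is a small sketch of it. It assumes we keep a separate spending history per context (holiday versus regular days) and score a new amount against the history of its own context with a z-score. The numbers, the context names and the threshold of 3 are all hypothetical.

```python
import statistics

# Hypothetical daily spending history, grouped by context.
history = {
    "holiday": [480, 510, 530, 495, 520],
    "regular": [95, 110, 100, 105, 90],
}

def is_contextual_anomaly(amount, baseline, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations
    away from the mean of the baseline for its own context."""
    mean = statistics.mean(baseline)
    std = statistics.stdev(baseline)
    return abs(amount - mean) / std > threshold

# The same amount is judged differently depending on the context.
print(is_contextual_anomaly(500, history["holiday"]))  # False
print(is_contextual_anomaly(500, history["regular"]))  # True
```

A point-anomaly detector that ignores context would have to treat both cases the same way, which is exactly what the contextual definition is meant to avoid.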

Machine Learning-Based Approaches for Anomaly Detection:

Let’s look at the different approaches we can use in machine learning for anomaly detection.

(a) Clustering-Based Anomaly Detection

This approach relies on unsupervised learning: similar data points tend to belong to the same groups or clusters, as determined by their distance from local centroids.

The k-means algorithm, which partitions the dataset into a given number of clusters, can be used here. Data points that fall outside these clusters, for example those lying unusually far from their nearest centroid, are considered anomalies.
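As a rough illustration of the clustering idea, the sketch below uses scikit-learn's KMeans on synthetic data and flags points that sit unusually far from their nearest centroid. The data, the choice of two clusters and the "mean + 3 standard deviations" cut-off are assumptions made for this example, not part of the original article.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: two tight clusters plus one point far from both.
cluster_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=[10, 10], scale=0.5, size=(50, 2))
outlier = np.array([[5.0, 20.0]])
X = np.vstack([cluster_a, cluster_b, outlier])  # the outlier is row 100

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# transform() gives the distance to every centroid; keep the nearest one.
dist_to_centroid = kmeans.transform(X).min(axis=1)

# Assumed rule: flag distances above mean + 3 standard deviations.
threshold = dist_to_centroid.mean() + 3 * dist_to_centroid.std()
anomaly_idx = np.where(dist_to_centroid > threshold)[0]

print(anomaly_idx)
```

Note that k-means assigns every point to some cluster, so the anomaly decision comes from the distance threshold, not from the clustering itself. In practice the threshold and the number of clusters both need tuning.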

(b) Density-Based Anomaly Detection

This approach is based on the k-nearest neighbors algorithm. The underlying assumption is that normal data points occur in dense neighborhoods, while anomalies lie far away from them. To measure how close a data point is to its nearest neighbors, you can use Euclidean distance or a similar measure, depending on the type of data you have.
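One simple way to turn this into a score is to use the distance from each point to its k-th nearest neighbor: the larger the distance, the sparser the neighborhood. The sketch below does this with plain NumPy and Euclidean distance. The synthetic data and k = 5 are assumptions, and the brute-force pairwise computation is only suitable for small datasets.

```python
import numpy as np

def kth_neighbor_distance(X, k=5):
    """Euclidean distance from each row of X to its k-th nearest neighbor."""
    diffs = X[:, None, :] - X[None, :, :]            # all pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))       # pairwise distance matrix
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k]

rng = np.random.default_rng(0)
normal_points = rng.normal(size=(100, 2))            # dense neighborhood
X = np.vstack([normal_points, [[8.0, 8.0]]])         # row 100 is isolated

scores = kth_neighbor_distance(X, k=5)
most_anomalous = int(np.argmax(scores))

print(most_anomalous)  # 100
```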

(c) Support Vector Machine-Based Anomaly Detection

A support vector machine is another effective technique for detecting anomalies. One-Class SVMs were devised for cases in which only one class is known, and the problem is to identify anything outside this class.

This is known as novelty detection, and it refers to automatic identification of unforeseen or abnormal phenomena, i.e. outliers, embedded in a large amount of normal data.
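Here is a minimal novelty-detection sketch with scikit-learn's OneClassSVM. The model is trained on normal data only and then asked about new points. The synthetic training data and the nu and gamma settings are assumptions for illustration. predict() returns +1 for points considered normal and -1 for points considered novel.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Training set contains only "normal" observations.
X_train = rng.normal(size=(200, 2))

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# One point in the middle of the normal data, one far away from it.
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])
pred = model.predict(X_new)

print(pred)  # +1 = normal, -1 = novelty
```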

(d) Supervised Deep Anomaly Detection

Supervised deep anomaly detection involves training a deep supervised binary or multi-class classifier using labels of both normal and anomalous data instances.

For instance, supervised deep anomaly detection models formulated as multi-class classifiers aid in detecting rare brands, mentions of prohibited drug names and fraudulent health-care transactions.
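The supervised setup can be sketched as ordinary binary classification. The example below uses scikit-learn's MLPClassifier, a small feed-forward network, as a lightweight stand-in for a deep model; a real system would more likely use a deep learning framework and a much larger network. The synthetic data, the labels (0 = normal, 1 = anomalous) and the layer sizes are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Labeled synthetic data: many normal points, few anomalous ones.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_anomalous = rng.normal(loc=6.0, scale=1.0, size=(25, 2))
X = np.vstack([X_normal, X_anomalous])
y = np.array([0] * 500 + [1] * 25)   # 0 = normal, 1 = anomalous

# Small network used here as a stand-in for a deep classifier.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0)
clf.fit(X, y)

pred = clf.predict(np.array([[0.0, 0.0], [6.0, 6.0]]))
print(pred)
```

The class imbalance in this toy data (500 normal versus 25 anomalous) hints at the main practical difficulty of the supervised approach: labeled anomalies are rare.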

(e) Semi-Supervised Deep Anomaly Detection

This is a more popular method than supervised deep anomaly detection (DAD). Labels of normal instances are far easier to obtain than labels of anomalies; as a result, semi-supervised DAD techniques are more widely adopted. These techniques leverage existing labels of a single class (normally the positive class) to separate outliers.

(f) One-Class Neural Networks (OC-NN)

One-class neural network (OC-NN) methods are inspired by kernel-based one-class classification. They combine the ability of deep networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data.

In the future, I will share the second part of this article on how to implement one of the machine learning-based approaches for anomaly detection. What impact do you think anomaly detection will have on the various industries? I would love to hear your thoughts in the comments below.

Useful links and other ML tutorials: