Classification models are increasingly being used in decision making. For example, these models help determine whether a tumor is benign or malignant, whether stock option prices will rise or fall, whether a bank client will default on loan payments or not; and they are used to identify objects in natural images, and divide customers into market segments. In life threatening situations it is imperative that a model or classifier makes correct predictions. In machine learning competition s, a better classifier in terms of accuracy pushes a competitor’s model closer to the top of the leaderboard.
A multi-class classification model has multiple classes or labels and it assigns exactly one label to each sample instance. Some instances may be misclassified, which affects the accuracy of the model. To rectify this problem, classification models use the logarithmic loss (or simply log loss) measure to penalise misclassifications. A high log loss score is indicative of a poor classifier. A good classifier will try to minimise the log loss score.
Log loss is a probabilistic measure of accuracy. This means, for instance, a probabilistic classifier like logistic regression will output a probability for each class rather than assign the most likely label to the class. Another way to interpret this probabilistic outcome is to see the probability as the level of confidence the classifier has in its predictions. Supppose a classifier predicts the true label of a single instance with probability 0.1 (not a very confident prediction), the model is penalised by having a higher log loss score.
Log loss is defined  as the negative log-likelihood of the true labels given the predicted probabilities. The log loss score lies between 0 and infinity. A log loss score close to 0 has a high probability that the classifier has made the right decision, that its more accurate. A higher log loss score indicates with low probability that the classifer is confident it has made the right decision.
Accuracy measures that count the number of correct classifications are affected by uneven distributions of the samples. In this famous example, a biomedical study trained a classifier on a sample size of one million patients with only 1% of the patients with disease . The classifier was 99% accurate. But given a different dataset, its predictive power drops. The log loss measure for such a model will certainly be very high.
Log loss is also closely related to the idea of cross entropy. Roughly stated, cross entropy quantifies the difference between a true distribution and a predicted distribution. Log loss quantifies how close the predicted labels are to the true labels. Used as an evaluation metric, the log loss score is quite easy to understand and motivates competitors to improve their models.
Author: Martha Kamkuemah
Martha is a programmer by training with a keen interest in using open source software for problem solving. She uses programming to fill the gap between theoretical models in data science and real world applications, including visualizing data and applying machine learning techniques. Her research interests include network security analysis using formal mathematical techniques. Martha also enjoys hiking and photography.