Role of the confusion matrix in cybercrime cases

Ishika Mandloi
Jun 6, 2021

As people spend more and more time on the internet, cybercrime is increasing day by day and taking many different forms. Machine learning can help analyze the cybercrime rate in an area by type of crime. Cybercrime investigators in that area can then act on the predicted crime types.

These are some of the types of cybercrime:

  1. Child pornography
  2. Cyberattack
  3. Identity theft
  4. Other
  5. Phishing
  6. Platform fraud
  7. Online threat

For example, suppose we have data on all these types of cybercrime and want to use machine learning to classify cases by type. After training a classification model, we need to measure how accurately it predicts the type of cybercrime.

To measure how well the model predicts (classifies) the data, we use a confusion matrix. A confusion matrix is a table that compares the actual class of each record with the class the model predicted. It therefore shows how many records were classified correctly and how many were misclassified. Which class counts as "positive" and which counts as "negative" depends on the problem statement.
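As a minimal sketch of the idea, the Python snippet below counts (actual, predicted) pairs into a table. It assumes rows are actual classes and columns are predicted classes, which is the same convention scikit-learn's `confusion_matrix` uses. `build_confusion_matrix` is a hypothetical stand-alone helper, and the labels are made-up toy data, not real case records.

```python
from collections import Counter


def build_confusion_matrix(actual, predicted, labels):
    """Return a len(labels) x len(labels) grid of counts.

    Rows are actual classes, columns are predicted classes, so
    cell [i][j] counts records of class labels[i] predicted as labels[j].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


# Made-up toy labels for illustration only, not real case data.
labels = ["Phishing", "Identity theft", "Platform fraud"]
actual = ["Phishing", "Phishing", "Identity theft", "Identity theft", "Platform fraud"]
predicted = ["Phishing", "Identity theft", "Platform fraud", "Identity theft", "Platform fraud"]

matrix = build_confusion_matrix(actual, predicted, labels)
for name, row in zip(labels, matrix):
    print(f"{name:>15}: {row}")

# Correct predictions sit on the diagonal; everything else is misclassified.
correct = sum(matrix[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in matrix)
print(f"correct: {correct}, misclassified: {total - correct}")
```

With these toy labels, the diagonal holds 3 correct predictions and the 2 off-diagonal counts are the misclassified records.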

When there are only two classes to separate, we use a binary (2 × 2) confusion matrix.

Using this matrix, we can compute the model's sensitivity, precision, and specificity. For multi-class classification, we use a larger N × N confusion matrix, with one row and one column per class.

If the predicted class is positive and the actual class is also positive, it is a true positive (TP).

If the predicted class is positive but the actual class is negative, it is a false positive (FP), also called a Type I error.

If the predicted class is negative but the actual class is positive, it is a false negative (FN), also called a Type II error.

If the predicted class is negative and the actual class is also negative, it is a true negative (TN).
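Here is a small Python sketch of these four definitions. It assumes a hypothetical setup where "phishing" is the positive class and every other label is negative. The labels are toy data, and `binary_counts` is an illustrative helper, not a library function.

```python
def binary_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative (Type I error)
        elif a == positive:
            fn += 1  # predicted negative, actually positive (Type II error)
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical toy labels: "phishing" is positive, "other" is negative.
actual = ["phishing", "phishing", "other", "other", "phishing", "other"]
predicted = ["phishing", "other", "phishing", "other", "phishing", "other"]

tp, fp, fn, tn = binary_counts(actual, predicted, positive="phishing")
# 2 x 2 layout: rows = actual (positive, negative), columns = predicted.
print([[tp, fn], [fp, tn]])
```

The printed layout places actual classes in rows, so false negatives appear in the top-right cell and false positives in the bottom-left.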

Our cybercrime data has more than two types, so a multi-class confusion matrix is used here.

In this case, the confusion matrix looks like this:

Here, the normalized confusion matrix shows that the model tends to predict platform fraud for cases that are actually identity theft.
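To read a normalized matrix like this, each row is divided by its total. Every cell then becomes the share of actual cases of that class that received each prediction. The sketch below does this in plain Python and reports the largest off-diagonal share, which is the most common confusion. The counts are invented purely to illustrate the calculation and are not the model results described above. scikit-learn's `confusion_matrix` offers the same row-wise normalization through `normalize="true"`.

```python
def normalize_rows(matrix):
    """Divide each row by its total (rows = actual classes)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


def most_confused(matrix, labels):
    """Return (actual, predicted, share) for the largest off-diagonal normalized cell."""
    norm = normalize_rows(matrix)
    n = len(labels)
    i, j = max(
        ((r, c) for r in range(n) for c in range(n) if r != c),
        key=lambda rc: norm[rc[0]][rc[1]],
    )
    return labels[i], labels[j], norm[i][j]


# Invented counts for illustration only.
labels = ["Identity theft", "Phishing", "Platform fraud"]
counts = [
    [4, 1, 5],  # actual Identity theft
    [1, 8, 1],  # actual Phishing
    [0, 1, 9],  # actual Platform fraud
]

actual, predicted, share = most_confused(counts, labels)
print(f"{actual} -> {predicted}: {share:.0%}")  # Identity theft -> Platform fraud: 50%
```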

All of the following metrics are calculated from this confusion matrix.
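For a multi-class matrix like this one, the formulas are applied per class. Each class in turn is treated as positive and all other classes as negative (one-vs-rest). The following is a minimal sketch. It assumes rows are actual classes and columns are predicted classes, uses invented counts, and relies on `one_vs_rest`, which is a hypothetical helper.

```python
def one_vs_rest(matrix, k):
    """Return (TP, FP, FN, TN) for class index k; rows = actual, columns = predicted."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                        # actually k, predicted as another class
    fp = sum(matrix[i][k] for i in range(n)) - tp   # predicted k, actually another class
    tn = total - tp - fn - fp                       # everything not involving class k
    return tp, fp, fn, tn


# Invented counts for illustration only.
labels = ["Identity theft", "Phishing", "Platform fraud"]
counts = [
    [4, 1, 5],
    [1, 8, 1],
    [0, 1, 9],
]

for k, name in enumerate(labels):
    print(name, one_vs_rest(counts, k))
```

Each class's four counts always add up to the total number of records. This is why the same formulas work for both binary and multi-class problems.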

Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)

Misclassification rate (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)

Precision (true positives / predicted positives) = TP / (TP + FP)

Sensitivity, also called recall (true positives / all actual positives) = TP / (TP + FN)

Specificity (true negatives / all actual negatives) = TN / (TN + FP)
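These formulas translate directly into code. The sketch below is a plain-Python version with hypothetical counts. When a denominator is zero, for example precision when the model never predicts the positive class, it returns `nan`. That is a design choice of this sketch, not a standard.

```python
import math


def metrics(tp, tn, fp, fn):
    """Compute the confusion-matrix metrics from the four counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else math.nan

    total = tp + tn + fp + fn
    return {
        "accuracy": ratio(tp + tn, total),
        "misclassification": ratio(fp + fn, total),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts for illustration only.
result = metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.850, misclassification is 0.150, precision is about 0.889, sensitivity is 0.800, and specificity is 0.900. Accuracy and misclassification rate always add up to 1.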

In this way, a confusion matrix is used to evaluate how well a model classifies the different types of cybercrime.

Thank you! I hope this blog post is informative for all of you.