## Making scikit Confusion Matrices Easier to Understand

I was looking at logistic regression with the scikit-learn (scikit or sklearn for short) library. The library has a built-in confusion_matrix(y_actuals, y_predicteds) function that computes a confusion matrix. But the result returned by confusion_matrix() isn’t very easy to understand when you print it.

I ran a demo with this code:

```
from sklearn.metrics import confusion_matrix

# model is a trained scikit classifier; test_x, test_y hold the test data
y_predicteds = model.predict(test_x)
cm = confusion_matrix(test_y, y_predicteds)
print("Confusion matrix raw: ")
print(cm)
```

The output was:

```
Confusion matrix raw:
[[17  9]
 [ 2 12]]
```

It’s not clear which counts are which. As it turns out, the scikit documentation says, “Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.” In other words the entries are:

```
actual 0  |   17    9
actual 1  |    2   12
          -----------
predicted      0    1
```
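The row = actual, column = predicted convention is easy to verify with tiny hand-made label lists (my own made-up values), where the counts can be checked by eye:

```python
from sklearn.metrics import confusion_matrix

# 4 actual-0 items: 3 predicted as 0 (TN), 1 predicted as 1 (FP)
# 3 actual-1 items: 1 predicted as 0 (FN), 2 predicted as 1 (TP)
y_actual    = [0, 0, 0, 0, 1, 1, 1]
y_predicted = [0, 0, 0, 1, 0, 1, 1]

cm = confusion_matrix(y_actual, y_predicted)
print(cm)  # [[3 1]
           #  [1 2]]
```

Row 0 holds the actual-0 items split by predicted value, and row 1 holds the actual-1 items, matching the documentation quote above.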

I coded up a show_confusion(cm) function to display a confusion matrix cm with some rudimentary labels. The code is:

```
def show_confusion(cm):
  ct_act0_pred0 = cm[0][0]  # TN
  ct_act0_pred1 = cm[0][1]  # FP wrongly predicted as pos
  ct_act1_pred0 = cm[1][0]  # FN wrongly predicted as neg
  ct_act1_pred1 = cm[1][1]  # TP

  print("actual 0  | %4d %4d" % (ct_act0_pred0, ct_act0_pred1))
  print("actual 1  | %4d %4d" % (ct_act1_pred0, ct_act1_pred1))
  print("           ----------")
  print("predicted      0    1")
```
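Once the four cells are identified as TN, FP, FN, TP, the common classification metrics fall out directly. A quick sketch using the counts from the demo matrix above (the variable names are my own):

```python
# counts from the demo matrix: [[17 9] [2 12]]
tn, fp = 17, 9   # actual 0 row
fn, tp = 2, 12   # actual 1 row

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 29 / 40
precision = tp / (tp + fp)                   # 12 / 21
recall    = tp / (tp + fn)                   # 12 / 14

print("accuracy  = %0.4f" % accuracy)
print("precision = %0.4f" % precision)
print("recall    = %0.4f" % recall)
```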

This function is hard-coded for binary classification. Here’s a general version that works for both binary classification and multi-class classification (three or more label values):

```
import numpy as np

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")
```
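As a quick sanity check, the general version can be exercised with hypothetical three-class labels (the function body is repeated here so the snippet runs standalone):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def show_confusion(cm):
  dim = len(cm)
  mx = np.max(cm)             # largest count in cm
  wid = len(str(mx)) + 1      # width to print
  fmt = "%" + str(wid) + "d"  # like "%3d"
  for i in range(dim):
    print("actual   ", end="")
    print("%3d:" % i, end="")
    for j in range(dim):
      print(fmt % cm[i][j], end="")
    print("")
  print("------------")
  print("predicted    ", end="")
  for j in range(dim):
    print(fmt % j, end="")
  print("")

# made-up 3-class labels for demonstration
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred)  # a 3x3 matrix
show_confusion(cm)
```

The same function handles the binary demo matrix unchanged, since dim is computed from the matrix itself.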

Making a scikit confusion matrix less confusing — good fun.

See jamesmccaffrey.wordpress.com/2023/01/10/revisiting-binary-classification-using-scikit-logistic-regression/ for the complete code.

[Image: Confused dogs matrix.]

This entry was posted in Scikit.