## The Kendall Tau Rank Correlation Coefficient

I was working on a problem recently and the Kendall tau coefficient came up. I hadn’t used Kendall tau in quite a long time so I took some time to refresh my memory.

Kendall tau is a classical statistics technique for a very specific type of problem. Suppose you have N items and two judges who will rank the items from best to worst. The Kendall tau is a number between -1 and +1 that indicates how well the rankings of the two judges agree. A tau value of +1 indicates perfect agreement: the two sets of rankings are identical. A tau value of -1 means the two rankings are exact opposites of each other.
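When there are no tied ranks, the value can be written in terms of pair counts. The symbols C and D here are my own shorthand for the number of concordant and discordant pairs of items. A pair is concordant when both judges put the two items in the same relative order, and discordant otherwise.

$$
\tau = \frac{C - D}{C + D}, \qquad C + D = \binom{N}{2} = \frac{N(N-1)}{2}
$$

If every pair is concordant then D = 0 and tau is +1. If every pair is discordant then C = 0 and tau is -1.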

Suppose there are 12 job candidates: Abe, Bob, Don, Gil, Hal, Ike, Joe, Ned, Pat, Roy, Sam, Van.

The first judge acts as the reference. Suppose his numerical rankings are:

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)

corresponding to (Joe, Bob, Roy, Ned, Don, Pat, Abe, Hal, Sam, Gil, Van, Ike).

And suppose the second judge’s numerical rankings are:

(0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10)

Notice the second set of rankings is quite close to the first: only five adjacent pairs are swapped. The Kendall tau value for the two sets of rankings is 0.8485, indicating close agreement as expected.
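To see where 0.8485 comes from, here is a minimal sketch that counts concordant and discordant pairs directly. The names `judge_1`, `judge_2` and `count_pairs` are just for illustration, and the sketch assumes there are no tied ranks.

```python
from itertools import combinations

# Rankings from the example, listed in the same item order.
judge_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
judge_2 = [0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10]

def count_pairs(r1, r2):
    """Return (concordant, discordant) pair counts; assumes no tied ranks."""
    con = dis = 0
    for i, j in combinations(range(len(r1)), 2):
        # Positive product: both judges order items i and j the same way.
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) > 0:
            con += 1
        else:
            dis += 1
    return con, dis

con, dis = count_pairs(judge_1, judge_2)
tau = (con - dis) / (con + dis)
print(con, dis, round(tau, 4))  # prints: 61 5 0.8485
```

With 12 items there are 66 pairs. Five of them are discordant, one for each swapped adjacent pair, so tau = (61 - 5) / 66 = 0.8485.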

I wrote some Python code to compute Kendall tau from scratch, mostly as an exercise. I used the built-in SciPy kendalltau() function to verify my from-scratch version was working correctly.

The example I implemented was based on the one at http://www.statisticshowto.com/kendalls-tau/.

I work mostly with advanced machine learning techniques, but it’s good to have a knowledge of classical statistics techniques like the Kendall tau rank correlation coefficient.

Beauty pageants are a ranking problem. The Miss Universe contest has a segment where contestants wear a national costume. Here are three interesting examples. Left: Miss Vietnam is . . . I’m not sure what this is. Center: Clever hockey theme costume worn by Miss Canada. Right: Miss Laos and two friends.

```python
# kendall_tau_demo.py

import numpy as np
import scipy.stats as stats

items = ['joe', 'bob', 'roy', 'ned', 'don', 'pat', 'abe',
  'hal', 'sam', 'gil', 'van', 'ike']

rankings_1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
  dtype=np.int64)
rankings_2 = np.array([0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10],
  dtype=np.int64)
rankings_3 = np.array([11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
  dtype=np.int64)

def concordant(r1, r2, idx):  # helper
  # assumes r1 is [0,1,2, . . n-1]
  # counts later items that judge 2 also ranks after item idx
  n = len(r1)
  curr_rank = r2[idx]
  ct = 0
  for j in range(idx+1, n):
    if r2[j] > curr_rank:
      ct += 1
  return ct

def discordant(r1, r2, idx):  # helper
  # assumes r1 is [0,1,2, . . n-1]
  # counts later items that judge 2 ranks before item idx
  n = len(r1)
  curr_rank = r2[idx]
  ct = 0
  for j in range(idx+1, n):
    if r2[j] < curr_rank:
      ct += 1
  return ct

def my_kendall_tau(r1, r2):
  # assumes r1 is sorted ascending and there are no ties; if r1 is
  # not sorted, reorder both rankings first, for example:
  # order = np.argsort(r1); r1 = r1[order]; r2 = r2[order]
  n = len(r1)
  sum_con = 0
  sum_dis = 0
  for idx in range(n - 1):  # last item has no later items to compare
    n_c = concordant(r1, r2, idx)
    n_d = discordant(r1, r2, idx)
    sum_con += n_c
    sum_dis += n_d
  num = (1.0 * sum_con) - sum_dis
  denom = (1.0 * sum_con) + sum_dis
  return num / denom

print("\nBegin Kendall Tau demo ")
print("\nranking 1: ")
print(rankings_1)

print("\nranking 2: ")
print(rankings_2)

print("\nranking 3: ")
print(rankings_3)

t = my_kendall_tau(rankings_1, rankings_2)
print("\nkendall tau for r1, r2 = %0.4f" % t)

t = my_kendall_tau(rankings_1, rankings_3)
print("\nkendall tau for r1, r3 = %0.4f" % t)

(t,p) = stats.kendalltau(rankings_1, rankings_2)
print("\nkendall tau from scipy for r1, r2 = %0.4f" % t)

print("\nEnd Kendall Tau demo ")
```