I was working on a problem recently and the Kendall tau coefficient came up. I hadn’t used Kendall tau in quite a long time, so I took some time to refresh my memory.
Kendall tau is a classical statistical technique for a very specific type of problem. Suppose you have N items and two judges who each rank the items from best to worst. Kendall tau is a number between -1 and +1 that measures how well the rankings of the two judges agree.
A tau value of +1 indicates perfect agreement — the two sets of rankings are identical. A tau value of -1 means the two rankings are exactly opposite of each other.
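For reference, this is the standard no-ties form of the coefficient. With N items there are N(N-1)/2 pairs of items. A pair is concordant (n_c) if both judges put the two items in the same relative order, and discordant (n_d) if they disagree. This sketch assumes there are no tied ranks. SciPy's default tau-b variant reduces to this same value when there are no ties.

```latex
\tau = \frac{n_c - n_d}{n_c + n_d} = \frac{n_c - n_d}{N(N-1)/2}
```

If every pair is concordant, then n_d = 0 and tau = +1. If every pair is discordant, then n_c = 0 and tau = -1.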
Suppose there are 12 job candidates: Abe, Bob, Don, Gil, Hal, Ike, Joe, Ned, Pat, Roy, Sam, Van.
Suppose the first judge acts as the reference, and his numerical rankings are:
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
corresponding to (Joe, Bob, Roy, Ned, Don, Pat, Abe, Hal, Sam, Gil, Van, Ike).
And suppose the second judge’s numerical rankings, for the same candidates in the same order, are:
(0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10)
Notice the second set of rankings is quite close to the first. The Kendall tau value for the two sets of rankings is 0.8485, indicating close agreement as expected.
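To see where 0.8485 comes from, here is a small brute-force sketch that checks all 66 pairs of candidates directly. The names judge_1, judge_2 and count_pairs are my own, chosen for illustration, and the sketch assumes there are no tied ranks.

```python
from itertools import combinations

# Example rankings: position i holds each judge's rank for the same
# candidate (Joe, Bob, Roy, Ned, ...). Names are illustrative only.
judge_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
judge_2 = [0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10]

def count_pairs(r1, r2):
    """Count concordant and discordant pairs by checking every pair (i, j)."""
    n_c = 0
    n_d = 0
    discordant_pairs = []
    for i, j in combinations(range(len(r1)), 2):
        # Positive product: both judges order items i and j the same way.
        # Negative product: the judges disagree about this pair.
        s = (r1[i] - r1[j]) * (r2[i] - r2[j])
        if s > 0:
            n_c += 1
        elif s < 0:
            n_d += 1
            discordant_pairs.append((i, j))
    return n_c, n_d, discordant_pairs

n_c, n_d, bad = count_pairs(judge_1, judge_2)
tau = (n_c - n_d) / (n_c + n_d)
print(n_c, n_d, bad)        # 61 5 [(2, 3), (4, 5), (6, 7), (8, 9), (10, 11)]
print("tau = %0.4f" % tau)  # tau = 0.8485
```

Only the five adjacent swaps in the second judge's list are discordant. That gives n_c = 61 and n_d = 5, so tau = (61 - 5) / 66 = 56 / 66, which rounds to 0.8485.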
I wrote some Python code to compute Kendall tau from scratch, mostly as an exercise. I used the kendalltau() function from the SciPy library (scipy.stats.kendalltau) to verify that my from-scratch version was working correctly.
The example I implemented was based on the one at http://www.statisticshowto.com/kendalls-tau/.
I work mostly with advanced machine learning techniques, but it’s good to have a knowledge of classical statistics techniques such as the Kendall tau rank correlation coefficient.
Beauty pageants are a ranking problem. The Miss Universe contest has a segment where contestants wear a national costume. Here are three interesting examples. Left: Miss Vietnam is . . . I’m not sure what this is. Center: Clever hockey theme costume worn by Miss Canada. Right: Miss Laos and two friends.
```python
# kendall_tau_demo.py
import numpy as np
import scipy.stats as stats

items = ['joe', 'bob', 'roy', 'ned', 'don', 'pat',
         'abe', 'hal', 'sam', 'gil', 'van', 'ike']
rankings_1 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=np.int64)
rankings_2 = np.array([0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10], dtype=np.int64)
rankings_3 = np.array([11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0], dtype=np.int64)

def concordant(r1, r2, idx):  # helper
    # assumes r1 is [0,1,2, . . n-1]
    # counts later items that r2 also places after item idx
    n = len(r1)
    curr_rank = r2[idx]
    ct = 0
    for j in range(idx + 1, n):
        if r2[j] > curr_rank:
            ct += 1
    return ct

def discordant(r1, r2, idx):  # helper
    # assumes r1 is [0,1,2, . . n-1]
    # counts later items that r2 places before item idx
    n = len(r1)
    curr_rank = r2[idx]
    ct = 0
    for j in range(idx + 1, n):
        if r2[j] < curr_rank:
            ct += 1
    return ct

def my_kendall_tau(r1, r2):
    # assumes r1 is sorted; if not, reorder both arrays by r1 first:
    #   order = np.argsort(r1); r1, r2 = r1[order], r2[order]
    n = len(r1)
    sum_con = 0
    sum_dis = 0
    for idx in range(n - 1):  # the last item has no later items to compare
        n_c = concordant(r1, r2, idx)
        n_d = discordant(r1, r2, idx)
        sum_con += n_c
        sum_dis += n_d
    num = (1.0 * sum_con) - sum_dis
    denom = (1.0 * sum_con) + sum_dis
    return num / denom

print("\nBegin Kendall Tau demo ")
print("\nranking 1: ")
print(rankings_1)
print("\nranking 2: ")
print(rankings_2)
print("\nranking 3: ")
print(rankings_3)

t = my_kendall_tau(rankings_1, rankings_2)
print("\nkendall tau for r1, r2 = %0.4f" % t)
t = my_kendall_tau(rankings_1, rankings_3)
print("\nkendall tau for r1, r3 = %0.4f" % t)

(t, p) = stats.kendalltau(rankings_1, rankings_2)
print("\nkendall tau from scipy for r1, r2 = %0.4f" % t)

print("\nEnd Kendall Tau demo ")
```
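The from-scratch helpers have two limitations. They compare every pair, which is O(n^2), and they assume the first ranking is already 0, 1, . . n-1. The sketch below is a different approach from the demo, not a description of how SciPy works internally. It first orders the items by the first ranking. In that order, every discordant pair is an inversion in the second ranking, and a merge sort can count inversions in O(n log n) time. The names count_inversions and kendall_tau_fast are hypothetical, and the sketch assumes there are no ties. Its results could be checked against scipy.stats.kendalltau in the same way as the demo.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) using merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inv = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so each of those elements forms one inversion with it.
            merged.append(right[j])
            inv += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv

def kendall_tau_fast(r1, r2):
    """Kendall tau for two rankings with no ties (hypothetical helper).

    r1 does not need to be sorted: items are put in r1's order first,
    then discordant pairs are counted as inversions of r2 in that order.
    """
    n = len(r1)
    order = sorted(range(n), key=lambda k: r1[k])
    r2_in_r1_order = [r2[k] for k in order]
    _, n_d = count_inversions(r2_in_r1_order)
    total_pairs = n * (n - 1) // 2
    n_c = total_pairs - n_d
    return (n_c - n_d) / total_pairs

r1 = list(range(12))
r2 = [0, 1, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10]
print("%0.4f" % kendall_tau_fast(r1, r2))        # 0.8485
print("%0.4f" % kendall_tau_fast(r1, r1[::-1]))  # -1.0000
```

The function shuffles the items consistently in both lists, so the first ranking no longer needs to be sorted. Listing the candidates in any other order, as long as both judges' lists use that same order, gives the same tau.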