The Kendall Tau Distance For Permutations Example Python Code

Suppose you have two mathematical permutations such as p1 = (2,0,3,1) and p2 = (0,2,1,3) and you want to measure the distance / difference between the permutations. This is a surprisingly tricky question. One approach is to compute the Kendall Tau (KT) distance.

The KT distance is the number of pairs of permutation elements that have a different ordering in the two permutations. The idea is best explained by example. For p1 and p2 above, because each permutation has n=4 elements, there are n * (n-1) / 2 = 6 possible pairs of elements. Using the elements of p1 as the basis the pairs are: 2-0, 2-3, 2-1, 0-3, 0-1, 3-1. In p2, the 2-0 pair order is reversed (0-2 in p2), and the 3-1 pair is reversed (1-3 in p2), but the other four pairs have the same ordering, and so the raw KT distance is 2. The normalized KT distance is 2/6 = 0.3333 where 6 is the total number of pairs. This can be interpreted as 33.33% of the pairs of elements in p1 and p2 have a different order.


The Wikipedia article on KT distance is weak (at least when I wrote this post), but the Wikipedia article does present a good example (on the right). I suspect/hope Wikipedia will correct the article at some point.

OK. I don’t like to criticize. But at the time I wrote this post, the Wikipedia entry on Kendal Tau distance is one of the weakest technical Wikipedia articles I’ve come across in recent memory. The KT distance article is often misleading, is contradictory in several places, and has obvious errors. The best part about the Wikipedia article is an example where it shows how to compute KT distance for 1-based permutations p1 = (1,2,3,4,5) and p2 =(3,4,1,2,5).

The Wikipedia article presents a Python implementation of KT distance. The code computes raw distance incorrectly (returning twice as many mis-orderings as correct) but then computes a correct value of normalized distance by dividing by twice as many possible orderings!

The Wikipedia article also mixes information about Kendall Tau correlation (a value between -1 and +1) with normalized KT distance (a value between 0 and 1) — similar concepts but different. Additionally, the Wikipedia article presents some C language code that is misleading because it computes KT distance that works only for a special case when one of the permutations is (0,1,2,3,..) or (1,2,3,..).

Here’s my Python implementation of KT distance:

def my_kendall_tau_dist(p1, p2):
  # p1, p2 are 0-based lists or np.arrays permutations
  n = len(p1)
  index_of = [None] * n  # lookup into p2
  for i in range(n):
    v = p2[i]; index_of[v] = i

  d = 0  # raw distance = number pair mis-orderings
  for i in range(n):  # scan thru p1
    for j in range(i+1, n):
      if index_of[p1[i]] "gt" index_of[p1[j]]:  # replace "gt"
        d += 1
  normer = n * (n - 1) / 2.0  # total num pairs 
  nd = d / normer  # normalized distance
  return (d, nd) 

The KT distance for permutations is somewhat unusual in the sense that it’s easy to state and understand, but the implementation has many subtleties.

One moral of the story is that Wikipedia is a great resource but you shouldn’t assume the information there is always correct.



Poker and card games are all about permutations. Some people can toss playing cards with amazing skill. My college roommate Ed Koolish was an expert card thrower. Center: The current world record for farthest card thrown is held by Rick Smith who threw a card 216 feet, 4 inches on December 2, 2002. Here Smith throws cards at a target. Left and Right: I don’t think there are any competitions for card tossing style.


Demo code:

# kendall_tau_dist_demo.py

# kendall tau distance for permutations
# kt distance is not same as kt correlation
# replace "lt" and "gt" with symbols

import numpy as np

def my_kendall_tau_dist(p1, p2):
  # p1, p2 are 0-based lists or np.arrays permutations
  n = len(p1)
  index_of = [None] * n  # lookup into p2
  for i in range(n):
    v = p2[i]; index_of[v] = i

  d = 0  # raw distance = number pair mis-orderings
  for i in range(n):  # scan thru p1
    for j in range(i+1, n):
      if index_of[p1[i]] "gt" index_of[p1[j]]:  # replace "gt"
        d += 1
  normer = n * (n - 1) / 2.0  # total num pairs 
  nd = d / normer  # normalized distance
  return (d, nd)  

def normalised_kendall_tau_distance(values1, values2):
  # from Wikipedia -- weird and inefficient
  n = len(values1)
  assert len(values2) == n, \
    "Both lists have to be of equal length"
  i, j = np.meshgrid(np.arange(n), np.arange(n))
  a = np.argsort(values1)
  b = np.argsort(values2)
  ndisordered = \
    np.logical_or(np.logical_and(a[i] "lt" a[j], \
      b[i] "gt" b[j]), np.logical_and(a[i] "gt" a[j], \
        b[i] "lt" b[j])).sum()
  normed = ndisordered / (n * (n-1))  # no 2
  return ndisordered, normed 
  
print("\nBegin kendall tau distance demo \n")

p1 = [0,3,1,6,2,5,4]  # list 
p2 = [4,5,2,6,1,3,0]  # reversed p1
p3 = [1,3,2,6,5,4,0]

print("p1 = ", end=""); print(p1)
print("p2 = ", end=""); print(p2)
print("p3 = ", end=""); print(p3)

d1, nd1 = normalised_kendall_tau_distance(p1, p2)
d2, nd2 = my_kendall_tau_dist(p1, p2)
d3, nd3 = my_kendall_tau_dist(p1, p2)
d4, nd4 = my_kendall_tau_dist(p2, p3)
d5, nd5 = my_kendall_tau_dist(p1, p1)

print("\np1-p2 dist and norm dist using Wikipedia kt(): \
  %3d  %0.4f " % (d1,nd1))
print("p1-p2 dist and norm dist using my_kt_dist()  : \
  %3d  %0.4f " % (d2,nd2))
print("p1-p3 dist and norm dist using my_kt_dist()  : \
  %3d  %0.4f " % (d3,nd3))
print("p2-p3 dist and norm dist using my_kt_dist()  : \
  %3d  %0.4f " % (d4,nd4))
print("p1-p1 dist and norm dist using my_kt_dist()  : \
  %3d  %0.4f " % (d5,nd5))

print("\nWikipedia 1-based permutations example: ")
p1 = np.array([1,2,3,4,5], dtype=np.int64)  # np.array
p2 = np.array([3,4,1,2,5], dtype=np.int64)
print("p1 = ", end=""); print(p1)
print("p2 = ", end=""); print(p2)
p1 -= 1  # convert to 0-based
p2 -= 1
(d, nd) = my_kendall_tau_dist(p1, p2)
print("normalized KT distance = %0.4f " % nd)

print("\nEnd demo ")
This entry was posted in Machine Learning. Bookmark the permalink.

2 Responses to The Kendall Tau Distance For Permutations Example Python Code

  1. sau001 says:

    Very interesting algorithm. There is a certain intuition as well.
    What might be some practical applications?
    Thank you

    • I am working on an idea called quantum inspired annealing. The goal is to optimize a permutation. Part of the quantum inspired part is that you need to determine the distance between two permutations.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s