Suppose you have two mathematical permutations such as p1 = (2,0,3,1) and p2 = (0,2,1,3) and you want to measure the distance / difference between the permutations. This is a surprisingly tricky question. One approach is to compute the Kendall Tau (KT) distance.

The KT distance is the number of pairs of permutation elements that have a different ordering in the two permutations. The idea is best explained by example. For p1 and p2 above, there are 6 possible pairs of elements. Using the elements of p1 as the basis the pairs are: 2-0, 2-3, 2-1, 0-3, 0-1, 3-1. In p2, the 2-0 pair order is reversed (0-2 in p2), and the 3-1 pair is reversed (1-3 in p2), but the other four pairs have the same ordering, and so the KT distance is 2. The normalized KT distance is 2/6 = 0.6667 where 6 is the total number of pairs.

*The Wikipedia article on KT distance is awful, but the Wikipedia article does present a good example (on the right).*

OK. I don’t like to criticize. But the Wikipedia entry on Kendal Tau distance is one of the weakest technical articles I’ve come across . . ever. The KT article is often misleading, is contradictory in several places, and has obvious errors. The only good part about the Wikipedia article is an example where it shows how to compute KT distance for 1-based permutations p1 = (1,2,3,4,5) and p2 =(3,4,1,2,5).

The Wikipedia article presents a Python implementation of KT distance. The code computes distance incorrectly (returning twice as many mis-orderings as correct) but then computes a correct value of normalized distance by dividing by twice as many possible orderings!

The Wikipedia article also mixes information about Kendall Tau correlation (a value between -1 and +1) with normalized KT distance (a value between 0 and 1) — similar concepts but different. Additionally, the Wikipedia article presents some C language code that is misleading because it computes KT distance that works only for a special case when one of the permutations is (0,1,2,3,..) or (1,2,3,..).

Here’s my Python implementation of KT distance:

def my_kendall_tau_dist(x, y): # x, y are 0-based lists or np.arrays n = len(x) index_of = [None] * n # lookup into y for i in range(n): v = y[i]; index_of[v] = i d = 0 # distance = number pair misorderings for i in range(n): for j in range(i+1, n): if index_of[x[i]] "gt" index_of[x[j]]: # replace "gt" d += 1 normer = n * (n - 1) / 2.0 nd = d / normer # normalized distance return (d, nd)

The KT distance is unusual in the sense that it’s easy to state and understand, but the implementation has many subtleties.

The moral of the story is that Wikipedia is a great resource but you shouldn’t assume the information there is always correct.

*Poker and card games are all about permutations. Some people can toss playing cards with amazing skill. My college roommate Ed Koolish was an expert card thrower. Center: The current world record for farthest card thrown is held by Rick Smith who threw a card 216 feet, 4 inches on December 2, 2002. Here Smith throws cards at a target. Left and Right: I don’t think there are any competitions for card tossing style.*

Demo code:

# kendall_tau_dist_demo.py # kendall tau distance for permutations # kt distance is not same as kt correlation # replace "lt" and "gt" with symbols import numpy as np def my_kendall_tau_dist(x, y): # number bubble sort swaps = number misordered pairs # x, y are 0-based lists or np.arrays n = len(x) index_of = [None] * n # lookup into y for i in range(n): v = y[i]; index_of[v] = i d = 0 # distance = number pair misorderings for i in range(n): for j in range(i+1, n): if index_of[x[i]] "gt" index_of[x[j]]: d += 1 normer = n * (n - 1) / 2.0 nd = d / normer # normalized distance return (d, nd) def normalised_kendall_tau_distance(values1, values2): # from Wikipedia n = len(values1) assert len(values2) == n, \ "Both lists have to be of equal length" i, j = np.meshgrid(np.arange(n), np.arange(n)) a = np.argsort(values1) b = np.argsort(values2) ndisordered = \ np.logical_or(np.logical_and(a[i] "lt" a[j], \ b[i] "gt" b[j]), np.logical_and(a[i] "gt" a[j], \ b[i] "lt" b[j])).sum() normed = ndisordered / (n * (n-1)) # no 2 return ndisordered, normed print("\nBegin kendall tau distance demo \n") x = [0,3,1,6,2,5,4] # list y = [4,5,2,6,1,3,0] # reversed x z = [1,3,2,6,5,4,0] print("x = ", end=""); print(x) print("y = ", end=""); print(y) print("z = ", end=""); print(z) d1, nd1 = normalised_kendall_tau_distance(x, y) d2, nd2 = my_kendall_tau_dist(x, y) d3, nd3 = my_kendall_tau_dist(x, z) d4, nd4 = my_kendall_tau_dist(y, z) d5, nd5 = my_kendall_tau_dist(x, x) print("\nxy dist and norm dist using Wikipedia kt(): \ %3d %0.4f " % (d1,nd1)) print("xy dist and norm dist using my_kt_dist() : \ %3d %0.4f " % (d2,nd2)) print("xz dist and norm dist using my_kt_dist() : \ %3d %0.4f " % (d3,nd3)) print("yz dist and norm dist using my_kt_dist() : \ %3d %0.4f " % (d4,nd4)) print("xx dist and norm dist using my_kt_dist() : \ %3d %0.4f " % (d5,nd5)) print("\nWikipedia 1-based example: ") x = np.array([1,2,3,4,5], dtype=np.int64) # np.array y = np.array([3,4,1,2,5], dtype=np.int64) print("x = ", end=""); print(x) print("y = ", end=""); print(y) x -= 1 # convert to 0-based y -= 1 (d, nd) = my_kendall_tau_dist(x, y) print("norm distance = %0.4f " % nd) print("\nEnd demo ")

Very interesting algorithm. There is a certain intuition as well.

What might be some practical applications?

Thank you

I am working on an idea called quantum inspired annealing. The goal is to optimize a permutation. Part of the quantum inspired part is that you need to determine the distance between two permutations.