Computing the distance between two zip codes is easy. And it’s difficult.

To compute the distance between two zip codes, you can find the latitude and longitude of each zip code, then compute the distance between the two lat-lon points.

The main difficulty is that a zip code can cover a very large geographical area and so the lat-lon location of a zip code is very fuzzy.
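To get a feel for how much that fuzziness can matter, here is a rough sketch that converts a north-south shift in latitude into kilometers. The function name `lat_offset_km` and the 0.1-degree offset are illustrative assumptions, not values taken from any zip code data, and the Earth is treated as a sphere with a mean radius of about 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean radius of the Earth

def lat_offset_km(delta_deg, radius_km=EARTH_RADIUS_KM):
    """Arc length in km for a north-south shift of delta_deg degrees."""
    return radius_km * math.radians(abs(delta_deg))

# A hypothetical 0.1-degree uncertainty in a zip code's latitude
print("%0.1f km" % lat_offset_km(0.1))
```

Under these assumptions, a tenth of a degree of latitude is roughly 11 km, so the reference point of a geographically large zip code could easily be off by more than the distance between two neighboring zip codes.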

I coded up a short demo using Python. First, I went to the U.S. Postal Service web site, looking for a database of zip codes. As is often the case with government web sites, the site was absolutely horrible. And no database was available.

Because of this, there are several companies that sell USPS data. I found a nice commercial site at unitedstateszipcodes.org where there was a free version to download. The data was an Excel spreadsheet, which I saved as a tab-delimited text file. There were 15 columns, but I only needed the zip code in column [0] and the lat and lon in columns [12] and [13].
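One possible way to load such a file is sketched below using the standard `csv` module. The function name `load_zip_table` and the decision to skip short or non-numeric rows are assumptions made for illustration; the column positions [0], [12] and [13] are the ones described above. Zip codes are kept as strings so they can be used directly as lookup keys.

```python
import csv

def load_zip_table(lines, zip_col=0, lat_col=12, lon_col=13):
    """Build {zip_code: (lat, lon)} from tab-delimited lines with a header row."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    table = {}
    for row in reader:
        if len(row) <= max(zip_col, lat_col, lon_col):
            continue  # skip short or blank rows
        try:
            table[row[zip_col]] = (float(row[lat_col]), float(row[lon_col]))
        except ValueError:
            continue  # skip rows whose lat/lon are not numeric
    return table

# Usage, assuming the saved text file is named zip_code_database.txt:
# with open("zip_code_database.txt", newline="") as f:
#     table = load_zip_table(f)
```

Because the function accepts any iterable of lines, it can be tried on a few hand-typed rows before pointing it at the full file.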

To compute the distance between a pair of lat-lon values, I used the haversine formula, which gives the great-circle distance between two points and so takes the curvature of the Earth into account.
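For reference, the haversine computation can be written as follows. The symbols are assumed notation, not something taken from the data source: $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes, all in radians, and $r$ is the radius of the Earth (about 6371 km).

$$
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2 \arcsin\left(\sqrt{a}\right) \\
d &= r \, c
\end{aligned}
$$

Here $c$ is the central angle between the two points and $d$ is the distance along the surface, in the same units as $r$.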

Good fun!

*When I was a young man, I worked as an Assistant Cruise Director on ships of the Royal Viking Line. I traveled thousands of miles (as measured by haversine or any other formula).*

*Left: There were two assistant directors on each voyage. One of our jobs was to manage the entertainment. Here I am on one cruise with the other assistant, Peter, and two of the entertainers. They were very talented. Center: Royal Viking had cruises in all parts of the world and so I got to see over 40 countries. When the ship would dock at a port, the cruise staff were expected to go ashore to keep an eye on the passengers. Egypt was a highlight. Right: On every trip, one night was the formal Captain’s Dinner. I’m here with the captain and his wife on one such evening. From my tan, I’m pretty sure the photo is from a Mediterranean cruise. From the amount of hair on my head, I’m pretty sure this is a very old photo.*

    # zip_code_dist.py
    # data is free version of product from:
    # www.unitedstateszipcodes.org/zip-code-database/

    import numpy as np

    # ---------------------------------------------------------

    def haversine(lat1, lon1, lat2, lon2):
      # from wikipedia.org/wiki/Great-circle_distance
      # inputs are in degrees; convert to radians first
      lat1 = np.radians(lat1); lon1 = np.radians(lon1)
      lat2 = np.radians(lat2); lon2 = np.radians(lon2)
      dlon = lon2 - lon1
      dlat = lat2 - lat1
      a = np.sin(dlat/2.0)**2 + np.cos(lat1) * \
        np.cos(lat2) * np.sin(dlon/2.0)**2
      c = 2.0 * np.arcsin(np.sqrt(a))  # central angle in radians
      r = 6371.0  # approx. radius of Earth in km
      return c * r

    # ---------------------------------------------------------

    print("\nApproximate distance between two zip codes \n")

    fin = open(".\\zip_code_database.txt", "r")
    table = dict()
    fin.readline()  # consume header
    for line in fin:  # load lookup table
      tokens = line.split('\t')
      zc = tokens[0]
      lat = np.float32(tokens[12])
      lon = np.float32(tokens[13])
      table[zc] = (lat,lon)
    fin.close()

    zc1 = "98029"
    zc2 = "98052"
    (lat1,lon1) = table[zc1]
    (lat2,lon2) = table[zc2]

    dist = haversine(lat1,lon1, lat2,lon2)
    print("Distance between " + zc1 + " and " + zc2 + " is: ")
    print("%0.2f km" % dist)

    print("\nEnd demo ")

Nice work. I was not aware of this algorithm. Thank you.