Determining a Location from a Computer’s IP Address

On a couple of projects I’ve worked on in the past, I needed to determine the location of a computer based on its IP address. Several companies sell data sets that contain this information. I used one called Quova, which was acquired by Neustar in 2010 and renamed, but I still call it Quova.

Every month Quova publishes a new data set that maps all IP addresses to locations (a fresh set is needed because the mapping changes frequently). The data set is a text file with several million lines. Each line has about 29 fields separated by the ‘|’ character. The first two fields are a start_ip and an end_ip. The remaining 27 fields hold information that applies to every IP address from start_ip through end_ip, including country, city, state, latitude and longitude, postal code, and so on.
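Because lookups compare a target address against start_ip and end_ip, it helps to convert both to integers while parsing. Comparing dotted strings as text gives the wrong order ("10.0.0.1" sorts before "9.0.0.0"). The following is a minimal Python sketch of parsing one line. It assumes the IP fields are either dotted-quad IPv4 strings or plain decimal integers (the format isn't stated above), and the names `IpRange`, `ip_to_int`, and `parse_line` are hypothetical.

```python
import ipaddress
from typing import NamedTuple


class IpRange(NamedTuple):
    """One line of the data set: an inclusive IP range plus its location fields."""
    start_ip: int
    end_ip: int
    fields: list[str]  # the remaining location fields, kept in file order


def ip_to_int(text: str) -> int:
    """Convert a dotted-quad IPv4 string or a decimal integer string to an int.

    Assumption: the data set uses one of these two formats for its IP columns.
    """
    text = text.strip()
    if text.isdigit():
        return int(text)
    return int(ipaddress.IPv4Address(text))


def parse_line(line: str) -> IpRange:
    """Split one '|'-delimited line into start_ip, end_ip, and the other fields."""
    parts = line.rstrip("\r\n").split("|")
    return IpRange(ip_to_int(parts[0]), ip_to_int(parts[1]), parts[2:])
```

For example, `parse_line("1.2.3.0|1.2.3.255|XX|Example City")` (made-up values) gives a start_ip of 16909056 and an end_ip of 16909311, so any target address can be tested with two integer comparisons.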

To find the information associated with a particular target IP address, it’s not really feasible to loop through the text file one line at a time until you hit the interval containing the target: the file is just too big. In principle, a good way to access the Quova information would be to load the data into a SQL database, index the start_ip and end_ip columns, and then run a SELECT statement.
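To illustrate the SQL approach, here is a small sketch using Python’s built-in `sqlite3` module with an in-memory database. The table name `ip_location`, its reduced column set (country and city only), and the sample rows in the usage test are hypothetical; the real data would have all 29 columns and would likely live in a persistent database.

```python
import sqlite3


def build_db(rows):
    """Load (start_ip, end_ip, country, city) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ip_location ("
        " start_ip INTEGER NOT NULL,"
        " end_ip INTEGER NOT NULL,"
        " country TEXT,"
        " city TEXT)"
    )
    conn.executemany("INSERT INTO ip_location VALUES (?, ?, ?, ?)", rows)
    # Index both range endpoints, as the approach above describes.
    conn.execute("CREATE INDEX idx_start ON ip_location (start_ip)")
    conn.execute("CREATE INDEX idx_end ON ip_location (end_ip)")
    return conn


def lookup(conn, ip):
    """Return (country, city) for the range containing ip, or None if none does."""
    return conn.execute(
        "SELECT country, city FROM ip_location"
        " WHERE start_ip <= ? AND end_ip >= ?",
        (ip, ip),
    ).fetchone()
```

If the ranges never overlap, a common refinement is to rely on the start_ip index alone and query `... WHERE start_ip <= ? ORDER BY start_ip DESC LIMIT 1`, then check end_ip in application code. That lets the database stop after one index probe instead of examining every range whose start_ip is below the target.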

For one project I worked on, I needed to do the lookups with data held in memory instead of in SQL. The problem is that the Quova data is too large to fit in the memory of a typical machine. My solution was to load only part of the data file at any one time (about 10% would fit on my machine) and to load a different chunk when necessary. This meant I had to create an index recording which lines of the file covered which IP addresses.
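The original code isn’t published, so the following is only a sketch of one way to build such an index in Python, not the author’s implementation. It assumes the file is sorted by start_ip, the ranges don’t overlap, and the IP fields are plain decimal integers. It records the starting IP and byte offset of every `chunk_size`-th line, uses binary search (`bisect`) to pick the one chunk that could contain the target, and keeps only that chunk in memory. The class name `ChunkedIpIndex` and its parameters are hypothetical.

```python
import bisect


class ChunkedIpIndex:
    """Sparse index over a '|'-delimited file sorted by start_ip.

    Assumptions: ranges don't overlap and IP fields are decimal integers.
    Only one chunk of `chunk_size` lines is held in memory at a time.
    """

    def __init__(self, path, chunk_size=100_000):
        self.path = path
        self.chunk_size = chunk_size
        self.chunk_starts = []   # start_ip of the first line of each chunk
        self.chunk_offsets = []  # byte offset of that first line in the file
        self._loaded = None      # (chunk_no, starts, rows) currently in memory
        with open(path, "rb") as f:
            offset = 0
            for line_no, line in enumerate(f):
                if line_no % chunk_size == 0:
                    self.chunk_starts.append(int(line.split(b"|", 1)[0]))
                    self.chunk_offsets.append(offset)
                offset += len(line)  # track byte offsets manually

    def _load_chunk(self, chunk_no):
        """Read one chunk from disk, reusing it if it is already loaded."""
        if self._loaded is not None and self._loaded[0] == chunk_no:
            return self._loaded
        starts, rows = [], []
        with open(self.path, "rb") as f:
            f.seek(self.chunk_offsets[chunk_no])
            for _ in range(self.chunk_size):
                line = f.readline()
                if not line:
                    break
                parts = line.decode("utf-8").rstrip("\r\n").split("|")
                starts.append(int(parts[0]))
                rows.append((int(parts[1]), parts[2:]))
        self._loaded = (chunk_no, starts, rows)
        return self._loaded

    def lookup(self, ip):
        """Return the location fields for ip, or None if no range contains it."""
        chunk_no = bisect.bisect_right(self.chunk_starts, ip) - 1
        if chunk_no < 0:
            return None
        _, starts, rows = self._load_chunk(chunk_no)
        i = bisect.bisect_right(starts, ip) - 1  # last range starting <= ip
        end_ip, fields = rows[i]
        return fields if ip <= end_ip else None
```

Storing byte offsets rather than only line numbers lets the loader `seek` straight to a chunk instead of re-reading the file from the top. The block of line numbers for a chunk is still easy to recover: chunk `n` covers lines `n * chunk_size` through `(n + 1) * chunk_size - 1`.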

This entry was posted in Machine Learning.

2 Responses to Determining a Location from a Computer’s IP Address

  1. This is an interesting discussion. Is there more to your solution?

    • Not really. The code is quite long, so I don’t intend to post it. The hard part is creating an index that, given an IP address, returns the block of line numbers containing that address. Nothing too outrageous, but it did take me a couple of days to write.
