Geocoding with Geocoder.us and Google Maps

October 30, 2007

Geocoding is the process of assigning geographic identifiers to map features; a common example is assigning a latitude and longitude to a given street address. A common technique uses address interpolation. Using this method, if we know the range of house numbers along a street segment and the coordinates of that segment’s endpoints, we can estimate where a specific address falls between them.
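To make the interpolation idea concrete, here is a minimal Python sketch. The `Segment` layout and the `interpolate` function are hypothetical names made up for illustration. It assumes a straight segment with evenly spaced house numbers, which is a simplification; real TIGER/Line data keeps separate address ranges for the left and right sides of the street.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """One street segment (hypothetical layout, not the TIGER/Line schema)."""
    from_number: int                  # house number at the start of the segment
    to_number: int                    # house number at the end of the segment
    from_point: Tuple[float, float]   # (lat, lon) at the start
    to_point: Tuple[float, float]     # (lat, lon) at the end


def interpolate(segment: Segment, house_number: int) -> Tuple[float, float]:
    """Estimate (lat, lon) for a house number by linear interpolation."""
    span = segment.to_number - segment.from_number
    # Fraction of the way along the segment; 0.0 if the range is a single number.
    t = 0.0 if span == 0 else (house_number - segment.from_number) / span
    if not 0.0 <= t <= 1.0:
        raise ValueError("house number is outside this segment's range")
    lat = segment.from_point[0] + t * (segment.to_point[0] - segment.from_point[0])
    lon = segment.from_point[1] + t * (segment.to_point[1] - segment.from_point[1])
    return lat, lon
```

With a segment numbered 100 to 200, house number 150 lands at the midpoint between the two endpoints. The result is only an estimate, since real lots are rarely spaced evenly.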

The address information comes from the TIGER/Line files, which are extracts of selected geographic and cartographic information from the US Census Bureau’s TIGER (Topologically Integrated Geographic Encoding and Referencing) database.

So the task of a geocoder is to parse an address for street numbers, names, cities, states, and ZIP codes, and then interpolate the coordinates of that address by finding the matching street and its endpoints in the dataset. I recently used two geocoders, Google Maps and Geocoder.us, and thought I’d share the results of my work along with free software that you can use to geocode your own addresses.

In a project that I’m working on, I need to geocode a large number of street addresses. Because this is something I have to do rather frequently, I decided to put together a utility that would read my address information, geocode it via a web service, and spit it back out in a format suitable for further manipulation.

Geocoding on Geocoder.us

Geocoder.us offers a free and a commercial geocoding service. The free service is free for non-commercial use only. The commercial service costs $50 for 20,000 lookups, with no charge for failed lookups. Pretty reasonable to stay legit. The only difference between the two appears to be speed. The quality of the results is the same, but there’s a dramatic improvement in speed if you’re using the commercial service.

Both services can be accessed via a web browser, or using REST-CSV, REST-RPC, SOAP, and XML-RPC. For my purposes, I used the REST-CSV interface because it provided everything I needed, and nothing I didn’t. Submitting a request to:

http://rpc.geocoder.us/service/csv?address=1600+Pennsylvania+Ave,+Washington+DC

yields:

38.898748,-77.037684,1600 Pennsylvania Ave NW,Washington,DC,20502
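As a rough illustration, a request URL like the one above can be built, and a response line like the one above parsed, with a few lines of Python. The function names are hypothetical. The sketch assumes a successful lookup always returns exactly six comma-separated fields as in the sample; how the service reports failures is not covered here, so anything else is treated as an error.

```python
import csv
from urllib.parse import urlencode

GEOCODER_US_CSV = "http://rpc.geocoder.us/service/csv"


def build_geocoder_us_url(address):
    """Build the REST-CSV request URL for one address."""
    # urlencode turns spaces into '+' and the comma into '%2C',
    # which is an equivalent encoding of the URL shown above.
    return GEOCODER_US_CSV + "?" + urlencode({"address": address})


def parse_geocoder_us_csv(line):
    """Parse one 'lat,lon,street,city,state,zip' response line into a dict."""
    row = next(csv.reader([line.strip()]))
    if len(row) != 6:
        # Assumption: anything that is not six fields is not a successful match.
        raise ValueError("unexpected response: %r" % line)
    lat, lon, street, city, state, zip_code = row
    return {
        "lat": float(lat),
        "lon": float(lon),
        "street": street,
        "city": city,
        "state": state,
        "zip": zip_code,  # kept as a string so leading zeros survive
    }
```

The parser does no network access itself, so it can be exercised against saved response lines before it is pointed at the live service.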


Geocoding using the Google Maps API

Google also offers free access to their service along with a commercial service. The commercial service is only available for a minimum of $10,000 per year in licensing fees, so I can’t speak to any of its advantages. The free service is remarkably fast and accurate, but can only be used for non-commercial purposes. Google used to set a limit of 50,000 requests per API key per day, but that was recently changed to 15,000 requests per IP address per day, and they will throttle you if you submit too many requests too quickly. For example, when I submitted 5 requests per second, 1 of every 5 returned a 620 error (too many requests). To respect their wishes, I slowed requests to no faster than 1 per second in the geocoding utility that I created. Even at 1 per second, I get a 620 error about once every 100 requests.
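The throttling described here can be sketched as a small loop: wait between requests, and back off and retry when a 620 comes back. This is not the code from my utility, just one way to do it. The function name, the `fetch` callable (assumed to return a `(status, payload)` pair), and the doubling backoff are all assumptions made for the example.

```python
import time

TOO_MANY_QUERIES = 620  # the "too many requests" status mentioned above


def geocode_with_throttle(addresses, fetch, min_interval=1.0,
                          max_retries=3, sleep=time.sleep):
    """Geocode addresses one at a time, pausing between requests.

    fetch(address) is a caller-supplied function (hypothetical here) that
    performs one lookup and returns (status, payload).
    """
    results = []
    for address in addresses:
        backoff = 2 * min_interval
        for _ in range(max_retries + 1):
            status, payload = fetch(address)
            if status != TOO_MANY_QUERIES:
                break
            # Throttled: wait longer each time before trying again.
            sleep(backoff)
            backoff *= 2
        results.append((address, status, payload))
        # Normal pacing: never send requests faster than one per min_interval.
        sleep(min_interval)
    return results
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting. If every retry is throttled, the last 620 is recorded for that address rather than raising.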

Google’s geocoding service can be accessed via a REST interface, and you can specify the return format: csv, json, or xml. The csv format returns a comma-delimited string, json a JSON object, and xml an XML document. Submitting a request to:

http://maps.google.com/maps/geo?q=1600+Pennsylvania+Ave,+Washington+DC&output=csv&key=apikey

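yields a line of the following form (the sample coordinates shown here do not match the Washington address in the request, so read it as an illustration of the format only):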

200,6,42.730070,-73.690570

Where ‘200’ is the response code, ‘6’ is the accuracy, and the remaining values are latitude and longitude.
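A csv response with these four fields can be unpacked in a few lines of Python. The `GoogleCsvResult` type and `parse_google_csv` function are hypothetical names. The sketch assumes every response has exactly four comma-separated fields in the order described above.

```python
from typing import NamedTuple


class GoogleCsvResult(NamedTuple):
    status: int     # response code, 200 on success
    accuracy: int   # accuracy level reported by the service
    lat: float
    lon: float


def parse_google_csv(line):
    """Parse one 'status,accuracy,lat,lon' line from the csv output format."""
    parts = line.strip().split(",")
    if len(parts) != 4:
        raise ValueError("expected 4 fields, got %d" % len(parts))
    return GoogleCsvResult(int(parts[0]), int(parts[1]),
                           float(parts[2]), float(parts[3]))
```

Checking `result.status == 200` before trusting the coordinates is the obvious first step, since the coordinates carry no meaning when the lookup fails.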

I used the XML return format in my utility because part of the returned document contained the parsed street address in a standardized form, which I found very helpful.

Comparison of the Two Services

I can’t compare Google’s commercial service to their free service, because I didn’t have $10,000 for a Google license, but the results from their free service were remarkably fast and accurate. I submitted five small geocoding batches via my utility, and Google accurately geocoded 97.5 percent of the addresses. About one-half of one percent were incorrectly geocoded, e.g. I entered an address in Raleigh, and the geocoded result was located in Wake Forest. The remaining errors were primarily due to the address not being found.

Geocoder.us’s results weren’t as accurate, but their commercial service is fast and inexpensive, and if you want to stay on the legal side, it’s the way to go. Geocoder.us accurately geocoded about 83 percent of the addresses, with two-thirds of the errors due to an unfound address, and one-third due to an ambiguous address. Geocoder.us’s free service was a bit slow, requiring over 30 minutes to geocode 150 addresses, whereas their commercial service completed the same work in just over 30 seconds.

Google seemed to do a better job with parsing and translating entered addresses into something that could be found in their database, which resulted in fewer ‘address not found’ errors, but also resulted in some incorrect coordinates instead of a failed request due to address ambiguity.

I should point out that my results are likely not representative of a typical geocoding task. I geocoded primarily business addresses, and many of those businesses were new, or had custom addresses. I’d be interested in seeing the accuracy comparison using residential addresses. If you’re up for it, I’ll be posting the tools to automate the whole process in the next few days.

Common Geocoding Problems

A few common problems were alluded to above:

  • New addresses might not be found in the dataset. The latest TIGER/Line dataset was released on March 6, 2007, and includes 2006 data.
  • Geocoder.us had problems parsing out irrelevant address information. For example, it couldn’t locate 123 Anystreet Suite 110, but could locate 123 Anystreet; it couldn’t locate 440-110 Anyavenue, but could locate 440 Anyavenue; it couldn’t locate 123 US Hwy 123, but it could locate 123 United States Hwy 123.
  • Ambiguous addresses will return multiple results, and some of those can’t be helped. For example, 123 N Main Street and 123 Main Street may both exist.
  • Custom addresses are difficult to geocode. For example, Triangle Town Center Mall is located on Triangle Town Center Blvd.
  • Addresses without a street listing cannot be found. This would include P.O. boxes.
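Some of the parsing problems listed above can be worked around by cleaning up an address before submitting it. The following Python sketch covers only the three Geocoder.us examples given (a suite number, a hyphenated unit after the house number, and the "US Hwy" abbreviation). The function name and the regular expressions are assumptions made for illustration, and the rules throw information away: a hyphenated house number that is part of the real address would be damaged by the second rule.

```python
import re

# " Suite 110", " Ste. 4", " Unit B", " Apt 2" (optionally preceded by a comma)
_UNIT = re.compile(r",?\s+(?:suite|ste|unit|apt)\.?\s+\w+", re.IGNORECASE)
# Leading "440-110" style numbers: keep only the part before the hyphen.
_HYPHEN_NUMBER = re.compile(r"^(\d+)-\d+\b")
# "US Hwy" / "US Highway" spelled the way the lookup accepted it.
_US_HWY = re.compile(r"\bUS\s+(?:Hwy|Highway)\b", re.IGNORECASE)


def normalize_address(address):
    """Strip details that tripped up the lookup (illustrative rules only)."""
    cleaned = address.strip()
    cleaned = _UNIT.sub("", cleaned)
    cleaned = _HYPHEN_NUMBER.sub(r"\1", cleaned)
    cleaned = _US_HWY.sub("United States Hwy", cleaned)
    # Collapse any runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", cleaned).strip()
```

It makes sense to keep the original address alongside the cleaned one in the output, so that a suite number removed for the lookup is not lost for good.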


Utilities

I created two geocoders: one can be used with Geocoder.us’s service, and one can be used with Google’s service. You can open these up on your PC, point them to an input file, and sit back while they do the work of geocoding your addresses. I’ll post these to the Swirl Wiki, and do quick write-ups on them within the next few days. I have no idea if they’ll be useful to anyone or not, but feel free to use or distribute them however you like.
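For readers who want to roll their own in the meantime, the general shape of such a batch tool is simple. The sketch below is not the code from either utility. It assumes one address per line in the input, and a caller-supplied `geocode` function (hypothetical) that returns `(lat, lon)` or raises `LookupError` or `ValueError` when an address cannot be resolved.

```python
import csv


def geocode_file(infile, outfile, geocode):
    """Read one address per line, geocode each, and write the results as CSV.

    geocode(address) is supplied by the caller and is assumed to return
    (lat, lon) or raise LookupError / ValueError on failure.
    """
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["address", "lat", "lon", "error"])
    for line in infile:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        try:
            lat, lon = geocode(address)
            writer.writerow([address, lat, lon, ""])
        except (LookupError, ValueError) as exc:
            # Keep failed addresses in the output so they can be fixed and retried.
            writer.writerow([address, "", "", str(exc)])
```

Writing failures into the same file, rather than dropping them, makes it easy to tally how many addresses were found, not found, or ambiguous.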

More on geocoding: Wikipedia

Comments

2 Responses to “Geocoding with Geocoder.us and Google Maps”

  1. Sal Khattak on May 11th, 2009 8:48 pm

    Where are the utilities you have written ? I could not locate them.

    Can you please e-mail me the location or the utilities ?

    Thanks,
    Sal

    sal@khattak.com

  2. Pseudo Nim on February 24th, 2010 5:38 am

    FYI, there is an open-source Perl module that runs against a Berkeley DB loaded with the US Census data you mention. It’s similar to Geocoder.us running on your local network. It can geocode about 1-10M addresses per hour.
