Upgrade to Pro — share decks privately, control downloads, hide ads and more …

A Knapsack of Geotools: More Than Just Google M...

A Knapsack of Geotools: More Than Just Google Maps

Where? As location becomes increasingly important, and as more and more data is geotagged, this may be the most important question your app needs to answer. How do you determine what city and country your users are coming from? Figure out which neighborhood a place is in? Keep a location history for a physical object? Group people together based on proximity? One of these days you'll need to reach into your knapsack of geo-tools to solve problems like these and this talk aims to make you ready. We'll cover using location-aware storage like MongoDB and ElasticSearch, GeoIP, reverse geocoding, third-party location web services, geo-hashing, and more.

Andrei Zmievski

March 01, 2012
Tweet

More Decks by Andrei Zmievski

Other Decks in Technology

Transcript

  1. A KNAPSACK OF GEOTOOLS more than just Google Maps Andrei

    Zmievski • ConFoo • Mar 1, 2012 Thursday, March 1, 12
  2. MOI ❖ Andrei Z. ❖ Architect at AppDynamics ❖ PHP

    core dev, Smarty, PHP-GTK, Unicode project ❖ Coding, beer, brewing, photography, travel ❖ @a Thursday, March 1, 12
  3. INTRO ❖ Location is more important than ever ❖ So

    is basic understanding of geo-related principles ❖ Survey of tools, services, and technologies ❖ Not an exhaustive reference Thursday, March 1, 12
  4. LATITUDE/LONGITUDE ❖ Lines of equal latitude are called parallels ‣

    0º parallel = equator ‣ 23º26’N parallel = Tropic of Cancer ‣ 23º26’S parallel = Tropic of Capricorn ❖ Lines of equal latitude are called meridians ‣ 0° meridian = Prime Meridian ‣ Antipodes of Prime Meridian = 180°W and 180°E Thursday, March 1, 12
  5. LATITUDE/LONGITUDE ❖ Measured in DMS or decimal degrees ❖ Latitude:

    37° 45' 35” = 37.76° ❖ Longitude: -122° 28' 11 = -122.47° ❖ Positive latitude is N(orth), negative is S(outh) ❖ Positive longitudes are E(ast) of 0°, negative are W(est) Thursday, March 1, 12
  6. LATITUDE/LONGITUDE ❖ Latitude + longitude is enough to specify any

    location on the planet ❖ Does not consider height/depth Thursday, March 1, 12
  7. LATITUDE/LONGITUDE ❖ Earth != sphere ❖ Earth = oblate spheroid

    ❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar) Thursday, March 1, 12
  8. LATITUDE/LONGITUDE ❖ 1” of latitude = 30.7 m ❖ 1”

    of longitude varies with latitude latitude town degree minute second 60° Saint-Petersburg 55.65 km 0.93 km 15.42 m 51.47° Greenwich 69.29 km 1.15 km 19.24 m 45° Bordeaux 78.70 km 1.31 km 21.86 m 30° New Orleans 96.39 km 1.61 km 26.77 m 0° Quito 111.3 km 1.86 km 30.92 m (On the GRS80 or WGS84 spheroid at sea level) Thursday, March 1, 12
  9. LATITUDE/LONGITUDE ❖ Computer representations: ‣ “37.253,-122.0139” ‣ [37.253,-122.0139] ‣ {lat:

    37.253, long:-122.0139} ❖ Historically, lat,long ❖ GeoJSON specifies long,lat because in geometry x-axis comes first Thursday, March 1, 12
  10. GOTCHAS ❖ International Date Line discontinuity ❖ Latitudes smoothly wrap

    at the poles: from < 90° over the pole and back to < 90°, then down to -90°, over the pole, and back to > -90° ❖ Longitude has a discontinuity: 180°E (+180°) turns into 180°W (-180°W) ❖ This should be taken into account when making bounding box and other calculations Thursday, March 1, 12
  11. DISTANCE ❖ A bit more complicated than the normal Pythagorean

    planar distance formula ❖ All angle measurements are in radians, not degrees rad = deg ⇡ 180 Thursday, March 1, 12
  12. HAVERSINE ❖ Calculates the great-circle (orthodrome) distance ❖ Well-conditioned even

    for small distances a = sin 2 ✓ lat 2 ◆ + cos ( lat1) cos ( lat2) sin 2 ✓ long 2 ◆ c = 2 atan 2( p a, p 1 a ) d = R · c Thursday, March 1, 12
  13. SPHERICAL LAW OF COSINES ❖ These days IEEE 754 64-bit

    floating-point numbers provide 15 significant figures of precision ❖ Enough to use a simple trigonometry law ❖ Gives well-conditioned results down to 1 meter d = acos ( sin ( lat1) sin ( lat2) + cos ( lat1) cos ( lat2) cos ( long )) · R Thursday, March 1, 12
  14. VINCENTY FORMULA ❖ Does not assume spherical Earth ❖ Precision

    = 0.5 mm! ❖ Iterative calculation ❖ Explanation and JS implementation: ❖ http://www.movable-type.co.uk/scripts/latlong- vincenty.html Thursday, March 1, 12
  15. MIDPOINT ❖ Averaging is approximate at distances < 400 km

    ❖ Geographic midpoint method Thursday, March 1, 12
  16. 1.Convert lat/long to radians 2.Convert lat/long to Cartesian coordinates MIDPOINT

    lat1 = lat1 ⇡ 180 lon1 = lon1 ⇡ 180 X1 = cos ( lat1) cos ( lon1) Y1 = cos ( lat1) sin ( lon1) Z1 = sin ( lat1) Thursday, March 1, 12
  17. 3.Compute weighted average MIDPOINT x = 1 n n X

    1 Xn = X1 + X2 + . . . + Xn n y = 1 n n X 1 Yn = Y1 + Y2 + . . . + Yn n z = 1 n n X 1 Zn = Z1 + Z2 + . . . + Zn n Thursday, March 1, 12
  18. 4.Convert Cartesian back into latitude/longitude 5.Convert back to degrees MIDPOINT

    lon = atan 2( y, x ) hyp = p x 2 + y 2 lat = atan 2( z, hyp ) lat = lat 180 ⇡ lon = lon 180 ⇡ Thursday, March 1, 12
  19. ❖ At small distances (and closer to equator), good enough

    ‣ i.e. 2 km east + 5 km north = 5.38 km from origin ‣ but at 100x distances the error increases ❖ Allows for simpler calculations ❖ But not correct, depends on projection EUCLIDIAN GEOMETRY Thursday, March 1, 12
  20. MANY OTHERS ❖ Universal Transverse Mercator (UTM) ❖ Military Grid

    Reference System (MGRS) ❖ World Geographic Reference System (GEOREF) Thursday, March 1, 12
  21. GEOHASH ❖ Hierarchical spatial data structure which subdivides space into

    grid buckets ❖ Uses clever interleaved encoding scheme ❖ 9q8yyy is 37.78, -122.4 ❖ 9q8yyyd3b11 is 37.78504 -122.39559 Thursday, March 1, 12
  22. ADVANTAGES ❖ arbitrary precision, gradual degradation ❖ efficient encoding geohash

     length lat  bits long  bits km  error 1 2 3 ±2500 2 5 5 ±630 3 7 8 ±78 4 10 10 ±20 5 12 13 ±2.4 6 15 15 ±0.61 7 17 18 ±0.076 8 20 20 ±0.019 Thursday, March 1, 12
  23. ADVANTAGES ❖ easily shareable ❖ denotes an area of arbitrary

    size ❖ can be used for a simple version of clustering Thursday, March 1, 12
  24. LIMITATIONS ❖ hard to determine adjacency ❖ points close to

    each other can be in different cells ❖ not good for distance calculations ❖ projection-based model: given prefix length describes a much different region size near the equator than near the pole Thursday, March 1, 12
  25. TEXTUAL ❖ Point of interest ❖ Address ❖ Zip code

    ❖ Neighborhood ❖ City/country name Thursday, March 1, 12
  26. OLD TECH ❖ Traditional maps ❖ Points of interest ❖

    Celestial navigation, aka astronavigation Thursday, March 1, 12
  27. NEW TECH ❖ IP address ❖ GPS/GLONASS satellites (< 10

    m) ❖ Cell tower ID (200 m - 32 km accuracy) ❖ WiFi base stations (< 100 m) ❖ Ping times to well known servers ‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/ Thursday, March 1, 12
  28. IMPLEMENTATIONS ❖ GPS devices ‣ accurate, but slower to acquire

    signal ‣ especially in cities, with poor signal conditions ❖ Mobile (iPhone/Android/other) ‣ mostly uses A-GPS ‣ use network to ask an assistance server for GPS satellite data Thursday, March 1, 12
  29. IMPLEMENTATIONS ❖ Browser-based ‣ API uses Location Information Servers ‣

    Common sources of location information: IP address, Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc ‣ Well-supported by modern browsers Thursday, March 1, 12
  30. BROWSER GEOLOCATION if  (navigator.geolocation)  {   navigator.geolocation.getCurrentPosition(      

    function  (position)  {  /*  do  something  */  },     function  (error)  //  error  callback     {       switch(error.code)         {         case  error.TIMEOUT:           break;         case  error.POSITION_UNAVAILABLE:           break;         case  error.PERMISSION_DENIED:           break;         case  error.UNKNOWN_ERROR:           break;       }     });   } } else  //  no  geolocation Thursday, March 1, 12
  31. MAXMIND DB ❖ Most comprehensive IP positioning DB ‣ GeoLite

    City: free, decent coverage, less accurate, monthly updates ‣ GeoIP City: $370 initial, $90/month, weekly updates, more coverage, better accuracy Thursday, March 1, 12
  32. COVERAGE ❖ GeoLite City — over 99.5% on a country

    level and 79% on a city level for the US within a 25 mile radius country correct incorrect United  States 78% 17% Canada 81% 18% Kazakhstan 84% 18% Thursday, March 1, 12
  33. MAXMIND DB ❖ GeoIP C library ❖ Used by PHP,

    Python, etc ‣ http://pecl.php.net/package/geoip (older) ‣ https://github.com/Zakay/geoip (newer) ‣ http://www.maxmind.com/app/python (older) ‣ https://github.com/appliedsec/pygeoip (newer) Thursday, March 1, 12
  34. GEOIP IN PHP print_r(geoip_record_by_name('67.160.202.223')); Array ( [continent_code] => NA [country_code]

    => US [country_code3] => USA [country_name] => United States [region] => CA [city] => San Francisco [postal_code] => [latitude] => 37.764499664307 [longitude] => -122.42939758301 [dma_code] => 807 [area_code] => 415 ) Thursday, March 1, 12
  35. GEOIP IN PHP print_r(geoip_record_by_name('84.92.229.1')); Array ( [continent_code] => EU [country_code]

    => GB [country_code3] => GBR [country_name] => United Kingdom [region] => H9 [city] => London [postal_code] => [latitude] => 51.500198364258 [longitude] => -0.1262000054121 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12
  36. GEOIP IN PHP print_r(geoip_record_by_name('124.28.8.8')); Array ( [continent_code] => AS [country_code]

    => KR [country_code3] => KOR [country_name] => Korea, Republic of [region] => 13 [city] => Bucheon [postal_code] => [latitude] => 37.498901367188 [longitude] => 126.78309631348 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12
  37. GEOIP IN PHP print_r(geoip_record_by_name('28.8.8.8')); Array ( [continent_code] => NA [country_code]

    => US [country_code3] => USA [country_name] => United States [region] => [city] => [postal_code] => [latitude] => 38 [longitude] => -97 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12
  38. COMPARISON ❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1 ‣ Country : UK ‣ State/Province :

    ENGLAND ‣ City : SHEFFIELD ‣ Zip or postal code : - ‣ Latitude : 53.383055 ‣ Longitude : -1.464795 Thursday, March 1, 12
  39. ALSO ❖ One PHP library to rule them all ❖

    http://geocoder-php.org/ ❖ Unified interface for a variety of IP-Based geocoding providers ❖ And more Thursday, March 1, 12
  40. LESSONS ❖ Expect failures and word/work around them ❖ Don’t

    assume REMOTE_ADDR is correct ‣ Might be a proxy ‣ Check X-Forwarded-For header Thursday, March 1, 12
  41. CONVERSIONS ❖ Geocoding: address ➔ latitude & longitude ❖ Reverse

    geocoding: latitude & longitude ➔ address ❖ Not always reversible, e.g. “Farallon Islands” ❖ Disparity of results: ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=37.759947,-122.46866&sensor=true ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=45.499148,-73.566488&sensor=true Thursday, March 1, 12
  42. SERVICES/LIBRARIES ❖ JS: Google/Bing/MapQuest APIs ❖ PHP: http://geocoder-php.org/ ❖ Python:

    http://code.google.com/p/geopy/ ❖ Ruby: http://highearthorbit.com/geocommons-open- sourced-geocoder/ ❖ Check ToS before launching publicly Thursday, March 1, 12
  43. LOCATION EXTRACTION ❖ Location is not always given in a

    structured format ❖ Blog posts, news clippings, status updates may contain embedded mentions of locations ❖ Goal is to extract these with as much precision as possible Thursday, March 1, 12
  44. LOCATION EXTRACTION ❖ Yahoo! PlaceMaker does this ❖ Disambiguates found

    locations and returns unique IDs (WOEIDs) ‣ "New York", "New York City", "NYC", and "the Big Apple" are all variant names for WOEID 2459115 ❖ These can be used with Yahoo GeoPlanet Thursday, March 1, 12
  45. MONGO DB ❖ 2D geospatial indexes ❖ Location is an

    object or array at least 2 elements, which are interpreted as coordinates ❖ Implementation via geohashes on top of B-trees ❖ Supports multiple locations per document ❖ Uses GeoJSON spec - [long,lat] Thursday, March 1, 12
  46. MONGO DB ❖ Creating ❖ Querying db.places.ensureIndex({loc:  "2d"}) db.places.save({loc:  [-­‐91.23,

     28.25]}) db.places.find({loc:  {$near:  [-­‐91,28],  $maxDistance:  5}}) db.runCommand({geoNear:  "places",  near:  [-­‐91,28],  num:10}) Thursday, March 1, 12
  47. MONGO DB ❖ Supports Euclidian (default) and spherical models var

     earthRadius  =  6378;  /*  km  */ var  range  =  30  /  earthRadius;  /*  to  radians  */ db.places.find({loc:  {$nearSphere:  [20,50],  $maxDistance:  range}}) distances  =  db.runCommand({geoNear:  "places",  near:  [20,  50],          spherical:  true,  maxDistance:  range}).results; pointDistance  =  distances[0].dis  *  earthRadius  //  back  to  km Thursday, March 1, 12
  48. MONGO DB ❖ Bounded queries: circle, bounding box, polygon (>=

    1.9) ❖ Note that bounding box is specified via bottom left/top right corners box  =  [[-­‐73.99756,  40.73083],  [-­‐73.988135,  40.741404]] db.places.find({loc:  {"$within":  {"$box"  :  box}}}) Thursday, March 1, 12
  49. MONGO DB ❖ Limitations ‣ only 1 geospatial index per

    collection ‣ some query limitations with multi-location docs ‣ somewhat awkward and inconsistent query syntax ‣ sharding on geo keys doesn’t quite work yet Thursday, March 1, 12
  50. MYSQL ❖ Based on the OpenGIS model ❖ Supports points,

    lines, polygons ❖ Implemented using R-trees Thursday, March 1, 12
  51. MYSQL ❖ Creating ❖ Inserting data CREATE  TABLE  geom  (g

     GEOMETRY  NOT  NULL,  SPATIAL  INDEX(g))   ENGINE=MyISAM; INSERT  INTO  geom  VALUES  (GeomFromText('POINT(1  1)')) or //  not  OpenGIS  standard INSERT  INTO  geom  VALUES(Point(1,1)); Thursday, March 1, 12
  52. MYSQL ❖ Querying ❖ Only MyISAM tables support SPATIAL indices

    ❖ Sphinx search engine supports geo-distance search as well SELECT  MBRContains(   GeomFromText(  'POLYGON((0  0,0  3,3  3,3  0,0  0))'  ),   coord )  from  geom; Thursday, March 1, 12
  53. ELASTICSEARCH ❖ Mapping type geo_point ❖ Implemented via geohashes ❖

    Takes string, array, or object, or geohash representations on insert ❖ Uses GeoJSON ordering Thursday, March 1, 12
  54. ELASTICSEARCH ❖ Mapping specification {        "mydoc"  :

     {                "properties"  :  {                        "location"  :  {                                "type"  :  "geo_point"                        }                }        } } Thursday, March 1, 12
  55. ELASTICSEARCH ❖ Inserting curl  -­‐XPUT  localhost:9200/myindex/mydoc/1  -­‐d'{      "location"

     :  "-­‐83,47" }' curl  -­‐XPUT  localhost:9200/myindex/mydoc/2  -­‐d'{      "location"  :  {"lat":  47,  "lon":  -­‐83} }' Thursday, March 1, 12
  56. ELASTICSEARCH ❖ Can query by distance, bounding box, or polygon

    ❖ Does not support rich OpenGIS models, lines ❖ Bounding box is specified via top left/bottom right corners Thursday, March 1, 12
  57. ELASTICSEARCH {        "filtered"  :  {    

               "query"  :  {                        "match_all"  :  {}                },                "filter"  :  {                        "geo_distance"  :  {                                "distance"  :  "200km",                                "pin.location"  :  {                                        "lat"  :  40,                                        "lon"  :  -­‐70 }}}}} Thursday, March 1, 12
  58. ELASTICSEARCH ❖ geo + text = power ‣ Find all

    businesses with ‘pizza delivery’ in the description within 5 miles from my location and segment the results 1, 2, and 5 miles Thursday, March 1, 12
  59. POSTGRES ❖ PostGIS = spatial extensions to Postgres ❖ Follows

    OpenGIS standard ❖ Backend spatial database for geographic information systems (GIS) Thursday, March 1, 12
  60. DATASETS ❖ http://freegisdata.rtwilson.com/ ❖ Neighborhoods ‣ http://www.zillow.com/howto/api/neighborhood-boundaries.htm ❖ Streets ‣

    http://www.census.gov/geo/www/tiger/ ‣ OpenStreetMap ❖ Other ‣ Flickr shapefiles ‣ Yahoo GeoPlanet data Thursday, March 1, 12
  61. APIS/SERVICES ❖ SimpleGeo -> UrbanAirship ❖ Web Maps Studio ❖

    GeoAPI ❖ POIs: Google Places, Foursquare, Factual Thursday, March 1, 12
  62. MAPPING ❖ Mapbox, CartoDB, TileMill, OSM, Mapnik, Leaflet, Polymaps ❖

    Route planner: ‣ http://graphserver.github.com/graphserver/ Thursday, March 1, 12