Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Maps, Lies and Storytelling | P06 & P07| SQL & ...

Andrew W Hill
September 30, 2014

Maps, Lies and Storytelling | P06 & P07| SQL & CartoCSS

Andrew W Hill

September 30, 2014
Tweet

More Decks by Andrew W Hill

Other Decks in Education

Transcript

  1. SQL in CartoDB Can be viewed as the entry point

    for complex geospatial analysis, tool building, and data manipulation
  2. Common SQL Statements SELECT UPDATE INSERT DELETE * there are

    lots of that we wont get into ** these can (and often are) mixed
  3. SELECT cartodb_id, name FROM us_states The result of this query

    will have just the two columns I ask for, but for every row in my dataset cartodb_id name 1 new york 2 alabama etc
  4. Here, I can limit it to just one column. SELECT

    cartodb_id FROM us_states cartodb_id 1 2 etc Notice that the result contains the column name
  5. You can create an ‘alias’ for the result. Basically, change

    the name it uses for the result. SELECT cartodb_id AS my_id FROM us_states my_id 1 2 etc Notice that the results are the same, just a different name
  6. You can modify values on the fly using SQL, for

    example, simple arithmetic. SELECT cartodb_id + 4 AS id FROM us_states id 5 6 etc The result isn’t stored anywhere permanently, it is derived on the fly
  7. You can use a type of function, called an ‘Aggregate’

    that measures across all rows at once. SELECT min(cartodb_id) AS id FROM us_states id 1 Now, only 1 row is returned
  8. You can use a type of function, called an ‘Aggregate’

    that measures across all rows at once. SELECT sum(cartodb_id) FROM us_states sum 1328 Now, only 1 row is returned
  9. You can make up columns out of thin air. !

    Notice that andrew is single quoted. That is the standard for strings SELECT cartodb_id, ‘andrew’ FROM us_states notice that ‘andrew’ remains constant for all rows cartodb_id unknown 1 andrew 2 andrew etc
  10. Using WHERE statements will allow you to narrow in on

    exactly the rows you want SELECT cartodb_id FROM us_states WHERE cartodb_id = 1 cartodb_id 1 Here, only the single matching row is returned
  11. Using WHERE statements work on all types of data SELECT

    name FROM us_states WHERE name = ‘New York’ name New York See!
  12. When using string columns, they case sensitive SELECT name FROM

    us_states WHERE name = ‘new york’ name oh no! no results
  13. Using WHERE statements will allow you to narrow in on

    exactly the rows you want SELECT cartodb_id FROM us_states WHERE cartodb_id > 24 cartodb_id 25 26 etc
  14. You aren’t limited to using values as they exist, you

    can manipulate them on the fly SELECT cartodb_id FROM us_states WHERE cartodb_id*4=20 cartodb_id 5
  15. With strings, you can do things like case insensitive partial

    matching. Here, ‘%’ is a wildcard, and ILIKE says use the wild card and ignore case SELECT name FROM us_states WHERE name ILIKE ‘new%’ name New Hampshire New Jersey etc Nice!
  16. You can check if a value is in a set

    of options SELECT cartodb_id FROM us_states WHERE cartodb_id IN (1,4,99) name 1 4 Not 99 states
  17. The LIMIT restricts the number or rows you ask for

    SELECT cartodb_id FROM us_states WHERE cartodb_id > 24 LIMIT 1 cartodb_id 25 see!
  18. The LIMIT restricts the number or rows you ask for

    SELECT cartodb_id FROM us_states WHERE cartodb_id > 24 LIMIT 2 cartodb_id 25 26 see!
  19. The ORDER BY allows you to change the order of

    rows in your result SELECT cartodb_id FROM us_states WHERE cartodb_id > 24 ORDER BY cartodb_id cartodb_id 25 26 etc The default is determined by the order on disk. no noticeable change here
  20. ASC means, ascending. SELECT cartodb_id FROM us_states WHERE cartodb_id >

    24 ORDER BY cartodb_id ASC cartodb_id 25 26 etc Here, ascending is the same as what we saw before
  21. DESC means, descending SELECT cartodb_id FROM us_states WHERE cartodb_id >

    24 ORDER BY cartodb_id DESC cartodb_id 52 51 etc See!
  22. It works just as well on the alphabet SELECT name

    FROM us_states ORDER BY name DESC name Wyoming Wisconsin etc See!
  23. Let’s do something crazy here, substring() SELECT substring(name,1,3) FROM us_states

    substring Ala Ala Ari etc So we are asking for the first 3 characters of the state name
  24. We can add a GROUP BY, which will join all

    rows with the same results SELECT substring(name,1,2) FROM us_states GROUP BY substring(name,1,2) substring Ala Ari Ark etc The second ‘Ala’ disappeared from our results
  25. We can now use an aggregate that will work within

    each group result, not across the who table SELECT substring(name,1,2), count(*) FROM us_states GROUP BY substring(name,1,2) substring count Ala 2 Ari 1 Ark 1 etc Now we can see where that second ‘Ala’ went to!
  26. Let’s find out what the most common first three letters

    of state names is by including an ORDER BY SELECT substring(name,1,2), count(*) FROM us_states GROUP BY substring(name,1,2) ORDER BY count(*) DESC substring count New 4 Nor 2 Ala 2 etc So ‘New’ has 4, which we all experience from trying to quickly fill online address forms…
  27. Someone else has done a really nice job of describing

    and illustrating this, so I’m going to shamelessly reuse
  28. Here, I use a nested queries and compare against the

    results for a state name SELECT name, state FROM us_cities WHERE state IN (SELECT name FROM us_states WHERE name ILIKE ‘new y%’) name state New York New York Albany New York Buffalo New York etc
  29. You can also nest right in the column results! SELECT

    name, (SELECT population FROM us_census WHERE name = u.name) pop FROM us_cities u ORDER BY pop DESC name pop California 38000000 Texas 26000000 New York 20000000 etc
  30. You can use a WITH statement to treat a SQL

    result as a new table WITH fakestuff AS (SELECT name, state FROM us_cities WHERE name = ‘New York’) SELECT * FROM fakestuff name state New York New York Here, I create a magical new table- like object I call “fakestuff” When I select ‘*’ from the fakestuff table- ish thing, the result is only one row
  31. I use them to help organize my thinking in SQL

    WITH fakestuff AS (SELECT ST_Centroid(the_geom) AS the_geom, name, pop FROM us_states) SELECT * FROM fakestuff WHERE pop > 5000000 name state New York New York
  32. UPDATES work with all the same properties of SELECT but

    with a few differences. UPDATE us_states SET name = ‘New Amsterdam’ WHERE name = ‘New York’ For example, you will only get a confirmation result, not actual rows. (typically)
  33. SELECT the_geom FROM us_states In CartoDB, your geometries are always

    standardized and stored in a column called the_geom the_geom MULTIPOLYGON… MULTIPOLYGON… etc
  34. SELECT the_geom FROM us_states ORDER BY the_geom ASC You can

    manipulate, filter, order, measure and lots of other things to geometries the_geom MULTIPOLYGON… MULTIPOLYGON… etc
  35. SELECT the_geom FROM us_states WHERE ST_Intersects(the_geom, my_geometry) CartoDB uses PostGIS

    to enable most geospatial functions. These are the functions that start with ‘ST_…’ the_geom MULTIPOLYGON…
  36. SELECT the_geom FROM us_states WHERE ST_Intersects(the_geom, CDB_(42, -80)) We also

    have a couple of helper functions only in CartoDB, these start with CDB_. For example, CDB_LatLng(num, num) creates a geometry from two coordinates the_geom MULTIPOLYGON…
  37. SELECT the_geom FROM us_states WHERE ST_DWithin(the_geom, CDB_(42, -80), 1) ST_DWithin

    returns TRUE if the two geometries are within a minimum distance (supplied as the 3rd variable)
  38. SELECT the_geom FROM us_states WHERE ST_DWithin(the_geom, CDB_(42, -80), 1) In

    CartoDB, we rely on projections. Using functions where we supply a unit measurement, that unit is always in the unit of the projection. Here, 1 degree lat/lng. Somewhat meaningless.
  39. SELECT the_geom FROM us_states WHERE ST_DWithin(the_geom::geography, CDB_LatLng(42, -80)::geography, 10000) You

    can use a spherical globe and measure in meters by re-casting the geometries to geography types (on the fly)
  40. SELECT the_geom FROM us_states ORDER BY the_geom <-> CDB_LatLng(44,-80) You

    can ORDER BY the distance to a specified geometry really nicely in CartoDB using <->
  41. SELECT *, ST_Distance(the_geom, CDB_LatLng(42,-80)) AS d FROM us_states You can

    also use spatial measurements to derive a new or modified column result on the fly
  42. SELECT *, ST_Distance(the_geom, CDB_LatLng(42,-80)) AS d FROM us_states ST_Distance will

    give me back the distance between two geometries. Here, the geometry in every row of my table would be measured from the point 42, -80
  43. SELECT name, ST_MakeLine(the_geom, CDB_LatLng(42,-80)) AS the_geom FROM us_cities You can

    use ST_MakeLine to create lines between points on the fly. Here, I’ll turn every city point, into a line between that point and my point at 42, -80
  44. UPDATE us_cities SET the_geom = ST_MakeLine(the_geom, CDB_LatLng(42,-80)) AS the_geom You

    can use all the stuff in UPDATES to store results as well. If I ran this, my cities would no longer be points, but would be lines! Careful, there is no UNDO!
  45. SELECT name, ST_Centroid(the_geom) AS the_geom FROM us_states Likewise, we can

    turn polygons or lines into points. There are multiple ways to do this. ST_Centroid is one
  46. SELECT name, ST_Envelope(the_geom) AS the_geom FROM us_cities GROUP BY state

    Or you could turn points into a polygon. ST_Envelope will create the minimum box around a group of points
  47. the_geom the_geom_webmercator the_geom_webmercator When you create new data, it goes

    into, Behind the scenes, CartoDB also translates it to, When CartoDB draws a map, it comes from,
  48. the_geom the_geom_webmercator If you modify on the fly and want

    to map results, you need to create on the fly also!
  49. SELECT cartodb_id, the_geom, the_geom_webmercator FROM us_states ORDER BY the_geom <->

    CDB_LatLng(44,-80) You can do any filter you want, just make sure you include the_geom_webmercator in your column results to show it on the map
  50. SELECT cartodb_id, ST_Buffer(the_geom, 1), the_geom_webmercator FROM us_cities This would show

    the original points, but it would fail to include the transformed (buffered) point because the_geom hasn’t been altered on disk
  51. SELECT cartodb_id, ST_Buffer(the_geom, 1) AS the_geom_webmercator FROM us_cities Just aliasing

    to the_geom_webmercator wont work, remember the_geom is a different projection than the_geom_webmercator
  52. SELECT cartodb_id, ST_Transform(ST_Buffer(the_geom, 1), 3857) AS the_geom_webmercator FROM us_cities ST_Transform

    turns projects any geometry from one projection to another. We want to go from WGS84 (the_geom) to Web Mercator (the_geom_webmercator)
  53. SELECT cartodb_id, ST_Transform(ST_Buffer(the_geom, 1), 3857) AS the_geom_webmercator FROM us_cities In

    CartoDB, webmercator has an ID, or a SRID to be exact, that we can use to tell ST_Transform that we want webmercator from the_geom. The SRID is 3857 3857 = Web Mercator
  54. You can update the SQL of any layer in your

    visualizations with the setSQL function ! layer.setSQL(“SELECT * FROM us_states WHERE name = ‘New Hampshire’”); In CartoDB.js http://bit.ly/1t8vK1i
  55. Once you’ve got the things you want to render, you

    will define a set of rules in CartoCSS to define how you render it
  56. #table{ marker-fill-opacity: 0.9; marker-line-color: #FFF; marker-line-width: 1.5; marker-line-opacity: 1; marker-placement:

    point; marker-type: ellipse; marker-width: 10; marker-fill: #FF6600; marker-allow-overlap: true; }
  57. #table{ marker-fill-opacity: 0.9; marker-line-color: #FFF; marker-line-width: 1.5; marker-line-opacity: 1; marker-placement:

    point; marker-type: ellipse; marker-width: 22; marker-fill: #FF6600; marker-allow-overlap: true; }
  58. #table{ marker-fill-opacity: 0.9; marker-line-color: yellow; marker-line-width: 3.5; marker-line-opacity: 1; marker-placement:

    point; marker-type: ellipse; marker-width: 22; marker-fill: #FF6600; marker-allow-overlap: true; }
  59. #table{ marker-fill-opacity: 0.9; marker-line-color: yellow; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [atype = 'Specialized']{ marker-fill: white; } } ‘atype’ is a string column in my table
  60. #table{ marker-fill-opacity: 0.9; marker-line-color: yellow; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [atype = 'Specialized']{ marker-fill: white; } [atype = 'Institutional']{ marker-fill: green; } } ordered
  61. #table{ marker-fill-opacity: 0.9; marker-line-color: blue; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [atype = 'Institutional']{ marker-fill: green; [cartodb_id<12594]{ marker-line-color: red; } } } nested
  62. #table{ marker-fill-opacity: 0.9; marker-line-color: blue; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [is_valid = True]{ marker-fill: green; } } Boolean columns a slightly different in notation
  63. #table{ marker-fill-opacity: 0.9; marker-line-color: blue; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [zoom > 7]{ marker-line-width: 3; } } The zoom parameter
  64. #table{ marker-fill-opacity: 0.9; marker-line-color: blue; marker-line-width: 1.5; marker-line-opacity: 1; marker-width:

    22; marker-fill: #FF6600; [zoom > 7]{ marker-line-width: 3; [zoom > 8]{ marker-line-width: 6; } } } nested
  65. Map { -torque-frame-count:512; -torque-animation-duration:30; -torque-time- attribute:"cartodb_id"; -torque-aggregation- function:"count(cartodb_id)"; -torque-resolution:2; -torque-data-

    aggregation:linear; } frame-count is how many temporal bins your data will be broken into. if you are visualizing 31 days, you wouldn’t need more than 31…
  66. Map { -torque-frame-count:512; -torque-animation-duration:30; -torque-time- attribute:"cartodb_id"; -torque-aggregation- function:"count(cartodb_id)"; -torque-resolution:2; -torque-data-

    aggregation:linear; } time-attribute is the numerical or temporal column in your dataset that you want to order the animation by, beginning to end
  67. Map { -torque-frame-count:512; -torque-animation-duration:30; -torque-time- attribute:"cartodb_id"; -torque-aggregation- function:"count(cartodb_id)"; -torque-resolution:2; -torque-data-

    aggregation:linear; } the aggregation- function is actually SQL! It is how you want data in the same spatial cell, in the same temporal bin, to be combined. here, you just get a raw value
  68. Map { -torque-frame-count:512; -torque-animation-duration:30; -torque-time- attribute:"cartodb_id"; -torque-aggregation- function:"sum(population)"; -torque-resolution:2; -torque-data-

    aggregation:linear; } But here, you would get a totally different value. The result of this is an attribute called value that we can use to style with in our CartoCSS later
  69. #table{ comp-op: lighter; marker-fill-opacity: 0.9; marker-line-color: #FFF; marker-line-width: 1.5; marker-line-opacity:

    1; marker-type: ellipse; marker-width: 6; marker-fill: #FF9900; } Much of the CartoCSS for the rest looks the same as we saw before.
  70. #table{ comp-op: lighter; marker-fill-opacity: 0.9; marker-line-color: #FFF; marker-line-width: 1.5; marker-line-opacity:

    1; marker-type: ellipse; marker-width: 6; marker-fill: #FF9900; [value > 3]{ marker-fill-color: blue; } } We can use the value attribute from our aggregation function here to change styles though
  71. #table[frame-offset=1] { marker-width:8; marker-fill-opacity:0.45; } We can also use frame-offsets

    ! frame-offsets allow you to draw the marker a second (or more times) after the frame it initially hits your maps.
  72. You can update the CartoCSS of any layer in your

    visualizations with the setCartoCSS function (no line breaks) ! layer.setCartoCSS(“#tablename {marker- fill: red; }”); In CartoDB.js http://bit.ly/1vpF1pi
  73. I want you to show me the most far out

    way you can use CartoCSS and/or SQL to create a visualization. It can be a multistep process (UPDATEs, Table from Query), but be sure to document and share every step of your process. Assignment 1
  74. Time to document your lie. Begin creating a comprehensive map

    and story of your lie. This should be a narrative story. During the narrative, you should highlight where your lie is exposed through maps. Assignment 2
  75. Create a map to accompany a current news article or

    a current event. This should be an original map, not some improvement of a map included in the article Assignment 3