Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building Inspector - Shape + Address consensus
Search
Mauricio Giraldo
October 21, 2014
Technology
0
210
Building Inspector - Shape + Address consensus
Mauricio Giraldo
October 21, 2014
Tweet
Share
More Decks by Mauricio Giraldo
See All by Mauricio Giraldo
Aereo: An experimental bird’s eye view of the digital collections from the State Library of New South Wales
mgiraldo
0
350
From food to buildings and beyond: what happens when a library opens its digital collections to human-computer collaboration
mgiraldo
2
180
Aprendizajes de trabajo en bibliotecas digitales
mgiraldo
0
150
building inspector
mgiraldo
0
94
Talk at the NYU ITP Data Art class / Spring 2017
mgiraldo
0
160
Humanidades Digitales en los laboratorios de la Biblioteca Pública de New York
mgiraldo
0
97
FOSS4G Nara/Tokyo
mgiraldo
0
1.9k
Human-Computer Collaboration at NYPL Labs
mgiraldo
2
460
NYPL Labs @ Eyeo Festival 2015
mgiraldo
1
720
Other Decks in Technology
See All in Technology
PHPからはじめるコンピュータアーキテクチャ / From Scripts to Silicon: A Journey Through the Layers of Computing
tomzoh
2
250
american aa airlines®️ USA Contact Numbers: Complete 2025 Support Guide
aaguide
0
500
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
39k
ソフトウェアテストのAI活用_ver1.25
fumisuke
1
620
Microsoft Defender XDRで疲弊しないためのインシデント対応
sophiakunii
2
330
LIXIL基幹システム刷新に立ち向かう技術的アプローチについて
tsukuha
1
400
cdk initで生成されるあのファイル達は何なのか/cdk-init-generated-files
tomoki10
1
690
How to Quickly Call American Airlines®️ U.S. Customer Care : Full Guide
flyaahelpguide
0
240
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
6.9k
振り返りTransit Gateway ~VPCをいい感じでつなげるために~
masakiokuda
4
210
セキュアなAI活用のためのLiteLLMの可能性
tk3fftk
1
370
「現場で活躍するAIエージェント」を実現するチームと開発プロセス
tkikuchi1002
4
570
Featured
See All Featured
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Adopting Sorbet at Scale
ufuk
77
9.5k
Designing for Performance
lara
610
69k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
Docker and Python
trallard
45
3.5k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
5.9k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.6k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Side Projects
sachag
455
42k
Transcript
mauricio giraldo arteaga @mgiraldo NYPL Labs
None
bon jour
my name is mauricio
None
research and circulating library system spanning the Bronx, Staten Island
and Manhattan boroughs in NYC
None
NYPL Labs
None
i’m going to talk about maps
The Great Map Data Extraction
an adventure in three acts and a prologue and an
epilogue
prologue
The Lionel Pincus and Princess Firyal Map Division
None
None
None
None
None
None
500,000+ maps 20,000+ books & atlases
None
None
None
None
None
year
street names year
use type street names year
use type street names name year
material use type street names name year
material use type street names name class year
material use type street names address name class year
material use type street names address floors name class year
material use type street names address floors name class year
skylights
material use type street names address floors name class year
skylights backyards
material use type street names address floors name class geo
location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
footprint material use type street names address floors name class
geo location year skylights backyards
we got these for several decades since the 1800s and
by 1950 every town in the US with a population of 2,000 had been mapped
data trapped in a legacy format
we want all the data!
f**k yeah historical data!
citysdk.waag.org/buildings
citysdk.waag.org/buildings
NYU Stern / Imaginaria3D
NYU Stern / Imaginaria3D
maps.google.com
maps.google.com
None
data
it all starts with a photograph
None
but it is “just a photo” but it is only
a few clicks away
None
maps.nypl.org/warper
None
None
geo-rectification or: “make it match Open Street Map”
None
None
*this is a simulation. actual process is intensive. consult your
mathematician before trying
None
None
vectorization or: “draw the building shapes”
None
results from maps.nypl.org/warper
hand-crafted, artisanal, locally-sourced data
500,000+ maps 20,000+ books & atlases
500,000+ maps 20,000+ books & atlases* *imagine how many pages
an atlas has
in the order of dozens of millions building footprints if
counting NYC only
None
~120k footprints produced in three years by staff and volunteers
None
this will take us a few millenia* *actual number taken
out from a hat
there has to be a better way
act i: will there be polygons?
requests to geo companies went unanswered
None
can we automate this?
None
¿¡quoi!? @mgiraldo
None
None
None
None
what is a building?
None
completely enclosed by black lines
completely enclosed by black lines dashed lines are not walls
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2)
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
process
github.com/NYPL/map-vectorizer
None
None
None
None
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
provide the best (possible) input image
None
None
None
None
differences in resampling cubic nearest neighbor
differences in resampling cubic nearest neighbor
make the image a binary bitmap or: “black and white”
None
None
polygonize or: “convert contiguous pixels to a single line shape”
None
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
! gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
None
no no no no no
no no no no no yes yes
simplify* *for those polygons that we care about
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔
None
None
alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
we need a set of points
None
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
now we alpha shaping
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
x.as = ashape(pts@coords, alpha=2.0)
there are other point reduction algorithms like Ramer-Douglas-Peucker or Whyatt
Curve Simplification
separate the buildings from the chaff
completely enclosed by black lines dashed lines are not walls
> 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored ✔ ✔ ✔ ✔
None
None
[218, 211, 209]
[218, 211, 209] paper [199, 179, 173], [179, 155, 157],
[206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141] [205, 200, 186] not paper
None
None
None
this is good enough for our use case
None
None
None
✔ ✔ ✔ ✔ ✔ completely enclosed by black lines
dashed lines are not walls > 20m2 (~180ft2) < 3,000m2 (~27,000ft2) not paper-colored
computer-vision for attribute recognition *bonus quest
None
None
None
66,056 footprints produced in one day for an 1859 atlas
of Manhattan
caveats: ! adjacency not enforced false positives/negatives buildings may also
overlap
act ii: the vectorizer needs to prove itself
None
None
None
None
multiple inspections for each item and let consensus surface on
its own
footprint validation or: “tell us what the computer got right
or wrong“
are people willing to spend time checking building footprints? insurance
atlases are not exactly the coolest type of maps
None
buildinginspector.nypl.org
github.com/NYPL/building-inspector
None
None
None
None
about a month later…
None
None
None
None
420k+ flags* 70k+ unique polygons ! consensus: ~84% YES, 7%
FIX, 9% NO *a “flag” is a YES/NO/FIX by one person for a given polygon
seems people are willing after all… we — our contributors
seems people are willing after all… we — our contributors
act iii: the return of the inspector
footprint material use type street names address floors name class
geo location year skylights backyards
divide and conquer
footprint material use type street names address floors name class
geo location year skylights backyards
three new tasks for now… we really want it all!
None
footprint material use type street names address floors name class
geo location year skylights backyards
check
check YES
check YES address color
check YES FIX address color
check YES FIX address color fix
check YES FIX address color fix
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
check YES FIX address color fix *footprints marked as “NO”
go to building heaven
fix
fix
address
address
classify color
classify color
865k+ flags
check YES FIX address color fix
check YES FIX address color fix for 80k+ unique polygons
77k+ 5k+ 42k+ 26k+
epilogue
address and shape consensus or: how to determine what the
right building footprint and address looks like?
None
None
all points are useful inclusiveness above all
None
None
None
None
None
None
None
None
DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
bit.ly/nypl-consensus
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 246 246 414 246 414 414 246 414 414
414 ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ + +
246 414 + +
None
None
DBSCAN for shapes also!
None
None
None
None
None
None
all points are still useful
None
﹡
﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡ ﹡
﹡ ﹡ ﹡ ﹡ ﹡
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
+ + + + + + +
None
None
None
None
None
None
None
resulting data available via an API
resulting data available via an API in 100% recyclable GeoJSON
None
photographing
photographing ↓
photographing ↓ geo-rectification
photographing ↓ geo-rectification ↓
photographing ↓ geo-rectification ↓ vectorization
photographing ↓ geo-rectification ↓ vectorization ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check /
fix / color / address ↓ consensus ↓ data release
not the end
None
None
None
¡merci beaucoup! mauricio giraldo arteaga @mgiraldo NYPL Labs slides at:
bit.ly/nypl-ehess images from: NYPL digital collections - Wikimedia Commons Christopher Cannon - Flickr user wallyg - Giphy