Building Inspector - Shape + Address consensus
Mauricio Giraldo
October 21, 2014
Transcript
mauricio giraldo arteaga @mgiraldo NYPL Labs
bonjour
my name is mauricio
research and circulating library system spanning the Bronx, Staten Island and Manhattan boroughs in NYC
NYPL Labs
i’m going to talk about maps
The Great Map Data Extraction: an adventure in three acts, with a prologue and an epilogue
prologue
The Lionel Pincus and Princess Firyal Map Division
500,000+ maps 20,000+ books & atlases
footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
we have these for several decades, starting in the 1800s; by 1950 every town in the US with a population of 2,000 had been mapped
data trapped in a legacy format
we want all the data!
f**k yeah historical data!
citysdk.waag.org/buildings
NYU Stern / Imaginaria3D
maps.google.com
data
it all starts with a photograph
but it is “just a photo”, even if it is only a few clicks away
maps.nypl.org/warper
geo-rectification or: “make it match Open Street Map”
*this is a simulation. the actual process is intensive. consult your mathematician before trying
vectorization or: “draw the building shapes”
results from maps.nypl.org/warper
hand-crafted, artisanal, locally-sourced data
500,000+ maps, 20,000+ books & atlases* (*imagine how many pages an atlas has)
on the order of tens of millions of building footprints, counting NYC alone
~120k footprints produced in three years by staff and volunteers
this will take us a few millennia* (*actual number pulled out of a hat)
there has to be a better way
act i: will there be polygons?
requests to geo companies went unanswered
can we automate this?
¿¡quoi!? @mgiraldo
what is a building?
completely enclosed by black lines; dashed lines are not walls; > 20 m² (~215 ft²); < 3,000 m² (~32,000 ft²); not paper-colored
process
github.com/NYPL/map-vectorizer
provide the best (possible) input image
differences in resampling: cubic vs. nearest neighbor
make the image a binary bitmap or: “black and white”
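A minimal sketch of what "black and white" means here, assuming the scan has already been reduced to grayscale values from 0 to 255. The function name `to_binary` and the cutoff of 128 are illustrative assumptions, not values taken from the vectorizer.

```python
def to_binary(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (ink) or 255 (paper).

    The threshold is an assumed value; a real scan would need tuning.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# A tiny made-up scan: dark pixels are map lines, light pixels are paper.
scan = [
    [250, 240, 30, 245],
    [235, 20, 25, 238],
    [40, 228, 231, 226],
]
binary = to_binary(scan)
```

After this step every pixel is either a line or not a line, which is what the polygonize step needs.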
polygonize or: “convert contiguous pixels to a single line shape”
gdal_polygonize.py test.tif -f "ESRI Shapefile" test.shp test
no no no no no yes yes
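To give a feel for "convert contiguous pixels to a single shape", here is a small sketch that only groups connected pixels of equal value into numbered regions. It is not GDAL's implementation and it does not trace outlines; `label_regions` is a hypothetical helper, and 4-connectedness is assumed.

```python
from collections import deque


def label_regions(grid):
    """Label 4-connected regions of equal pixel value.

    Returns (labels, count), where labels has the same shape as grid and
    each region gets its own number starting at 1.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # already part of a region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and grid[ny][nx] == grid[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# 0 = black line, 255 = paper; one vertical wall separates two areas.
bitmap = [
    [255, 255, 0, 255],
    [255, 255, 0, 255],
    [255, 255, 0, 255],
]
labels, count = label_regions(bitmap)
```

Here the bitmap yields three regions: the area left of the wall, the wall itself, and the area to its right. Most regions found this way are not buildings, which is why the filtering steps that follow matter.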
simplify* *for those polygons that we care about
✔ completely enclosed by black lines; ✔ dashed lines are not walls; > 20 m² (~215 ft²); < 3,000 m² (~32,000 ft²); not paper-colored
alpha shape *code basically stolen wholesale from rpubs.com/geospacedman/alphasimple
we need a set of points
pts = spsample(polygon, n=1000, type="hexagonal")
pts = spsample(polygon, n=1000, type="regular")
pts = spsample(polygon, n=1000, type="random")
now we alpha shaping
x.as = ashape(pts@coords, alpha=2.0)
there are other point reduction algorithms, like Ramer-Douglas-Peucker or Visvalingam-Whyatt curve simplification
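For comparison with the alpha shape approach, this is a short sketch of Ramer-Douglas-Peucker, one of the alternatives named above. It is not what the vectorizer uses; the function names and the tolerance `epsilon` are illustrative assumptions.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Drop points that lie within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves around it.
    left = rdp(points[:index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


# A wobbly L-shaped wall, as a pixel trace might produce it.
wall = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3, 1), (3.03, 2), (3, 3)]
simplified = rdp(wall, epsilon=0.1)
```

With this tolerance the seven-point trace reduces to its three corners: the two endpoints and the bend at (3, 0).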
separate the buildings from the chaff
✔ completely enclosed by black lines; ✔ dashed lines are not walls; ✔ > 20 m² (~215 ft²); ✔ < 3,000 m² (~32,000 ft²); not paper-colored
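The two size rules can be sketched as a simple area test. This assumes polygon coordinates are already in meters (a projected coordinate system); the helper names are hypothetical and only the 20 m² and 3,000 m² limits come from the slides.

```python
MIN_AREA_M2 = 20.0      # lower limit from the slides
MAX_AREA_M2 = 3000.0    # upper limit from the slides


def polygon_area(ring):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def plausible_building_size(ring):
    """True when the area falls strictly between the two limits."""
    return MIN_AREA_M2 < polygon_area(ring) < MAX_AREA_M2


house = [(0, 0), (10, 0), (10, 8), (0, 8)]          # 10 m x 8 m
speck = [(0, 0), (2, 0), (2, 2), (0, 2)]            # 2 m x 2 m
block = [(0, 0), (100, 0), (100, 100), (0, 100)]    # 100 m x 100 m
```

Only `house` passes: `speck` is too small to be a building and `block` is more likely a whole city block than a single footprint.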
paper: [218, 211, 209]
not paper: [199, 179, 173], [179, 155, 157], [206, 193, 189], [199, 195, 163], [207, 204, 179], [195, 189, 154], [209, 203, 181], [255, 225, 40], [194, 198, 192], [161, 175, 190], [137, 174, 163], [166, 176, 172], [149, 156, 141], [205, 200, 186]
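One way to use reference colors like these is a nearest-color test: a polygon counts as paper when its color is closer to the paper sample than to any of the other samples. This is a sketch under that assumption, not necessarily how the vectorizer decides; `is_paper` and the use of plain RGB distance are illustrative choices.

```python
# Reference colors as listed on the slide (RGB).
PAPER = [(218, 211, 209)]
NOT_PAPER = [
    (199, 179, 173), (179, 155, 157), (206, 193, 189), (199, 195, 163),
    (207, 204, 179), (195, 189, 154), (209, 203, 181), (255, 225, 40),
    (194, 198, 192), (161, 175, 190), (137, 174, 163), (166, 176, 172),
    (149, 156, 141), (205, 200, 186),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_paper(rgb):
    """True when rgb is nearer to a paper sample than to any other sample."""
    nearest_paper = min(squared_distance(rgb, ref) for ref in PAPER)
    nearest_other = min(squared_distance(rgb, ref) for ref in NOT_PAPER)
    return nearest_paper < nearest_other
```

For example, `is_paper((220, 212, 210))` is true, while `is_paper((200, 180, 175))` is false because that color sits next to the first non-paper sample.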
this is good enough for our use case
✔ completely enclosed by black lines; ✔ dashed lines are not walls; ✔ > 20 m² (~215 ft²); ✔ < 3,000 m² (~32,000 ft²); ✔ not paper-colored
computer vision for attribute recognition* (*bonus quest)
66,056 footprints produced in one day for an 1859 atlas of Manhattan
caveats: adjacency not enforced; false positives/negatives; buildings may also overlap
act ii: the vectorizer needs to prove itself
multiple inspections for each item, letting consensus surface on its own
footprint validation or: “tell us what the computer got right or wrong”
are people willing to spend time checking building footprints? insurance atlases are not exactly the coolest type of maps
buildinginspector.nypl.org
github.com/NYPL/building-inspector
about a month later…
420k+ flags*, 70k+ unique polygons
consensus: ~84% YES, 7% FIX, 9% NO
*a “flag” is a YES/NO/FIX by one person for a given polygon
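A sketch of how consensus could surface from several flags on one polygon. The minimum number of votes and the required share are hypothetical parameters chosen for illustration; the slides do not state the actual thresholds.

```python
from collections import Counter


def consensus(flags, min_votes=3, min_share=0.75):
    """Return the winning flag, or None when there is no consensus yet.

    min_votes and min_share are assumed values, not the project's.
    """
    if len(flags) < min_votes:
        return None
    value, count = Counter(flags).most_common(1)[0]
    return value if count / len(flags) >= min_share else None


agreed = consensus(["YES", "YES", "YES", "FIX"])
too_few = consensus(["YES", "NO"])
split = consensus(["YES", "NO", "FIX", "YES"])
```

Here `agreed` is "YES", while `too_few` and `split` are both `None`, meaning the polygon would stay open for more inspections.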
seems people are willing after all… we ♥ our contributors
act iii: the return of the inspector
footprint, material, use type, street names, address, floors, name, class, geo location, year, skylights, backyards
divide and conquer
three new tasks for now… we really want it all!
check YES FIX address color fix* (*footprints marked as “NO” go to building heaven)
fix
address
classify color
865k+ flags for 80k+ unique polygons
77k+ 5k+ 42k+ 26k+
epilogue
address and shape consensus or: how to determine what the right building footprint and address look like?
all points are useful; inclusiveness above all
DBSCAN for the win citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980
bit.ly/nypl-consensus
[figure: address flags from different contributors plotted as points (﹡), labeled 246 or 414, grouped into two clusters (+) with the addresses 246 and 414]
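To illustrate address consensus with DBSCAN, here is a compact pure-Python sketch: nearby address flags are clustered by position, and each cluster takes its most common text. The helper names, the `eps` and `min_pts` values, and the sample flags are all assumptions made for this example, not the project's code or data.

```python
import math
from collections import Counter


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)   # None means not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1          # noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point, keep growing
    return labels


def consensus_addresses(flags, eps, min_pts):
    """Most common address text per cluster of (point, text) flags."""
    labels = dbscan([p for p, _ in flags], eps, min_pts)
    votes = {}
    for label, (_, text) in zip(labels, flags):
        if label != -1:
            votes.setdefault(label, Counter())[text] += 1
    return [votes[c].most_common(1)[0][0] for c in sorted(votes)]


# Made-up flags: two groups of clicks plus one stray click.
flags = [
    ((0.0, 0.0), "246"), ((0.5, 0.2), "246"), ((0.2, 0.6), "246"),
    ((0.4, 0.4), "414"),
    ((10.0, 10.0), "414"), ((10.3, 9.8), "414"), ((9.9, 10.4), "414"),
    ((10.2, 10.1), "246"),
    ((30.0, 30.0), "12"),
]
addresses = consensus_addresses(flags, eps=1.0, min_pts=3)
```

In this example the two groups resolve to "246" and "414" despite one mistyped flag in each, and the stray click is treated as noise.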
DBSCAN for shapes also!
all points are still useful
resulting data available via an API in 100% recyclable GeoJSON
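For readers unfamiliar with GeoJSON, this sketch shows what a single footprint could look like as a Feature. The property names and coordinates are invented for illustration and do not describe the actual API response.

```python
import json


def footprint_feature(ring, properties):
    """Build a GeoJSON Feature with a single-ring Polygon geometry."""
    closed = [list(p) for p in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))  # GeoJSON rings end where they start
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [closed]},
        "properties": properties,
    }


# Hypothetical footprint in (longitude, latitude) order.
ring = [(-73.9901, 40.7201), (-73.9899, 40.7201),
        (-73.9899, 40.7203), (-73.9901, 40.7203)]
feature = footprint_feature(ring, {"address": "246"})  # made-up property
text = json.dumps(feature)
```

Because the result is plain JSON, any mapping library or GIS tool that reads GeoJSON can reuse it directly.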
photographing ↓ geo-rectification ↓ vectorization ↓ inspection ↓ check / fix / color / address ↓ consensus ↓ data release
not the end
¡merci beaucoup! mauricio giraldo arteaga @mgiraldo NYPL Labs
slides at: bit.ly/nypl-ehess
images from: NYPL digital collections - Wikimedia Commons - Christopher Cannon - Flickr user wallyg - Giphy