
How I Got Here - NLP, Geospatial, and beyond

Nara Institute of Science and Technology
Watanabe Laboratory - NAIST NLP
https://nlp.naist.jp/

Sorami Hisamoto

May 16, 2025

Transcript

  1. How I Got Here: NLP, Geospatial, and beyond

    Title annotated with POS tags (SCONJ PRON VERB ADV) and dependency labels (advmod, nsubj, advmod). @sorami, 2025-05-16, Nara Institute of Science and Technology
  2. Hello NAIST NLP!

    • HISAMOTO Sorami (久本空海) • NAIST, 2012-2014: Matsumoto Lab (predecessor of Watanabe Lab) • MIERUNE Inc.: Software Engineer (Web, Geospatial, Visualization) • Lives in Hokkaido
  3. Hokkaido (北海道)

    21st biggest island in the world • 83,424 km² (22% of Japan) • 5.3 million people (4.2% of Japan)
  4. • HISAMOTO Sorami (久本空海) • NAIST, 2012-2014: Matsumoto Lab (predecessor of Watanabe Lab)

    • MIERUNE Inc.: Software Engineer (Web, Geospatial, Visualization) • Lives in Hokkaido
  5. • SHIROMIZU Sorami (白水空海) • NAIST, 2012-2014: Matsumoto Lab (predecessor of Watanabe Lab)

    • MIERUNE Inc.: Software Engineer (Web, Geospatial, Visualization) • Lives in Hokkaido
  6. • SHIROMIZU Sorami (白水空海) • NAIST, 2012-2014: Matsumoto Lab (predecessor of Watanabe Lab)

    • MIERUNE Inc.: Software Engineer (Web, Geospatial, Visualization) • Lives in Hokkaido → To Bangkok, Thailand (from next week…!)
  7. • 2014-2016 BrainPad: Data Scientist • 2017-2020 Works Applications: R&D Engineer, Tokushima Laboratory of AI and NLP

    • 2018-2019 Johns Hopkins University: Researcher, Center for Language and Speech Processing • 2020-2022 Legalscape: Software Engineer • 2022-2025 MIERUNE
  8. Today's Agenda

    • NLP & I • NLP → Geospatial / Visualization • Introduction to Geospatial • Geo x NLP
  9. • 2012-2014 NAIST: Parsing & Word Representations • 2017-2020 WAP Tokushima NLP: Sudachi & chiVe (tokenizer & embedding)

    • 2018-2019 Johns Hopkins University: Machine Translation & Privacy • 2020-2022 Legalscape: LegalTech (Law & NLP)
  10. Parsing & Word Representations (NAIST, 2012-2014)

    Poster at the 13th International Conference on Parsing Technologies (IWPT 2013)
  11. Sudachi & chiVe - tokenizer & embedding (WAP Tokushima NLP, 2017-2020)

    https://speakerdeck.com/sorami/sudachi-elasticsearch
  12. Sudachi & chiVe - tokenizer & embedding (WAP Tokushima NLP, 2017-2020)

    https://speakerdeck.com/sorami/sudachi-elasticsearch
  13. Sudachi & chiVe - tokenizer & embedding (WAP Tokushima NLP, 2017-2020)

    https://zenn.dev/sorami/articles/c9a506000fd1fbd1cf98
  14. Sudachi & chiVe - tokenizer & embedding (WAP Tokushima NLP, 2017-2020)

    https://speakerdeck.com/sorami/chive-zhi-pin-li-yong-ke-neng-nari-ben-yu-dan-yu-bekutoruzi-yuan-falseshi-xian-hexiang-kete
  15. Sudachi & chiVe - tokenizer & embedding (WAP Tokushima NLP, 2017-2020)

    • 16th Text Analytics Symposium (2020), Technical Committee on Natural Language Understanding and Models of Communication: Best Paper Award • LREC 2018
  16. LegalTech - Law & NLP (Legalscape, 2020-2022)

    Annual Meeting of the Association for Natural Language Processing (NLP2022): “Verification of machine-processed pseudonymization of legal precedents toward appropriate open data in civil cases” https://note.com/legalscape/n/nf6341940deaa
  17. • Flow of People (Yahoo! JAPAN): https://ds.yahoo.co.jp/ • Minecraft (Sapporo, Takamatsu, etc.): https://takamatsu-mymachi.jp/

    • AIST 3DDB Client (AIST, 産総研): www.digiarc.aist.go.jp/team/gsvrt/information/aist-3ddb-client/ • Project PLATEAU (MLIT, 国土交通省): https://github.com/Project-PLATEAU/mapengine-survey
  18. “Birth of Sapporo” Special Website: Collaboration with a historical novel

    門井慶喜『札幌誕生』(河出書房新社, 2025) https://sapporo-tanjo.rekichizu.jp/
  19. “Birth of Sapporo” Special Website: Collaboration with a historical novel

    門井慶喜『札幌誕生』(河出書房新社, 2025) https://sapporo-tanjo.rekichizu.jp/
  20. Finding ancient tombs with geospatial data and machine learning models

    https://www.nikkei.com/article/DGXZQOUF134BP0T11C23A2000000/ https://www.fujitv.co.jp/sekainonandakore/archive/20250423.html
  21. Commercial Company, Born from OSS Community

    Founded by the committee members of “FOSS4G Hokkaido”. Image: https://twitter.com/howmori/status/1150064830851645440
  22. Financial & Operational Contributions: If you want to go far, go together

    • QGIS (desktop geospatial software): €32,000 since 2017, plus over €6,600 with the certificate program; in 2025, Medium→Large: €9,000/year • MapLibre (web map library project): $40,000 since 2022 ※ The first sponsor in the world. Others: AWS, Meta, Microsoft, etc.
  23. Financial & Operational Contributions: If you want to go far, go together

    • QGIS Country User Group, Voting Member • MapLibre, Voting Member • OSGeo Foundation, Charter Member • OSGeo.JP, Company Member • etc.
  24. Internship: If you are interested, come talk to us!

    https://note.com/mierune/m/m0352153f87dd
  25. A People Map of Japan

    City names are replaced by their most Wikipedia'ed person. https://sorami.dev/2021/people-map-japan/
  26. Japanese from Hokkaido to Okinawa

    Population of the cities along latitude (north-south). https://observablehq.com/@sorami/japanese-from-hokkaido-to-okinawa
  27. The “Roundest” Lake in Japan

    Uses geospatial techniques and gradient descent. https://zenn.dev/mierune/articles/9f970dc3e61a66
  28. Inagawa: “Actually, maps have something in common with dictionaries in that they involve copying real things as accurately as possible and then figuring out how to summarize and express them.”

    Source: https://dailyportalz.jp/kiji/reading-dictionary
  29. Wall paintings in the Lascaux Caves

    Estimated to be 17,300 years old. Image: Prof saxx / Wikimedia
  30. “Moving further in time, the next development in visualization occurred with what we now call charts and early maps: pictorial yet abstract representations of information.”

    p.27
  31. • Both “language” and “map” are Tools for Thought

    • Human cognition is limited by the structure of the human body • Still, tools and concepts allow us to think the unthinkable
  32. 3 years ago (around 2021): While doing NLP, visualization as a hobby...

    My sketches while I was learning Amelia Wattenberger’s “Fullstack D3 and Data Visualization” course
  33. John Snow's Cholera Map (1854): A pioneer in “Data Visualization”

    • Cholera outbreak in 19th century London • The mechanism of infection was unclear given the scientific knowledge of the time • Dr. Snow plotted the patients' homes on a map and discovered that the infections clustered around a certain well. Image: en.wikipedia.org/wiki/File:Snow-cholera-map-1.jpg (public domain)
  34. Then, I joined a new online community: “Learning Data Visualization with the Community”

    https://data-viz-manabiba.visualizing.jp
  35. In that community…

    FURUKAWA Yasuto (MIERUNE co-founder): “We run a geospatial & visualization company called MIERUNE in Hokkaido”
  36. Then, I went to give a talk at their event

    Without knowing anyone, I jumped in and signed up to be a speaker. https://speakerdeck.com/sorami/mierune-meetup-mini-number-01
  37. Then, I went to give a talk at their event

    Without knowing anyone, I jumped in and signed up to be a speaker. https://speakerdeck.com/sorami/mierune-meetup-mini-number-01 This is how I joined MIERUNE and moved to Hokkaido!
  38. By coincidence, at the same time…

    I joined NAIST in 2012; Ouchi-san joined in 2013. “Awarded a KAKENHI B grant for our research on computational understanding of text and grounding it on real-world maps…” https://x.com/blankeyelephant/status/1498197522548150275
  39. WAP Tokushima NLP, 2017-

    • Lab opened in February 2017 • Prof. Matsumoto as technical advisor • At the time, I had just quit my job without deciding on a next one • He introduced me to this new lab, and I joined in June 2017. https://www.nikkei.com/article/DGXLZO12701880Y7A200C1TI5000/
  40. Johns Hopkins University, 2018-

    • Kevin Duh was my supervisor at Matsumoto Lab (2012-2014) • He then moved to JHU • He saw our work on Sudachi and Elasticsearch (full-text search) • He was doing a project involving Information Retrieval, and he invited me to JHU. https://www.cs.jhu.edu/~kevinduh/
  41. Legalscape, 2020-

    I got to know this person via this Pull Request: https://github.com/WorksApplications/elasticsearch-sudachi/pull/60
  42. What is “Geospatial” Data? (地理空間情報)

    All kinds of data on (and under) the Earth (and other planets)
  43. What is “Geospatial” Data? (地理空間情報・位置情報)

    All kinds of data on (and under) the Earth (and other planets). “80% of Data is Geographic”
  44. What is “Geospatial” Data?

    All kinds of data on (and under) the Earth (and other planets). Location (Point, Line, Polygon) + Properties (Name, Year, ID, …; can be anything)
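
The "location + properties" structure on slide 44 can be sketched as a GeoJSON-style feature. This is only an illustration, not something taken from the talk: the helper name `make_feature`, the example properties, and the rounded Sapporo coordinates are assumptions. GeoJSON lists coordinates as longitude first, then latitude.

```python
import json

# Geometry types named on the slide (Point, Line, Polygon), using GeoJSON's names.
ALLOWED_GEOMETRIES = {"Point", "LineString", "Polygon"}


def make_feature(geometry_type, coordinates, properties):
    """Build a GeoJSON-style Feature: a location plus free-form properties.

    Hypothetical helper for illustration; it does not validate coordinates.
    """
    if geometry_type not in ALLOWED_GEOMETRIES:
        raise ValueError(f"unsupported geometry type: {geometry_type}")
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# Approximate location of Sapporo as [longitude, latitude]; properties can be anything.
sapporo = make_feature("Point", [141.35, 43.06], {"name": "Sapporo", "id": 1})
encoded = json.dumps(sapporo)
```

The same dictionary shape works for lines and polygons; only the nesting of the coordinate list changes.
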
  45. What is “Geospatial” Data?

    All kinds of data on (and under) the Earth (and other planets). Satellite Imagery and Aerial Photos are also Geospatial Data (raster data). https://maps.gsi.go.jp/
  46. Data: Mapillary (2013~)

    Street images (like Google Street View), acquired by Meta in 2020. https://www.mapillary.com/
  47. Data: Overture Maps Foundation (2022~)

    Company alliance, founded by AWS, Meta, Microsoft, & TomTom. https://overturemaps.org/ • Addresses • Base (water, land, land use, infrastructure, land cover) • Buildings • Divisions • Places • Transportation
  48. Project PLATEAU (2020~)

    Create and utilize 3D city models of Japanese cities. Led by the government (MLIT: Ministry of Land, Infrastructure, Transport & Tourism, 国土交通省). https://plateauview.mlit.go.jp/
  49. GIS

  50. What is “GIS”?

    Geographic Information System: a system to view, create, edit, or analyze geospatial data ※ May refer to “Geographic Information Science” instead
  51. Desktop GIS and WebGIS

    • Desktop GIS: software to be installed on a PC; advanced analysis, etc. possible • WebGIS: via a web browser; no need to install, easy to view
  53. QGIS LAB by MIERUNE

    Lots of articles on various topics (in Japanese). https://qgis.mierune.co.jp/
  54. The difficulty of geospatial data: The Earth is not flat

    Image: @vlandham, https://gist.githubusercontent.com/vlandham/raw/9216751/ https://x.com/mourner/status/1458169016456032260/
  55. The difficulty of geospatial data

    “Map Projection for the Interactive Media” (in Japanese) https://speakerdeck.com/sorami/mierune-she-nei-mian-qiang-hui-number-033
  56. Example: Missile from North Korea

    Figure in a newspaper vs. fixed version. Figures: Esri, ArcUser Online, “Understanding Geodesic Buffering”, Figure 1 and Figure 2. https://www.esri.com/news/arcuser/0111/geodesic.html
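
Slides 54 to 56 turn on the gap between distances on a flat map and distances on the globe. The following is a rough sketch of that gap, not code from the talk. It assumes a spherical Earth with a mean radius of 6371 km (real geodesic tools use an ellipsoid), and the function names are made up for illustration. It computes a great-circle distance with the haversine formula, and the factor by which a Mercator map stretches lengths at a given latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mercator_scale_factor(lat_deg):
    """How much a Mercator map stretches lengths at this latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))


quarter_meridian_km = haversine_km(0.0, 0.0, 90.0, 0.0)  # equator to north pole
stretch_at_60 = mercator_scale_factor(60.0)              # lengths appear about doubled
```

Because the stretch grows with latitude, a circle of fixed radius drawn directly on a Mercator map understates the true reach toward the poles, which is the kind of error a geodesic buffer corrects.
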
  57. Recommended Books (in Japanese), written by the CTO of MIERUNE

    • 『現場のプロがわかりやすく教える位置情報エンジニア養成講座』井口奏大 (秀和システム, 2023) • 『現場のプロがわかりやすく教える位置情報デベロッパー養成講座』井口奏大 (秀和システム, 2024)
  58. Hiroki Ouchi, “Geospatial in the Text: Toward the Integration of GIS and NLP”

    https://speakerdeck.com/hiroki13/wen-zhang-nonakanodi-li-kong-jian-di-li-kong-jian-qing-bao-ke-xue-gis-tozi-ran-yan-yu-chu-li-nlp-norong-he-hexiang-kete-c41f1d81-1397-4df0-888d-370f0583597e
  59. Example: Geocoding. Link addresses, facility names, etc. to location

    Annual Meeting of the Association for Natural Language Processing (NLP2023). https://speakerdeck.com/sorami/nlp2023
  60. Example: Geocoding. Link addresses, facility names, etc. to location

    Retrieval + Resolution = Retrieval + Resolution • General: Named Entity Recognition + Entity Resolution = Entity Linking • Geospatial: Geotagging (Toponym Extraction) + Geocoding (Toponym Resolution) = Geoparsing
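
The composition on slide 60 (geotagging + geocoding = geoparsing) can be sketched with a toy gazetteer. Everything here is an assumption made for illustration: the two-entry `GAZETTEER`, its rounded coordinates, and the exact-string matching stand in for the trained models and large place-name databases a real system would use.

```python
import re

# Hypothetical toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Sapporo": (43.06, 141.35),
    "Nara": (34.69, 135.80),
}


def geotag(text):
    """Toponym extraction: find known place names in the text, in order of appearance."""
    names = sorted(GAZETTEER, key=len, reverse=True)  # prefer longer names on overlap
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return [match.group(0) for match in pattern.finditer(text)]


def geocode(toponym):
    """Toponym resolution: map a place name to coordinates, or None if unknown."""
    return GAZETTEER.get(toponym)


def geoparse(text):
    """Geoparsing = geotagging followed by geocoding."""
    return [(name, geocode(name)) for name in geotag(text)]


result = geoparse("I moved from Nara to Sapporo.")
```

Exact string lookup cannot handle ambiguous or unseen place names, which is where the resolution step becomes a research problem rather than a dictionary lookup.
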
  61. Words from Kevin Duh that still remain in my heart

    • Always Ask Questions • Just Do It • Have Fun. Kevin Duh was my supervisor at NAIST Matsumoto Lab.
  62. How I Got Here: NLP, Geospatial, and beyond

    Title annotated with POS tags (SCONJ PRON VERB ADV) and dependency labels (advmod, nsubj, advmod). @sorami, 2025-05-16, Nara Institute of Science and Technology