Natural Language Processing(2019-11-18)

Natural Language Processing and Data Quality.
Seminar at Berlin.

Kenji Hiramoto

November 18, 2019

Transcript

  1. Natural Language Processing (NLP) and data quality: introduction and use cases from Japan

    2019-11-18. Kenji Hiramoto, Chief Strategist (IT), Cabinet Secretariat
  2. AI will speak English more smoothly than I do.

    I believe communication does not depend only on language and text information.
  3. Kenji Hiramoto, Chief Strategist (IT)

    • Working as a Chief Data Officer
    Projects
    • Digital Government Strategy
    • Government Interoperability Framework
    • Base Registries project
    • Data Exchange Platform
    • Data Analysis Projects
    Fields
    • Government
    • Smart city
    • Disaster Risk Management
  4. Background

    Effective and valuable services for citizens
    • Various service requests from citizens
    • 24h365d services
    • Personalized services
    • Population: 127,000 thousand
    Efficiency of public administrations
    • Decrease in government staff
    • Central government: 341 thousand staff (2018)
    • Local governments: 3,384 thousand staff (2017)
    • Compliance with complex legislation and rules
    Public safety
    • Prevent crimes and accidents
    • Number of crimes: 817,338
    • Number of traffic accidents: 430,601 (2017)
    Infrastructures
    • Monitor and maintain the infrastructures
    • Roads: 1,279,511.9 km
    • Rivers: 144,031.4 km
    • Number of bridges (over 2 m): 691,901
    • Number of tunnels: 10,619 (2017-4-1)
  5. Why do we focus on AI?

    AI enhances government trust and service quality.
    • Accuracy
    • Stability
    • Fairness
    • Accountability
    • Adoption
    • Speed
    • 24 x 365 service
    We can't prevent human errors and bias.
  6. Overview of Digital Strategy in Japan

    Levels: Vision, National Strategies, Digital Strategies
    • Society 5.0 (2016)
    • Economic Growth Strategy (2019-6)
    • Innovation Strategy (2019-6)
    • Digital Strategy (2019-6)
    • Data Strategy (2019-6)
    • AI Strategy (2019-6)
    • AI Principles (2019-3)
    • Open Data Policy (2017-5)
    • Digital Government Policy (2017-5)
    Key concepts
    • Connected Industries
    • Digital first
    • Once only
    • One stop
    Acts
    • IT Act (2000)
    • Data Act (2016)
    • Digital First Act (2019)
    • Other acts: Privacy act and Information disclosure act
    Society 5.0 Reference Architecture (layers)
    • Strategy / Policy: Vision, Priority domain
    • Rule: Legislation, Regulation
    • Organization: Coordination, Team
    • Business: Business process, Business rule
    • Data harness functions: Catalogue, Search, AI, Analysis
    • Data: Data definition, Data model, Code
    • Data broker function: Gathering, Integration, Cleansing, Device management
    • Asset: Sensor, Actuator, Hardware, Network
    • Across layers: Security / Authentication, Lifecycle
    Related: Data Exchange Platform, Government Interoperability Framework (IMI)
  7. AI Society Principles

    We launched the AI Society Principles in March 2019. AI will bring many great benefits to society, but it will also have an enormous impact on society. We need to redesign society in every way, including the activities of citizens, social systems, industrial structures, innovation systems, and governance.
    Philosophy
    • Dignity: a society that has respect for human dignity
    • Diversity & Inclusion: a society where people with diverse backgrounds can pursue their well-being
    • Sustainability: a sustainable society
    AI Society Principles
    • Human-Centric
    • Education/Literacy
    • Privacy Protection
    • Ensuring Security
    • Fair Competition
    • Fairness, Accountability, and Transparency
    • Innovation
    R&D and Utilization Principles
    Five essential elements
    • Human Potential
    • Social Systems
    • Industrial Structures
    • Innovation Systems
    • Governance
  8. AI Strategy

    We launched the AI Strategy in June 2019. This strategy clarifies the actions and roles through 2030.
    Strategic objectives
    • Objective 1: Human resources
    • Objective 2: Competitiveness
    • Objective 3: Architecture & sustainable mechanism
    • Objective 4: Research, education and social infrastructure network
    Foundation for the future
    • Education
    • R&D
    Foundation for industry and society
    • Digital Government
    • SMEs and start-up support
    • Data-related infrastructure: Data infrastructure, Trust / Security, Networks
    Priority areas / PoC
    • Health, Medical Care and Long-term Care
    • Agriculture
    • National Resilience (Infrastructure, Disaster Risk Management)
    • Transportation Infrastructure and Logistics
    • Smart Cities
  9. AI Strategy (Public Service)

    AI is an essential technology for public administrations.
    Pressures on the public sector
    • Demand for high-quality services
    • Increase in aging citizens
    • Decrease in workers
    • Government finances
    Resilience (Infrastructure)
    • Robotics & Sensor network
    • Infrastructure data platform
    • Urban planning
    Smart cities
    • Smart cities
    • Including technologies
    Digital government
    • High value services
    • EBPM
    • Sustainable services
    • Work style
    Data infrastructure
    • Interoperability framework
    • Data quality
  10. Status of AI projects in public sectors

    Domains (column headings): Infrastructures, Public Safety, Efficiency of Public administrations, Services for citizens, Technologies
    Inquiry and consultation
    • ◎ Service PoC (Care planning)
    • ◎ Service PoC (Chatbot)
    • Technologies: Speech Recognition, Natural Language Processing (NLP)
    Monitor and inspection
    • ◎ Service PoC (Maintenance of Bridges)
    • ◎ Service (Maintenance of Roads)
    • 〇 Service (Check of Documents)
    • Technologies: Image Recognition, Sensor Data Processing, NLP
    Matching services
    • ◎ Service PoC (Nurseries with Children)
    • ◎ Service PoC (Promotion to potential residents)
    • Technologies: Pattern Matching
    Predictive analysis
    • 〇 Service (Crime map)
    • 〇 PoC (Weather map)
    • Technologies: Image Recognition, Sensor Data Processing, NLP
    Information management
    • ◎ Service (OCR, RPA)
    • Technologies: Image Recognition, NLP
    AI with Robotics
    • ◎ Service (O&M)
    • ◎ Service (Monitoring)
    • ◎ Service (Reception)
    • Technologies: Robotics
  11. Use case: AI assignment of nurseries (Saitama city)

    • AI assigns a nursery to each child in consideration of parental requests.
    There are various requests from parents: "I like both X nursery and Z nursery." "I like Y nursery." "I dislike Y nursery." "Anywhere." "My child has an allergy."
    Scale: 8,000 children, 300 nurseries.
    Before AI: 20-30 staff did the assignment. It took 1 week (125 days in total).
    With AI: a few seconds. AI recommended the combinations of children and nurseries. The results were almost the same as if staff had done it.
    Data model
  12. Use case: Care Plan Assistant (Fukuoka city)

    • The CPA engine recommends care plans to care managers by using AI.
    Inputs: accumulation of care plan data, knowledge of care specialists, welfare facilities database.
    Outputs for the care manager: recommended care plan, matching between the care plan and welfare facilities.
    http://welmo.co.jp/
  13. Use case: Crime prediction map (Kyoto Prefecture Police)

    • Crime prediction system (2016-10- )
    • 100 thousand crime history records -> Kyoto police made arrests in over 30 cases.
    • Officers can check the possibility of crimes on their map.
    • The map indicates the focus areas by color.
    Map labels: Suspicious person; Attention! (Snatch); Suspicious bike.
  14. Use case: AI recommendation for migration (Itoshima city)

    • Itoshima city promotes migration from large cities.
    Staff of Itoshima city consult with residents of other cities by using the AI recommendation.
    Profile: Age, Gender, Family, Hobby, ...
    AI: "Persons who are 30-50 years old focus on the following issues: distance to school, safety."
    Your type: Transportation, School, Community, Shopping, Safety, Hospital
    Area recommended by AI: Transportation, School, Community, Shopping, Safety, Hospital
    Favorite: Fukae area
    AI learns from the results of consultations.
  15. Use case: Real property management (Saitama city)

    • On January 1, which is the base date for property tax assessment, we take aerial photographs. We will use AI to find houses that were newly constructed, expanded, or demolished.
    • By narrowing down the points that we investigate, we can accurately identify the taxable houses. As a result, taxation becomes more efficient.
    AI recognizes and stores the shapes of buildings and compares 2018 with 2019 (new construction, demolition).
    (Figure: bar chart "Time of investigation (March 2019)", before AI vs. after AI, with series Field investigation, Check the aerial photography, Update the base registry; data labels as extracted: 196, 0, 296, 0, 94, 63.)
    Saitama city will use the map for community building and the disaster risk management plan.
  16. Use case: My city report (Chiba city)

    • AI finds road damage automatically.
    • This solution is more efficient than inspection by a person, and it makes it possible to grasp the state of the roads over a wide area.
    Next-generation citizens' reporting system, built on the citizens' reporting system Chiba-repo:
    • An official car carries a camera. The app takes photos automatically, finds road damage, and sends the pictures to the server.
    • The AI uses deep learning technology and evaluates each picture as: 1. No damage, 2. Damaged, 3. Maintenance.
    • Staff check the results and give feedback.
  17. Use case: Rust on infrastructure (Muroran city)

    • There are many old bridges made of iron.
    • They conducted simulations to improve the efficiency of inspections.
    Weather data
    - Temperature
    - Rain
    - Hours of sunshine
    - Wind
    - Moisture
    Rust-related data (monthly, from 2013 to 2014)
    Prediction by AI
  18. Use case: Maintenance of banks

    • There are many rivers in Japan. MLIT and municipalities spend a lot of time and money maintaining the banks.
    Inspection by specialists
    1. Once or twice a year, specialists take photos and memos on foot.
    2. They make the panorama photo and put the memos on the p
    3. They prioritize the parts that need maintenance.
    AI: 10,000 photos with deep learning technology. There are some cracks in the red areas.
    http://www.yachiyo-eng.co.jp/e/
    http://www.yachiyo-eng.co.jp/topics/gogango.html
  19. Use case: Robot car for rural areas

    • There are many aging citizens in rural areas. They can go out by robot car.
    Destinations: Hospital, Municipality service, Station, Shop, "Michinoeki" community station.
    Operation: call from a smartphone; control center; course guide for winding and unclear roads in mountain areas.
  20. Use case: Robot for infrastructure maintenance

    • There is a lot of old infrastructure in Japan.
    • 150 thousand bridges that are over 15 m.
    • 9 thousand tunnels.
    Wire inspection robot for bridges: http://www.mlit.go.jp/sogoseisaku/maintenance/_gppdf/ko18.pdf
    Sewer inspection robot (cracks, repaired hole, spring water): www.mlit.go.jp/sogoseisaku/maintenance/_gppdf/ko23.pdf
    Robot types: Drone, Car-mounted robot, Hanging robot, Sucker robot, Pole-type robot
    Dam inspection robot: http://www.mlit.go.jp/common/001125345.pdf http://www.mlit.go.jp/common/001125338.pdf
  21. Use case: Robot for aging society

    • Japan is an aging society. Nursing services need more support staff. Supporting aging persons is very hard work.
    • People aged 65 or older will make up more than 38 percent of the population in 2065.
    http://robotcare.jp/wp-content/uploads/2017/04/List-of-commercialized-equipment.pdf
    Therapy robot "Paro": http://www.aist.go.jp/aist_j/press_release/pr2004/pr20040917_2/pr20040917_2.html
    RIBA (Robot for Interactive Body Assistance): http://rtc.nagoya.riken.jp/RIBA/
    Powered suit
  22. Natural language processing

    There are many technologies. (From Wikipedia, the free encyclopedia: https://en.wikipedia.org/wiki/Natural_language_processing)
    Semantics
    • Lexical semantics
    • Distributional semantics
    • Machine translation
    • Named entity recognition (NER)
    • Natural language generation
    • Natural language understanding
    • Optical character recognition (OCR)
    • Question answering
    • Recognizing textual entailment
    • Relationship extraction
    • Sentiment analysis (see also multimodal sentiment analysis)
    • Topic segmentation and recognition
    • Word sense disambiguation
    Syntax
    • Grammar induction
    • Morphological segmentation
    • Part-of-speech tagging
    • Parsing
    • Sentence breaking
    • Stemming
    • Word segmentation
    • Terminology extraction
    Discourse
    • Automatic summarization
    • Coreference resolution
    • Discourse analysis
    Speech
    • Speech recognition
    • Speech segmentation
    • Text-to-speech
    Dialogue
  23. Services of NLP

    • Communication
    • Translation
    • Digitize
    • Inquiry
    • Presentation
    • Record of meeting
    • Tagging
    • Summarize
    • Analysis
    • Analysis of opinion
    • Marketing
    • Disinformation
    What they provide: efficiency, processing large amounts of data, making English-language materials, real-time analysis, analysis from various views.
  24. History of NLP

    Timeline: 2010, 2011, 2013
    Open Government
    • Dialogue analysis
    Disaster
    • Tweets analysis
    Inquiry
    • Chatbot
    Contact Center
    • Text analysis
    Meeting
    • Minutes
    • Diet
    Marketing
    • Economic index
    Translation
    • Text
    • Voice
    Handwriting
    • PC, tablet
    Other labels: Trust, Additional information, Communication
  25. Benefits of NLP

    • Eliminate labor shortages
    • Release from monotonous work (transcription, document digitization)
    • Efficiency
    • Improve accuracy
    • Prevent inconsistencies
    • Organize and discover needs
    • Minority opinions
    • Monitor & alarm
    • Future prediction
    • Combined use
  26. Problems in Japan

    • Japanese is one of the most difficult languages.
    • There are 60,000 kanji characters (Chinese characters):
    • 60,000 for names
    • 10,000 for industrial products
    • 2,400 for daily life
    • There are many regional varieties of Japanese.
    • Japanese sentences often omit the subject. A sentence may have only a verb and objects.
    • There are few separators in sentences.
  27. Team meeting

    • We use speech recognition technology for meetings because one of our members is hard of hearing.
    • All members can participate in the meeting.
    • Our team can make minutes.
    UD talk: https://udtalk.jp/en/
    Flow: microphone -> voice to text -> member with hearing difficulties, minutes
  28. Conferences

    • We use Japanese subtitles at conferences.
    • When there are participants from overseas, we also use English subtitles.
    Screen layout: Slide, Subtitles (JP), Subtitles (EN), Speaker
  29. Translation

    • Translation is an important technology for the public sector.
    • We were able to make English materials quickly and accurately.
    • We can easily gather information from all over the world.
    Tools: translator, grammar check
  30. - A variety of education deliberations on various themes

    - Since full-scale deployment (June 2011), 8,000 people have participated at 170 locations.
    - Deliberation leads not only to policy making at the ministerial level, but also to mid-term policy in government-designated cities and to school-building and city-building among citizens.
    - Deliberation on 20 themes, with about 3,000 subscribers in Japan and overseas.
    - 15,000 voices have been received (approximately 2.3 million page views).
    - Measures for improving the quality and capacity of teachers were studied prior to the Deliberative Council (as reported in the proposal to the administrative agency).
    - A variety of ICT is used in the policy formation process in parallel with the council, etc.
    Figure labels: Open question from the minister (MEXT); Request for opinions from the vice minister; wiki; Comments; Council; Open Q.; Social media; Town meeting; ICT in Education; Text Analysis.
  31. Speech writing support in congress

    Situation: Civil servants spend a lot of time writing the Minister's answering speeches for congress sessions.
    Our solution: To shorten the time needed to write a speech, we ran a demonstration program to create a supporting system with AI, which includes (1) a search engine for similar past questions from congressmen and (2) a function to extract the points of the speech to answer.
    Result: Half of the people who tried it answered that they couldn't get useful suggestions.
    Next step: We are now researching (1) the needs of users, (2) possible data sources, and (3) possible AI technologies we can apply to the system, and then plan to create another beta version of the system.
  32. AI for family register process (Osaka city)

    • The family register law is complicated. This work requires expertise and experience. Less experienced staff spend more time contacting the branch of the Ministry of Justice (MOJ).
    Flow: question from citizens -> rare case -> staff ask the AI -> AI provides the answer -> answer from staff.
    The AI uses natural language processing and an ontology, and studied 18,000 cases.
    Before AI, the MOJ sometimes spent 2 or 3 weeks to answer.
    Contact centers have huge potential. AI will find inconsistencies through analysis.
  33. Patent Office

    1. Tagging: AI gives tags that the applicant didn't notice. From the application form it produces (1) recommended tags and (2) recommended sentences that should be tagged.
    2. Evaluation: the examiner evaluates the AI recommendations.
    3. Learning: the examiner registers amendment information to the AI.
    The JPO manages more than 2,000 technical fields. There are approximately 100 tags in each field. The tags are critical information for searching the patent data.
  34. Recognition of handwriting and OCR

    • Most elderly people can't use a keyboard.
    • Recognition of handwriting is a critical function for applications.
    • Most of the public sector uses OCR.
    Flow: application forms -> tablet and recognition of handwriting, OCR -> public sector
  35. Robot in the City Hall (Fujieda city)

    • Fujieda city uses a robot for its information service.
    • The robot is a concierge of the city hall.
    • It can give information on sections, events and tourism.
    Service stages
    - Talk with citizens
    - Move in the reception hall
    - Display the maps
  36. AI Concierge (Kawasaki city)

    There are various services for personal needs, so municipal staff spent much time responding to citizens' inquiries.
    Solution: this dialogue service (AI Concierge, a cloud service) handles the following inquiries:
    - Nursing
    - Moving
    - Garbage
    - Residence Registration
    - Application
    Example inquiries (garbage): "List of the days on which garbage is collected", "How to dispose of bottles", "How to separate my garbage", "How to recycle", "How to dispose of electronic appliances", "The place you deliver it".
    AI: "I will give you the related information. Please click the following button." The AI shows services with links to the web page of each service. You can also enter free keywords in the bottom field.
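  37. Dialogue web site (Ideabox)

    Using Ideabox, the government collected public opinion on regulatory reform through an open discussion lasting 4 weeks. We published the results of textual analysis based on the collected data every weekend (Text Mining 1, 2, 3), followed by a final report with text analysis to the Minister.
    Three views feed policy making: original opinions, point ranking, and text analysis.
    • Good point: we can know the major opinions.
    • Good point: we can pick up minority opinions.
    • Good point: we can analyze without staff's opinions.

    Point ranking (translated from Japanese)
    Rank | Overall | Idea voted on | Points | For | Neutral | Against | Posted
    1 | 23 | PTA reform | 74 | 74 | 0 | 0 | 2010/10/6 19:38
    2 | 29 | Please make "Voice of the People" permanent | 67 | 73 | 6 | 6 | 2010/9/29 11:16
    3 | 31 | Abolish age limits in civil-service hiring | 64 | 68 | 3 | 4 | 2010/9/28 22:34
    4 | 37 | Animation, Japan's pride | 55 | 60 | 15 | 5 | 2010/10/10 18:07
    5 | 39 | Strengthen the powers of Labor Standards Inspection Offices; introduce a penalty-fine system | 54 | 55 | 1 | 1 | 2010/9/30 0:36
    6 | 47 | End age limits | 47 | 48 | 5 | 1 | 2010/9/29 0:45
    7 | 58 | Improve the treatment of non-regular workers | 37 | 39 | 0 | 2 | 2010/9/28 1:38
    8 | 61 | Tougher penalties under the Labor Standards Act (recognize people's right to a healthy and cultured life) | 36 | 36 | 0 | 0 | 2010/9/28 12:04
    9 | 63 | Abolish the distinction between new-graduate and mid-career hires | 34 | 34 | 3 | 0 | 2010/9/27 11:49
    10 | 66 | Crack down on overtime and promote work sharing | 32 | 34 | 4 | 2 | 2010/10/1 15:57
    11 | 77 | Eradicate exploitative ("black") companies | 30 | 31 | 4 | 1 | 2010/9/29 10:04
    12 | 84 | Expansion of the sex industry using non-existent (fictional) humans | 28 | 30 | 11 | 2 | 2010/10/13 6:50
    13 | 92 | Stricter management of working hours | 27 | 27 | 0 | 0 | 2010/10/10 21:13
    14 | 100 | Remove the upper age limit from teacher hiring qualifications and require work experience | 23 | 23 | 0 | 0 | 2010/10/5 17:06
    15 | 114 | End bulk hiring of new graduates and preferential treatment of regular employees | 20 | 22 | 2 | 2 | 2010/9/24 20:45
    16 | 115 | Promote work-life balance | 20 | 20 | 0 | 0 | 2010/9/29 16:11
    17 | 126 | How employment of people with disabilities should be | 18 | 18 | 0 | 0 | 2010/9/24 14:34
    18 | 127 | Abolish the exam exemption for former tax office staff in the tax accountant exam and increase the number of private-sector passers | 18 | 18 | 0 | 0 | 2010/10/7 1:15
    19 | 128 | An internet open university: open up lectures of public and national universities | 18 | 18 | 0 | 0 | 2010/10/2 2:49
    20 | 136 | How to report to Labor Standards Inspection Offices: open a dedicated website | 17 | 17 | 1 | 0 | 2010/10/7 0:50
    21 | 140 | How civil servants should be | 16 | 21 | 3 | 5 | 2010/10/5 15:54
    22 | 156 | The problem of leave to care for sick children | 15 | 15 | 0 | 0 | 2010/10/7 1:27
    23 | 157 | Teach criminal law, traffic law, labor law, taxes, and insurance in compulsory education | 15 | 15 | 0 | 0 | 2010/10/12 22:41
    24 | 158 | Return to society for people with depression | 15 | 15 | 2 | 0 | 2010/9/29 19:36
    25 | 165 | More staff and new hires for Labor Standards Inspection Offices | 14 | 15 | 0 | 1 | 2010/10/12 15:59
    26 | 166 | Realize a Japan where having a steady job gives security | 14 | 15 | 1 | 1 | 2010/10/4 17:15
    27 | 171 | Promote the corporatization of agriculture | 14 | 14 | 0 | 0 | 2010/9/26 23:23
    28 | 172 | Dismantle the boards of education | 14 | 14 | 5 | 0 | 2010/9/25 10:58
    29 | 176 | Fix a society where laws exist but need not be obeyed | 14 | 14 | 0 | 0 | 2010/10/13 21:17
    30 | 180 | Remove items from résumés that let age be inferred | 13 | 14 | 0 | 1 | 2010/10/8 19:56
    31 | 182 | Abolish the exam exemption for civil servants in the administrative scrivener exam | 13 | 14 | 0 | 1 | 2010/10/7 9:20
    32 | 183 | Income tax should be taught in compulsory education | 13 | 14 | 1 | 1 | 2010/10/7 12:12
    33 | 188 | Revise the civil service law | 13 | 13 | 1 | 0 | 2010/9/30 12:56
    34 | 204 | Provide places for children with developmental disabilities | 12 | 13 | 1 | 1 | 2010/9/24 19:24
    35 | 207 | On "proper" compliance with the dismissal provisions of the Labor Standards Act | 12 | 13 | 1 | 1 | 2010/10/8 21:56
    36 | 215 | Social insurance | 12 | 12 | 1 | 0 | 2010/10/2 20:57
    37 | 231 | Thoughts on job postings at public employment offices | 11 | 13 | 1 | 2 | 2010/9/28 21:38
    38 | 235 | Civil service reform and the pay system | 11 | 12 | 2 | 1 | 2010/10/5 18:00
    39 | 240 | Practical education reform | 11 | 12 | 1 | 1 | 2010/9/30 18:35
    40 | 243 | Childcare supporters | 11 | 11 | 0 | 0 | 2010/9/27 23:07
    41 | 248 | Working conditions for welfare workers are so harsh that retention is difficult | 11 | 11 | 0 | 0 | 2010/10/5 17:30
    42 | 250 | The employment insurance system hinders self-reliance | 11 | 11 | 2 | 0 | 2010/9/29 14:18
    43 | 255 | Oblige companies to manage employment information and disclose turnover data | 11 | 11 | 0 | 0 | 2010/10/9 0:15
    44 | 258 | Jobs in traditional crafts | 11 | 11 | 1 | 0 | 2010/10/5 16:12
    45 | 277 | Increase employment with home-based animators and strengthen Japan's animation skills | 10 | 11 | 1 | 1 | 2010/10/12 16:42
    46 | 279 | On job postings | 10 | 11 | 0 | 1 | 2010/10/1 13:31
    47 | 280 | Measures against poor employment conditions | 10 | 10 | 0 | 0 | 2010/10/14 0:54
    48 | 282 | Have Hello Work broker home-based piecework over the internet | 10 | 10 | 0 | 0 | 2010/9/30 14:30
    49 | 292 | Break down administrative silos to address childcare waiting lists (turn vacant school buildings into nurseries) | 10 | 10 | 1 | 0 | 2010/9/28 23:30
    50 | 293 | Abolish the foreign trainee and technical intern system | 10 | 10 | 1 | 0 | 2010/10/13 21:57

    Text analysis, most frequent words: employment (337), company (120), Japan (90), country (84), problem (57), present (51), work (43), salary (40), environment (39), overseas (35).
    (Figure: co-occurrence network of words with counts, for example create (18), increase (18), creation (16), mobilization (13), secure (12).)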
  38. Disaster Risk Management

    2011.3.11 Great East Japan Earthquake (photo: MLIT). Ministry of Economy, Trade and Industry.
    Tweets are collected through the API and text mining extracts "what is really needed in the devastated area".
    Example: 1,500 tweets that include the word "Battery", with related words Size (A, AAAA), Mobile phone, Light.

    Daily ranking of items in short supply
    No. | 3/12 | 3/13 | 3/14 | 3/15 | 3/16 16:39-17:02 | 3/18 14:21
    1 | Electric power | Electric power | Electric power | Electric power | Goods | Goods
    2 | Information | Information | Food | Goods | Gasoline | Battery
    3 | Food | Food | Information | Gasoline | Food | Tissue
    4 | Goods | Gasoline | Goods | Food | Information | Bottle for water
    5 | TV | Disposable diaper | Gasoline | Water | Fuel | Flashlight
    6 | News | Goods for baby | TV | Information | Shelter | Toilet paper
    7 | Lighting | Goods | Place | Fuel | Oil-based product | Light oil
    8 | Machine | TV | Map | Disposable diaper | Water | Waterproof canvas
    9 | Shelter | Water | Water | Plastic bag | TV | Nonperishables
    10 | Water | News | Disposable diaper | Fleece | Car | Tank for gasoline
    11 | Blood | Shelter | Fuel | Cloth | Shop | Electric power
    12 | Power supply | Shop | Exercise | Shop | Electric power | Water
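The daily ranking on this slide can be approximated with a simple count of watch-list terms per day. The following is a minimal sketch, not the project's actual pipeline: the term list `RELIEF_TERMS`, the function `daily_ranking`, and the sample tweets are all hypothetical, and real Japanese text would first need word segmentation.

```python
from collections import Counter

# Hypothetical watch list of relief items; the original project's list is not given.
RELIEF_TERMS = ["electric power", "battery", "gasoline", "water", "food"]

def daily_ranking(tweets, terms=RELIEF_TERMS, top_n=3):
    """Rank watch-list terms by the number of tweets that mention them, per day.

    tweets is an iterable of (day, text) pairs.
    """
    per_day = {}
    for day, text in tweets:
        lowered = text.lower()
        counts = per_day.setdefault(day, Counter())
        for term in terms:
            if term in lowered:
                counts[term] += 1  # each tweet counts once per term
    return {day: [term for term, _ in c.most_common(top_n)]
            for day, c in per_day.items()}

# Invented sample data for illustration only.
sample = [
    ("3/12", "No electric power since last night"),
    ("3/12", "Electric power is out, and we need water"),
    ("3/18", "Battery needed for the flashlight"),
    ("3/18", "Any shop with battery stock? Also gasoline"),
    ("3/18", "Battery and water, please"),
]
ranking = daily_ranking(sample)
```

Counting tweets rather than raw word occurrences keeps one long message from dominating a day, although mass retweets can still skew the result.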
  39. Analysis of public opinions in Twitter

    On May 7, we ran a text analysis on the topic keywords "government" and "can't believe": 232 tweets, 2011/05/03 14:45:47 to 2011/05/07 14:53:54.
    The result revealed that citizens demanded detailed explanations from the government of the numerical criteria (e.g. radiation levels).
    Reliability index targets: government, Tokyo Electric Power Company, nation, mass media, nuclear plant, Fukushima, citizens.
    Words around "government": can't believe, not reliable, change, emergency situation, lost, panic, deceive, nothing.
  40. Latest Disaster Risk Management project

    We use human density data and rainfall data together with NLP. The density of SNS messages is also important. We identified areas with particularly heavy rainfall from the rainfall information. We then analyzed the SNS messages of those areas and grasped the flood situation.
    Map layers: rain, number of SNS messages, SNS messages.
  41. Latest Disaster Risk Management project

    We can choose the category of problem and see the number of tweets. By clicking, we can drill down and get the detailed information.
  42. Use case: Crime prediction map (Tokyo Metropolitan Police)

    • The Tokyo Metropolitan Police have analyzed signs of crimes since 2019-4.
    • Signs (7,000 cases/year): 60 thousand case records from 2010.
    • A stranger speaks to a person on the road.
    • A stranger follows a passerby.
    • In the test operation, police arrested a suspicious person.
    Data model of strangers: figure, height, clothes, conversation.
    Flow: citizens send reports -> history of crime -> AI predicts the location -> patrol.
  43. Big data-STATS (AI for Economic Index)

    • METI provides an economic index by using AI. This is a new approach to making an index. METI publishes it once a week.
    1. AI for filtering SNS messages
    2. AI for evaluating the feeling of SNS messages
    Examples:
    • "Today, there is a long line in front of my restaurant. Very busy day!" (Positive)
    • "Stock market is good. But my salary doesn't rise! I don't feel economic growth." (Negative)
    Messages about the economy feed the SNS Economy index. Messages about jobs feed the SNS IIP index.
    Chart: red = statistics, blue = AI index. http://qr.nomura.co.jp/quants/sns_ai/#jump2
  44. Smart speaker (Kamakura city)

    Kamakura city provides smart speakers for elderly residents. http://www.soumu.go.jp/main_content/000595331.pdf
    Categories: Information, Entertainments, Utilities, Safety, Health, Communication
    Functions: Event, Bulletin board, TV program, Music, Karaoke, Storyteller, Weather, Wake-up call, Railway and bus, News, Conversation, Disaster-related info., Monitoring, Voice training, Pray, Exercise
  45. Data quality for NLP

    • Simple and short sentences
    • Mechanisms for data quality
    • Dictionary
    • Base Registries
    • Structured data
    • Provenance
    • Disinformation
  46. Dictionary

    • It is necessary to use the appropriate dictionary according to the purpose of the analysis.
    • It was necessary to discuss how to categorize synonymous and related vocabulary.
    • (Ex.) "fuel", "kerosene", "gasoline", "petroleum products"
    Advanced analysis is available by filtering Twitter messages with the list of relief supplies.
    3 days after the earthquake
    • water = drinking water, water, beverage, drink
    • distilled water = distilled water
    1 week after the earthquake
    • water = water
    • drinking water = drinking water
    • beverage = beverage, tea, juice
    • distilled water = distilled water
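The effect of swapping dictionaries can be shown with a small normalization step applied before counting. This is a sketch under assumptions: the two mappings below follow the slide's "3 days after" and "1 week after" groupings, but the names `COARSE`, `FINE`, and `normalize` and the token list are hypothetical.

```python
from collections import Counter

# Coarse dictionary (as in "3 days after"): every drink is counted as "water".
COARSE = {"drinking water": "water", "beverage": "water", "drink": "water"}

# Finer dictionary (as in "1 week after"): water and drinking water stay separate,
# while tea and juice are grouped under "beverage".
FINE = {"tea": "beverage", "juice": "beverage"}

def normalize(tokens, dictionary):
    """Map each token to its category; unknown tokens are kept unchanged."""
    return [dictionary.get(token, token) for token in tokens]

# Invented tokens, assumed to be already extracted from tweets.
tokens = ["water", "drinking water", "tea", "juice", "beverage"]

coarse_counts = Counter(normalize(tokens, COARSE))
fine_counts = Counter(normalize(tokens, FINE))
```

The same tokens give different rankings depending on the dictionary, which is why the choice has to follow the purpose of the analysis.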
  47. Base registries and data source

    It is important to develop base registries with proper nouns and basic information.
    • Land: address, geographic information, facility name
    • People: name
    • Business: legal entity name
    • Characters: Japanese kanji characters
    Data sources for analysis: we should gather various data sources. For example, some data sources only gather opinions from young persons.
  48. Structured Data

    To use NLP effectively, it is easier to analyze if the data items are separated. We are promoting the government interoperability framework to all public sectors.
    Government data exchange standards
    • Characters
    • Code
    • Core vocabulary (data items, data definitions): Date, Address, Phon …
    • Templates (information, data models) by domain: Tourism, Healthcare, Transportation, Disaster, Public service, Infrastructure, Manufacture, ...
    • Format
    • Service catalogue, service data, sensor data
    Example template: AED
    • Location: Address, LocationTwoDimensional, GeographicCoordinate
    • Equipment information: Spot of Equipment, Business Hours, Owner, Access, Availability, User, Day of Installation, Homepage
    • AED information: Type of Pad, Expiry date, Contact, Type, Model Number, Serial Number, Photo, Note
    • Information Source
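Separated data items can be modelled as a typed record, which makes them easy to filter without text analysis. The sketch below is only an illustration: the class `AedRecord`, its field selection, the ISO date format, and the sample records are assumptions adapted from the item names on the slide, not the framework's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AedRecord:
    """One AED entry with separated data items (names adapted from the slide)."""
    address: str
    latitude: float
    longitude: float
    business_hours: Optional[str] = None
    owner: Optional[str] = None
    type_of_pad: Optional[str] = None
    expiry_date: Optional[str] = None  # ISO date string, an assumed format
    model_number: Optional[str] = None
    serial_number: Optional[str] = None

def expiring_before(records, date_iso):
    """Return records whose expiry date is known and earlier than date_iso."""
    return [r for r in records
            if r.expiry_date is not None and r.expiry_date < date_iso]

# Invented sample records for illustration only.
records = [
    AedRecord("City Hall lobby", 35.0, 139.0, expiry_date="2020-03-31"),
    AedRecord("Station east exit", 35.1, 139.1, expiry_date="2022-01-15"),
    AedRecord("Library", 35.2, 139.2),
]
soon = expiring_before(records, "2021-01-01")
```

Because the expiry date is its own field, the query is a plain comparison; with free-text notes the same question would need extraction first.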
  49. Provenance

    The public sector gathers information from many sources, such as reports from public-sector staff and reports from citizens. We should consider how to analyze and filter the data.
    We usually use 3 views: total, reports from staff, and reports from citizens.
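The three views can be produced by tagging each report with its source and counting keywords per source as well as overall. A minimal sketch, assuming each report has already been reduced to a (source, keyword) pair; the function `three_views` and the sample data are hypothetical.

```python
from collections import Counter

def three_views(reports):
    """Count keywords in total and separately for staff and citizen reports.

    reports is an iterable of (source, keyword) pairs,
    where source is either "staff" or "citizen".
    """
    views = {"total": Counter(), "staff": Counter(), "citizen": Counter()}
    for source, keyword in reports:
        views["total"][keyword] += 1
        views[source][keyword] += 1
    return views

# Invented sample data for illustration only.
reports = [
    ("staff", "road closed"),
    ("citizen", "flooding"),
    ("citizen", "flooding"),
    ("citizen", "road closed"),
]
views = three_views(reports)
```

Comparing the views side by side shows whether a topic is raised mainly by citizens, mainly by staff, or by both.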
  50. Filtering

    There are response biases caused by the huge number of repeated retweets. The number of tweets decreased from 1,500 to 489 after deleting repeated retweets.
    Filter tweets and delete repeated retweets.
    (Figure: word clouds before and after filtering, with words such as Support, Relief aid, need, volunteer, Disaster victims, information, Devastated area, Fund raise, Portable toilet, demand.)
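Deleting repeated retweets can be sketched as keeping only the first occurrence of each message body. This is an illustration, not the tool used in the project: the `RT @user:` prefix pattern and the function `drop_repeated_retweets` are assumptions.

```python
import re

# Matches one or more leading "RT @user:" prefixes (an assumed retweet format).
RT_PREFIX = re.compile(r"^(?:RT\s+@\w+:\s*)+")

def drop_repeated_retweets(tweets):
    """Keep the first occurrence of each message body, ignoring RT prefixes."""
    seen = set()
    kept = []
    for text in tweets:
        body = RT_PREFIX.sub("", text).strip()
        if body not in seen:
            seen.add(body)
            kept.append(text)
    return kept

# Invented sample data for illustration only.
tweets = [
    "RT @a: need batteries",
    "RT @a: need batteries",
    "RT @b: need batteries",
    "need water",
]
filtered = drop_repeated_retweets(tweets)
```

Comparing message bodies rather than whole strings also removes copies that were retweeted through different accounts.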
  51. Response bias caused by the huge number of repeated retweets

    • To solve response bias, we tried to:
    • Get Twitter messages many times a day
    • Use the function to delete repeated retweets

    (Example) The huge number of retweets, translated from Japanese. Message texts are cut off in the source.
    No. | Tweet | Time
    1473-1488 (16 tweets) | RT @mmc_kgur: What matters is not letting support end like a passing boom. Reconstruction takes decades. Doing our best now | 2011/3/14 18:20
    1489 | RT @shuichi_jp: The Nagano Prefecture disaster response support headquarters is also collecting information that affects residents' lives, such as the supply and demand of gasoline | 2011/3/14 18:20
    1490 | To the news media: instead of reviewing the scale of the damage in Tohoku, the current situation, what will be needed from now on, and the present state of Ibaraki and Chiba, which have not been reported so far | 2011/3/14 18:20
    1491-1495 (5 tweets) | RT @mmc_kgur: (same message as 1473) | 2011/3/14 18:19
    1496 | RT @motoikotaka: [Please share] It is not only Tohoku. Ibaraki Prefecture also needs support. Help. Compared with Iwate and Fukushima, the damage | 2011/3/14 18:19
    1497-1499 (3 tweets) | RT @mmc_kgur: (same message as 1473) | 2011/3/14 18:19
    1500 | Can't something be done about the hoarding of gasoline? (/ _ ; ) From the Kansai area we could not support with electricity by saving power, but gasoline is common nationwide | 2011/3/14 18:19

    Word ranking, 3/18 14:21-14:41
    No. | Word | Share
    1 | Goods | 44.8%
    2 | Battery | 25.7%
    3 | Tissue paper | 25.4%
    4 | Bottle for water | 25.4%
    5 | Flashlight | 25.3%
    6 | Toilet paper | 25.3%
    7 | Light oil | 25.3%
    8 | Waterproof canvas | 25.2%
    9 | Nonperishables | 25.2%
    10 | Tank for gasoline | 25.1%
    11 | Electric power | 12.3%
    12 | Water | 9.5%
    13 | Gasoline | 8.6%
    14 | TV | 8.2%
    15 | Disposable diaper | 7.7%
    16 | Meals | 7.6%
    17 | Rice porridge | 6.9%
    18 | For infants | 6.9%
    19 | Inpatients | 6.9%
    20 | No-rinse rice | 6.9%
    21 | Liquid food | 6.9%
    22 | Volunteer | 5.2%
    23 | Blood | 4.5%
    24 | Fuel | 4.2%
    25 | Food | 4.2%
    These words appear the same number of times. Thus we can guess that they were retweeted in the same message.
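The observation that several words share the same frequency can be turned into a simple check. The sketch below groups terms by identical count and reports the larger groups as possible traces of one heavily retweeted message. The function name, the `min_size` threshold, and the counts are hypothetical, and real data may need a tolerance instead of exact equality.

```python
from collections import defaultdict

def same_count_groups(term_counts, min_size=3):
    """Group terms that occur exactly the same number of times.

    Groups with at least min_size terms hint that the terms came
    from a single message that was retweeted many times.
    """
    by_count = defaultdict(list)
    for term, count in term_counts.items():
        by_count[count].append(term)
    return {count: sorted(terms)
            for count, terms in by_count.items()
            if len(terms) >= min_size}

# Invented counts for illustration only.
term_counts = {
    "flashlight": 253,
    "toilet paper": 253,
    "light oil": 253,
    "water": 95,
    "gasoline": 86,
}
groups = same_count_groups(term_counts)
```

A flagged group is only a hint; checking the underlying tweets confirms whether a single retweeted message caused it.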
  52. Response bias caused by the huge number of repeated retweets

    • You need to delete repeated retweets to find minority opinions.
    The number of tweets decreased from 1,500 to 489 after deleting repeated retweets.
    Filter tweets and delete repeated retweets.
    (Figure: word clouds before and after filtering, with words such as Support, Relief aid, need, volunteer, Disaster victims, information, Devastated area, Fund raise, Portable toilet, demand.)
  53. Disinformation

    • We often get disinformation or unconfirmed information.
    Example (Kumamoto city): "The lion ran away from the zoo."
    There are many retweeted messages. Most disinformation comes from a single source.
  54. Conclusion

    • We have a lot of experience in the NLP field.
    • However, it is not yet used in a stable way.
    • Communication field
    • We will use NLP in all business areas.
    • Analysis field
    • We are now making a grand design for the public sector.
    • The design includes AI, with a particular focus on data analysis and EBPM.
    • We will set up concrete NLP projects next year.
  55. Future model: AI for Aging Society

    From the digital divide to digital support. NLP is one of the essential technologies for an aging society.
    High-hospitality services
    • Communication: conversation, handwriting
    • Participation
    • Administration services
    • Healthcare
    • Mobility service
    Technologies: drone, robot car, smart speaker, powered suit, therapy robot "Paro"
    http://robotcare.jp/wp-content/uploads/2017/04/List-of-commercialized-equipment.pdf
    http://www.aist.go.jp/aist_j/press_release/pr2004/pr20040917_2/pr20040917_2.html
  56. For more information

    Kenji Hiramoto, Cabinet Secretariat, Government of Japan
    [email protected]
    Details of the text mining project in disaster risk management: https://www.slideshare.net/hiramoto/150317un-disaster