Jim Geovedi
October 15, 2013

Waluku: Answering Astronomy Questions through Social Media

langitselatan, one of the established astronomy communities in Indonesia, has been actively using social media to interact and hold discussions with its members and the general public. Since 2011 langitselatan has received questions from the public and answered them as blog articles, and it now plans to use social networks extensively for astronomy outreach.

This paper reports the development and implementation of Waluku, an online astronomy knowledge base management system extended with a dialogue-based natural language chatbot on the Twitter social network. The chatbot creates responses from information extracted from langitselatan blog articles, Wikipedia articles, and community-supplied answers.

Transcript

  1. langitselatan

    Bandung, Indonesia, near the Bosscha Observatory. Premier online astronomy media house since 2007: astronomy news, answering hoaxes, and providing basic astronomy educational resources and fun science.
  2. Social media in Indonesia

    • Facebook and Twitter: the numbers matter. • Internet access in Indonesia is getting there, with penetration at 50% and growing. • Mobile telephony reaches 100% of the population, and 40% of mobile phones are internet-capable.
  3. Waluku

    Summary: a Twitter chatbot that relies heavily on natural language and is equipped with closed-domain (astronomy) knowledge. Why? Because asking questions is one of the most basic human norms. Because social media is cool. Because astronomy is cool. Because artificial intelligence is cool. Because answering "Google it" is considered rude.
  4. System overview

    [Diagram: components Knowledge Base, Input Analysis Module, Response Generator, Q&A Module and Crowdsourcing Framework; data labelled Unstructured Documents, User Input and Response.]
  5. User input

    Challenge: people do not check whether their question has been asked before, so they ask the same question using different wording.
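  6. Input analysis

    | Input | Extracted keywords | Rank |
    |---|---|---|
    | Bagaimana bulan sabit dapat terjadi? (How does a crescent moon occur?) | bulan sabit, bulan | 0.89212 |
    | Apa penyebab terjadinya bulan sabit? (What causes a crescent moon?) | bulan sabit, bulan, penyebab | 0.75419 |
    | Apakah wajah bulan akan selalu sama? (Will the face of the Moon always look the same?) | wajah bulan, bulan | 0.64021 |
    | Jelaskan proses terjadinya bulan sabit? (Explain how a crescent moon forms.) | bulan sabit, proses | 0.65993 |

    Keywords: bulan (moon), bulan sabit (crescent moon), wajah bulan (face of the Moon), penyebab (cause), proses (process).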
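  7. Phrase similarity

    | bagaimana (how) | mengapa (why) | planet | galaksi (galaxy) | bulan (moon) | twitter |
    |---|---|---|---|---|---|
    | mengapa | kenapa | mars | galaxy | neil armstrong | twit |
    | apa penyebab | apa | venus | andromeda | bulan sabit | tweet |
    | apa yang menyebabkan | knapa | jupiter | bima sakti | bulan purnama | tuit |
    | proses terjadinya | ngapa | merkurius | milky way | bulan merah jambu | ngetwit |

    Glosses: kenapa, knapa and ngapa are colloquial forms of mengapa; apa penyebab and apa yang menyebabkan mean "what causes"; proses terjadinya means "the process by which it happens"; bima sakti is the Milky Way; bulan sabit, bulan purnama and bulan merah jambu are the crescent, full and pink moon; twit, tweet, tuit and ngetwit are variants of "tweet".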
  8. Response generation

    Sources: langitselatan's Tanya Jawab (Q&A) website section, astronomy-related Wikipedia articles, and volunteer-provided questions and answers. Raw unstructured content is automatically summarized (factoid extraction) by an NLP (natural language processing) engine and saved to the database as utterance pairs (question and answer). Response formats: summarized factoids (fewer than 500 characters, with URLs) and Twitter-friendly messages (140 characters or less).
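  9. Factoid extraction

    Raw text: Periode rotasi Bulan tidak sama dengan periode rotasi Bumi. Periode rotasi Bumi adalah 24 jam (1 hari), sementara periode rotasi Bulan adalah 27.3 hari. Wajah bulan yang dilihat oleh seluruh manusia di Bumi, baik di Indonesia maupun di belahan Bumi lainnya selalu nampak sama. (The Moon's rotation period is not the same as Earth's. Earth's rotation period is 24 hours (1 day), while the Moon's is 27.3 days. The face of the Moon seen by everyone on Earth, whether in Indonesia or elsewhere, always looks the same.)

    Extracted factoids: Periode rotasi Bumi adalah 24 jam (1 hari), sementara periode rotasi Bulan adalah 27.3 hari. Wajah bulan selalu nampak sama. (Earth's rotation period is 24 hours (1 day), while the Moon's is 27.3 days. The face of the Moon always looks the same.)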
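  10. Utterance pair

    A. Concept: fase bulan (lunar phase)

    B. Questions:
    • Mengapa bulan berbentuk sabit? (Why is the Moon crescent-shaped?)
    • Bagaimana bulat sabit bisa terjadi? (How can a crescent moon happen?)
    • Bagaimana bulan sabit dapat terjadi? (How does a crescent moon occur?)
    • Apa penyebab terjadinya bulan sabit? (What causes a crescent moon?)
    • Apakah wajah bulan akan selalu sama? (Will the face of the Moon always look the same?)
    • Jelaskan proses terjadinya bulan sabit? (Explain how a crescent moon forms.)

    C. Answer (long): Fase Bulan (sabit maupun yang lain) terjadi karena kita di Bumi mengamati sinar Matahari jatuh ke Bulan pada sudut pandang yang berbeda-beda. Lebih lanjut, lihat http://langitselatan.com/2012/05/27/apakah-wajah-bulan-selalu-sama/ (The Moon's phases, crescent or otherwise, occur because we on Earth see sunlight falling on the Moon from different viewing angles. For more, see the link.)

    D. Answer (short): Itu terjadi karena kita di Bumi mengamati pada sudut pandang yang berbeda-beda. http://goo.gl/i5MrxA (That happens because we on Earth observe it from different viewing angles.)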
  11. Crowdsourcing

    Technology limitation: NLP is still considered a hard task (shortening long sentences, extracting factoids, etc.), and constant training and tweaking are required. Quality improvement: volunteers help find an adequate response to user input.
  12. Design limitation

    Waluku is a factoid chatbot, so it cannot handle casual chat input. NLP is language-dependent, and Indonesian NLP initiatives and support are very limited, so we have to roll our own approach.
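Slide 4 names the system's components but not how they connect. The sketch below wires simple stand-ins for the Input Analysis Module, Response Generator, Knowledge Base and Crowdsourcing Framework into one request path, assuming that a question with no stored answer is handed to volunteers. Every name here (`analyze_input`, `generate_response`, `handle_tweet`, `CrowdsourcingQueue`, `KNOWLEDGE_BASE`, `FALLBACK_REPLY`) is hypothetical, and the keyword-subset matching is a placeholder, not Waluku's actual method.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical reply used when the knowledge base has no answer yet.
FALLBACK_REPLY = "Good question! Our volunteers will answer it soon."

# Hypothetical knowledge base: required keywords -> short answer (text from slide 10).
KNOWLEDGE_BASE = {
    frozenset({"bulan", "sabit"}):
        "Itu terjadi karena kita di Bumi mengamati pada sudut pandang "
        "yang berbeda-beda. http://goo.gl/i5MrxA",
}


@dataclass
class CrowdsourcingQueue:
    """Stand-in for the Crowdsourcing Framework: questions awaiting a volunteer."""
    pending: list[str] = field(default_factory=list)

    def submit(self, question: str) -> None:
        self.pending.append(question)


def analyze_input(text: str) -> set[str]:
    """Stand-in for the Input Analysis Module: the set of lowercased words."""
    return set(re.findall(r"\w+", text.lower()))


def generate_response(words: set[str], kb: dict[frozenset[str], str]) -> Optional[str]:
    """Stand-in for the Response Generator: first entry whose keywords all appear."""
    for keywords, answer in kb.items():
        if keywords <= words:
            return answer
    return None


def handle_tweet(text: str, kb: dict[frozenset[str], str], queue: CrowdsourcingQueue) -> str:
    """One pass through the pipeline: analyze, answer, or hand off to volunteers."""
    answer = generate_response(analyze_input(text), kb)
    if answer is None:
        queue.submit(text)
        return FALLBACK_REPLY
    return answer
```

A question such as "Kenapa bulan sabit?" matches the stored entry, while an unknown question is queued and receives the fallback reply; a volunteer's answer could then be added to the knowledge base as a new entry.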
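Slide 6 shows keywords extracted from each input and a rank, but not how either is computed. One plausible approach, sketched here under that assumption, is to match 1- and 2-word phrases against a domain vocabulary and rank stored questions by keyword-set overlap (Jaccard similarity). The `VOCABULARY` set and all function names are hypothetical, and this scoring will not reproduce the slide's rank values.

```python
import re

# Hypothetical domain vocabulary; Waluku's real term list is not given in the slides.
VOCABULARY = {"bulan sabit", "bulan", "wajah bulan", "penyebab", "proses", "fase bulan"}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text: str, max_n: int = 2) -> set[str]:
    """Return every 1- to max_n-word phrase of the input that appears in VOCABULARY."""
    tokens = tokenize(text)
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in VOCABULARY:
                found.add(phrase)
    return found


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap score in [0, 1] between two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank(query: str, stored: list[str]) -> list[tuple[float, str]]:
    """Score each stored question against the query, best match first."""
    q = extract_keywords(query)
    scored = [(jaccard(q, extract_keywords(s)), s) for s in stored]
    return sorted(scored, reverse=True)
```

With this vocabulary, "Bagaimana bulan sabit dapat terjadi?" yields the same keywords as on the slide (bulan sabit, bulan), so a rephrased question can still be matched to one asked before.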
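Slide 7 groups related phrases, including colloquial spellings (kenapa, knapa, ngapa) and English terms (galaxy, milky way). A common way to use such groups, and one way to address the repeated-question problem from slide 5, is to rewrite variants to a canonical form before matching. This sketch assumes a hypothetical `CANONICAL` table built from a few of the slide's variants; it deliberately leaves out topic relations such as mars and planet, which are related but not interchangeable words.

```python
import re

# Hypothetical variant table drawn from slide 7: variant -> canonical form.
CANONICAL = {
    "kenapa": "mengapa",
    "knapa": "mengapa",
    "ngapa": "mengapa",
    "galaxy": "galaksi",
    "milky way": "bima sakti",
    "twit": "twitter",
    "tweet": "twitter",
    "tuit": "twitter",
    "ngetwit": "twitter",
}
MAX_PHRASE_WORDS = max(len(k.split()) for k in CANONICAL)


def normalize(text: str) -> str:
    """Replace known variants with canonical forms, trying longer phrases first."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CANONICAL:
                out.append(CANONICAL[phrase])
                i += n
                break
        else:
            # No variant starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

Trying the longest phrase first means "milky way" is replaced as a unit instead of being left as two unrelated words.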
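Slide 8 limits Twitter-friendly messages to 140 characters. Below is a minimal sketch of fitting a short answer into that limit, assuming the reply is laid out as `@user`, then the answer text, then a URL, and that only the answer text may be shortened. The `fit_reply` helper is hypothetical, and it counts raw characters, ignoring Twitter's own rules for counting links.

```python
MAX_TWEET = 140  # Twitter-friendly limit from slide 8


def fit_reply(mention: str, text: str, url: str, limit: int = MAX_TWEET) -> str:
    """Build '@mention text url', trimming only the text so the reply fits in limit."""
    prefix = f"@{mention} "
    suffix = f" {url}" if url else ""
    room = limit - len(prefix) - len(suffix)
    if room <= 0:
        raise ValueError("mention and URL alone exceed the limit")
    if len(text) > room:
        # Reserve one character for the ellipsis that marks the cut.
        text = text[: room - 1].rstrip() + "…"
    return prefix + text + suffix
```

Keeping the URL intact matters here because the short answer on slide 10 relies on the link for the full explanation.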
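Slide 9 shows raw text reduced to factoids without saying how. As a stand-in, this sketch uses frequency-based extractive summarization, a swapped-in technique that is not necessarily Waluku's: it scores each sentence by the average frequency of its content words and keeps the best sentences within the 500-character budget from slide 8. Unlike the slide's example, it only selects whole sentences and never shortens them. The `STOPWORDS` list and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short Indonesian stop-word list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "oleh", "dengan",
             "adalah", "baik", "maupun", "sementara"}


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace, so '27.3' stays whole."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased words minus stop words and bare numbers."""
    return [w for w in re.findall(r"\w+", sentence.lower())
            if w not in STOPWORDS and not w.isdigit()]


def summarize(text: str, budget: int = 500, max_sentences: int = 2) -> str:
    """Pick the highest-scoring whole sentences that fit in `budget` characters."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(s: str) -> float:
        words = content_words(s)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= budget:
            chosen.append(i)
            used += cost
    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Splitting only where punctuation is followed by whitespace keeps decimals such as 27.3 intact, which matters for numeric factoids like rotation periods.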
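Slide 10's utterance pair (concept, questions, long answer, short answer) maps naturally onto a small record type. This sketch chooses its own field names (`concept`, `questions`, `long_answer`, `short_answer`), enforces the length limits stated on slide 8, and adds `answer_for`, a hypothetical helper that picks the short form for Twitter.

```python
from dataclasses import dataclass, field


@dataclass
class UtterancePair:
    """One knowledge-base entry, mirroring parts A to D of slide 10."""
    concept: str                                          # A: e.g. "fase bulan"
    questions: list[str] = field(default_factory=list)   # B: known phrasings
    long_answer: str = ""                                 # C: summarized factoid with URL
    short_answer: str = ""                                # D: Twitter-sized reply

    def __post_init__(self) -> None:
        # Limits from slide 8: long form under 500 chars, short form at most 140.
        if len(self.long_answer) >= 500:
            raise ValueError("long answer must be under 500 characters")
        if len(self.short_answer) > 140:
            raise ValueError("short answer must be 140 characters or less")

    def answer_for(self, channel: str) -> str:
        """Short answer on Twitter, long answer anywhere else."""
        return self.short_answer if channel == "twitter" else self.long_answer


# Example entry built from the slide's content.
FASE_BULAN = UtterancePair(
    concept="fase bulan",
    questions=["Mengapa bulan berbentuk sabit?", "Apa penyebab terjadinya bulan sabit?"],
    long_answer="Fase Bulan (sabit maupun yang lain) terjadi karena kita di Bumi mengamati "
                "sinar Matahari jatuh ke Bulan pada sudut pandang yang berbeda-beda. "
                "Lebih lanjut, lihat http://langitselatan.com/2012/05/27/apakah-wajah-bulan-selalu-sama/",
    short_answer="Itu terjadi karena kita di Bumi mengamati pada sudut pandang yang "
                 "berbeda-beda. http://goo.gl/i5MrxA",
)
```

Storing many phrasings under one concept lets a new wording of an old question map to an answer that already exists.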