
Pseudo SMT at COOKPAD

robotvert
August 29, 2013

Slides of the talk I gave at Ride the Lightning vol. 2 on August 29th, 2013.
Video here: http://www.youtube.com/watch?v=oAFfJBXPX-k

Transcript

  1. en.cookpad.com
    - import JP from cookpad.com
    - have translators / reviewers make an EN version
    - started a few months ago using Transifex
    - site has been publicly accessible since Aug 5th
    Friday, August 30, 13
  2. Opportunities
    - Need to clone Transifex’s “good” features:
    - Semi-automated phrase translation
    - Glossary / suggestions
    - Progress indicator etc.
  3. Today
    - Need to clone Transifex’s “good” features:
    - Semi-automated phrase translation
    - Glossary / suggestions
    - Progress indicator etc.
  4. Ingredients translation
    - Restricted vocabulary
    - (Almost) no grammar involved
    - Not fun to translate
    - Waste of time & quality goes down
  5. Challenges in translation
    - word ambiguity (book a flight / read a book)
    - word order (English / Japanese)
    - pronoun meaning ([...], it is good)
    - etc...
  6. In our case
    - JP doesn’t have plural forms
    - 玉ねぎ → “Onion”? “Onions”?
    - Can’t parse the quantities easily
    - Quantity ambiguities
    - 「大1」→「大さじ1」 (“1 tablespoon”)、「大きめの○○1」 (“1 largish ○○”)
    - 「2」→ “2 cloves”? “2 slices”? ...
  7. Direct translation
    - word by word, no analysis, rules based
    - translating “much”: if previous word is ... then ... else if previous word is ... and next is ... then ... .............
  8. Direct translation
    - can achieve something for EN to FR, but EN to JP is a different story
    - doesn’t know anything about the context: “he said that ...” / “I like that car”
  9. Transfer & interlingua based systems
    - analyze the data
    - build a representation of the meaning of a sentence that is independent of the language
    - anyway, it’s just crazy s**t
  10. Statistical MT
    Concept: use sample sentences translated in both languages
    In practice:
    - build parallel corpora with conditional probabilities for each sentence
    - fetch the most likely translation for a given sentence
  11. Implementation
    Dead simple (for now):
    - Try a perfect “name quantity” match
    - Fall back to “name” only
    - done
    The more translations there are, the better the system