Translating documentation with cloud tools and scripts

It is fairly clear which tools to use when translating the text in the software itself, but much less so for documentation, especially when that documentation evolves and needs to be updated. This talk presents a pipeline that creates an initial translation by converting Markdown and HTML files into PO files, uses a script to translate them automatically with cloud services, and covers what to do when the original document gets updated.

Deck by Nilo Ney Coutinho Menezes, presented at FOSDEM 2023

Transcript

  1. Translating documentation with cloud tools and scripts
     Using cloud tools and scripts to translate, review and update documents
     Nilo Coutinho Menezes, FOSDEM 2023 - Brussels
  2. How it started
     - "Nim for Python Programmers": https://github.com/nim-lang/Nim/wiki/Nim-for-Python-Programmers
     - Originally written in English, translated into Spanish
     - GitHub wiki page, written in Markdown
     - Community driven, updates are made in English
  3. How it normally goes
     - You clone/copy the wiki page into a new version in your native language
     - You start to translate each line by hand
     - It is a manual process
     - Once the original document is updated, you have to pick out each update and carry it over to the translated document
     - Prone to errors
     - Makes updates difficult
  4. Looks like an old problem
     - Same issues as translating strings in programs
     - GNU gettext could be used to solve these issues
     - It extracts strings from source files
     - Creates a portable object (PO) file
     - Clear, simple text format
     - Rich ecosystem with multiple tools
     - Adopted by multiple projects
     - Human-readable format

       #: Nim-for-Python-Programmers.md:block 1 (header)
       msgid "Table Of Contents"
       msgstr "Índice"

       #: Nim-for-Python-Programmers.md:block 2 (table)
       msgid "[Comparison](#Comparison)"
       msgstr "[Comparação](#Comparison)"
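Gettext helps with updates because translations are keyed by the source string (`msgid`), not by their position in the file. Below is a minimal, stdlib-only sketch of that idea, loosely modelled on what GNU gettext's `msgmerge` does when the template is regenerated: unchanged strings keep their translation, new strings start empty, and strings that disappeared are kept as obsolete. `Entry` and `merge_catalog` are hypothetical names, and unlike `msgmerge` the sketch does no fuzzy matching of slightly edited strings.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    msgid: str
    msgstr: str = ""
    obsolete: bool = False


def merge_catalog(old_entries, new_msgids):
    """Rebuild the catalog for an updated source, reusing translations by msgid."""
    translations = {e.msgid: e.msgstr for e in old_entries if not e.obsolete}
    # Unchanged strings keep their translation; new strings start untranslated
    merged = [Entry(msgid, translations.get(msgid, "")) for msgid in new_msgids]
    # Strings that disappeared from the source are kept, but marked obsolete
    current = set(new_msgids)
    merged += [
        Entry(e.msgid, e.msgstr, obsolete=True)
        for e in old_entries
        if e.msgid not in current and not e.obsolete
    ]
    return merged


old = [
    Entry("Table Of Contents", "Índice"),
    Entry("[Comparison](#Comparison)", "[Comparação](#Comparison)"),
]
# Hypothetical update: the English page gains a "Variables" block and drops the link
new_msgids = ["Table Of Contents", "Variables"]
for e in merge_catalog(old, new_msgids):
    status = "translated" if e.msgstr else "untranslated"
    print(repr(e.msgid), status, "(obsolete)" if e.obsolete else "")
```

Only the new string is left for a translator (human or machine); everything that did not change is carried over automatically instead of being copied by hand.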
  5. Markdown files
     - The initial challenge was supporting Markdown, that is, creating PO files from a Markdown source
     - After testing multiple packages, mdpo was chosen: https://github.com/mondeja/mdpo
     - Written in Python
     - Can convert Markdown to PO files and vice versa
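To give an idea of what a Markdown-to-PO conversion produces, here is a deliberately naive, stdlib-only sketch that splits a document on blank lines and emits one PO entry per block, with a reference comment in the same `file:block N (kind)` shape as the PO excerpt on slide 4. It is not mdpo's algorithm: mdpo parses Markdown properly and provides `md2po`/`po2md` command-line tools for the real job, whereas the hypothetical `markdown_to_po` below only tells headers apart from paragraphs.

```python
def po_quote(text):
    """Escape a string for use as a PO msgid/msgstr literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def markdown_to_po(markdown, filename):
    """Emit one untranslated PO entry per blank-line-separated Markdown block.

    Simplified sketch: only headers and paragraphs are recognised.
    """
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    lines = []
    for number, block in enumerate(blocks, start=1):
        if block.startswith("#"):
            kind, text = "header", block.lstrip("#").strip()
        else:
            # Join soft-wrapped lines into a single string
            kind, text = "paragraph", " ".join(block.split())
        lines.append(f"#: {filename}:block {number} ({kind})")
        lines.append(f"msgid {po_quote(text)}")
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)


source = "# Table Of Contents\n\nSome paragraph text\nwrapped over two lines.\n"
print(markdown_to_po(source, "Nim-for-Python-Programmers.md"))
```

Keeping one entry per block (rather than per line) gives the translator whole sentences as context, which is also what makes block-level references like `block 2 (table)` useful when reviewing.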
  6. PO files
     - It would be great to be able to open and manipulate these files in Python
     - polib does exactly this: https://github.com/izimobil/polib/
     - It can open, filter and update a PO file
     - Handy for checking untranslated strings
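polib is the right tool for this job, but the format is simple enough that the core of an "untranslated strings" check fits in a few lines. The sketch below is stdlib-only and hypothetical (`parse_po`, `untranslated` and `needs_review` are illustrative names, not polib's API). It handles single- and multi-line `msgid`/`msgstr` strings and `#,` flags, and it ignores plurals, `msgctxt` and obsolete entries, all of which polib does handle.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Entry:
    msgid: str = ""
    msgstr: str = ""
    flags: list = field(default_factory=list)


def parse_po(text):
    """Parse simple PO text into Entry objects (no plurals, no msgctxt)."""
    entries, current, target = [], None, None
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last entry
        line = raw.strip()
        if not line:
            # A blank line ends an entry; the header entry (empty msgid) is skipped
            if current is not None and current.msgid:
                entries.append(current)
            current, target = None, None
            continue
        if line.startswith("#") and not line.startswith("#,"):
            continue  # references (#:), translator comments, obsolete (#~) lines
        if current is None:
            current = Entry()
        if line.startswith("#,"):
            current.flags += [f.strip() for f in line[2:].split(",")]
        elif line.startswith("msgid "):
            target, current.msgid = "msgid", ast.literal_eval(line[6:])
        elif line.startswith("msgstr "):
            target, current.msgstr = "msgstr", ast.literal_eval(line[7:])
        elif line.startswith('"') and target:
            # Continuation line: appended to the previous msgid/msgstr
            setattr(current, target, getattr(current, target) + ast.literal_eval(line))
        else:
            target = None  # msgctxt, msgid_plural, msgstr[n]: not handled here
    return entries


def untranslated(entries):
    return [e for e in entries if not e.msgstr]


def needs_review(entries):
    return [e for e in entries if e.msgstr and "fuzzy" in e.flags]


SAMPLE = '''
msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

#: Nim-for-Python-Programmers.md:block 1 (header)
msgid "Table Of Contents"
msgstr "Índice"

#: Nim-for-Python-Programmers.md:block 2 (table)
msgid "[Comparison](#Comparison)"
msgstr ""
'''

entries = parse_po(SAMPLE)
for e in untranslated(entries):
    print("untranslated:", e.msgid)
```

Using `ast.literal_eval` for the quoted strings is a shortcut that works because the common PO escapes (`\"`, `\\`, `\n`, `\t`) coincide with Python's; a real tool should rely on polib instead.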
  7. The cloud
     - We can now easily extract untranslated strings from the PO file
     - Translating strings is cheap
     - Boto3 with Amazon Translate (AWS)
     - Not free, but most of the time the usage should fit within the free tier

       import boto3
       import polib

       # FILE, SOURCE_LANGUAGE and TARGET_LANGUAGE are defined elsewhere in the script
       client = boto3.client("translate")  # assumed: client creation is not shown on the slide

       po = polib.pofile(FILE)
       for entry in po.untranslated_entries():
           print(entry.msgid, entry.msgstr)
           response = client.translate_text(
               Text=entry.msgid,
               TerminologyNames=[],
               SourceLanguageCode=SOURCE_LANGUAGE,
               TargetLanguageCode=TARGET_LANGUAGE,
               Settings={"Formality": "FORMAL", "Profanity": "MASK"},
           )
           entry.msgstr = response["TranslatedText"]
       po.save()
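One way to organise the translation step so that it can be tested without calling the cloud is to pass the translator in as a function. In this sketch the hypothetical `machine_translate` only fills entries that have no `msgstr` yet, and marks machine output with the gettext `fuzzy` flag so a reviewer can check it later; that review convention is an assumption here, not something the slides show. The hypothetical `make_aws_translator` wraps the same `translate_text` call used on the slide, with only the required parameters, and imports boto3 lazily so the rest runs without AWS credentials.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    msgid: str
    msgstr: str = ""
    flags: list = field(default_factory=list)


def machine_translate(entries, translate):
    """Fill empty msgstr values with translate(msgid); return how many were filled."""
    filled = 0
    for entry in entries:
        if entry.msgstr:
            continue  # keep existing (human or already reviewed) translations
        entry.msgstr = translate(entry.msgid)
        if "fuzzy" not in entry.flags:
            entry.flags.append("fuzzy")  # assumed convention: machine output needs review
        filled += 1
    return filled


def make_aws_translator(source, target):
    """Return a translate(text) callable backed by Amazon Translate via boto3."""
    import boto3  # lazy import: only needed when the real service is used

    client = boto3.client("translate")

    def translate(text):
        response = client.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode=target
        )
        return response["TranslatedText"]

    return translate


# Usage with a stand-in translator instead of the cloud service
entries = [Entry("Table Of Contents", "Índice"), Entry("Variables")]
print(machine_translate(entries, str.upper), entries[1])
```

With polib, the same loop could run over `po.untranslated_entries()`, since polib's `POEntry` objects also expose `msgid`, `msgstr` and a `flags` list.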