曾政嘉
October 11, 2017

The Localization (CJK) Challenges and Possibilities in Taiwan

Short bio:
My name is Cheng-Chia Tseng (it is easier to pronounce the first name as ZenJia). I am a long-term free/libre and open source software translator who contributes to Fedora, LibreOffice, GNOME and Ubuntu. My main contributions to the LibreOffice project are translations and work on other localization issues.

Short description:
We have been working to build a local LibreOffice community in Taiwan since 2010. Localization is the most important factor in making LibreOffice acceptable to local users.

Along with translation, there are further localization challenges, such as old conventions inherited from movable type printing, Chinese typography, vertical layout of CJK text, and Chinese characters that are not yet defined in the Unicode standard (but could be described with IDS).

In this talk, I would like to give an overview of the localization challenges we have met in Taiwan and talk about future possibilities.

License: Creative Commons Attribution-Share Alike 4.0 License

Transcript

  1. The Localization (CJK) Challenges and Possibilities in Taiwan
    ROME | 11th October 2017
    曾政嘉 Cheng-Chia Tseng (zerng07)
  2. The Overview of L10N
    5 major aspects:
    1. Message translation (UI, Help, Website)
    2. Text output (display on the screen, print on paper)
    3. Text input (type with a keyboard or select with programs)
    4. Information processing (supported by the program)
    5. Adaptation to the local culture (such as the calendar, cultural differences in color psychology, conventions of icon design, etc.)
  3. L10N Work by the Community in Taiwan
    The Taiwan community has become more and more active in recent years
    • I maintain the translation of the UI, Website and LibreOffice Online
    • Jeff Huang and I work on the native language website
    • Mark Hung works on the CJK support
    • Franklin Weng leads the work on LibreOffice training, migration and marketing in Taiwan
  4. Minguo Calendar Identification
    Minguo calendar support is not intuitive enough
    • The ruling government in Taiwan is the Republic of China (ROC), which was founded in 1912 in mainland China.
    • 2017 is 民國 106 年 (Minguo year 106): 2017 - 1912 + 1 = 106.
    • In general, people in Taiwan (zh_TW locale) use both systems in daily life.
      – 2 or 3 digits (such as 99 or 106) for the Minguo year
      – 4 digits (such as 2017) for the Common Era year
    • Public servants in the government use only the Minguo calendar. They write “106/10/11” for 2017/10/11, which LibreOffice identifies as “0106-10-11”.
    https://bugs.documentfoundation.org/show_bug.cgi?id=113184
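The year arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration of the conversion, not how LibreOffice parses dates; the function names and the accepted `YYY/MM/DD` pattern are assumptions made for the example.

```python
import re
from datetime import date

# Minguo year 1 is 1912, so Common Era year = Minguo year + 1911.
MINGUO_OFFSET = 1911


def to_minguo_year(ce_year: int) -> int:
    """Convert a Common Era year to a Minguo year (2017 gives 106)."""
    return ce_year - MINGUO_OFFSET


def parse_minguo_date(text: str) -> date:
    """Parse 'YYY/MM/DD' where the year is a 2- or 3-digit Minguo year.

    Hypothetical helper: the accepted pattern is an assumption based on
    the "106/10/11" example from the slide.
    """
    match = re.fullmatch(r"(\d{2,3})/(\d{1,2})/(\d{1,2})", text.strip())
    if match is None:
        raise ValueError(f"not a Minguo-style date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year + MINGUO_OFFSET, month, day)


print(parse_minguo_date("106/10/11").isoformat())
```

A 4-digit year would not match the pattern, which mirrors the convention above: 2 or 3 digits mean a Minguo year, 4 digits mean a Common Era year.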
  5. Vertical Layout of the CJK text
    • The support of vertical layout of CJK text is a mess (5.3, 5.4)
      – vertical glyphs cannot be displayed in the slide show (including Latin text)
      – Han glyphs are rendered far above the cursor position
    To learn more about CJK vertical layout, please read Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde
    Sample: 垂直文字 (vertical text)
    https://bugs.documentfoundation.org/show_bug.cgi?id=103729
  6. Asian Phonetic Guide (Ruby)
    • Bopomofo ruby is how people in Taiwan teach children the Mandarin readings of Han (Chinese) characters
    • One Han character may have more than one pronunciation. Some have as many as 6 different pronunciations.
    Picture source: https://speakerdeck.com/bobbytung/du-2017-liao-zhu-yin-huan-mei-gao-ding-ma
    Author: Bobby Tung, CC-by-SA 4.0 International
  7. Asian Phonetic Guide (Ruby)
    • Mono Ruby
    One or more ruby glyphs annotate only a single base glyph. Used in Chinese, Japanese and Korean text to annotate Han characters.
    Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde
  8. Asian Phonetic Guide (Ruby)
    • Group Ruby (the LibreOffice design has group ruby in mind)
    Ruby glyphs annotate two or more base glyphs. Used in Japanese text to give the reading of kanji (Han characters).
    Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde
  9. Asian Phonetic Guide (Ruby)
    • The Problem for Chinese
    LibreOffice separates the selected text into phrases automatically for annotation, so 一見如故 is treated as one phrase after selection (group ruby). With Bopomofo, we have to annotate the characters one by one (mono ruby).
    Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde
    https://bugs.documentfoundation.org/show_bug.cgi?id=113189
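The difference between the two ruby styles can be shown with a minimal Python sketch that emits HTML ruby markup. It is not LibreOffice code; the function names are hypothetical, and the Bopomofo readings below are given only for illustration.

```python
from html import escape


def mono_ruby(base: str, readings: list[str]) -> str:
    """Annotate each base character with its own reading (mono ruby)."""
    if len(base) != len(readings):
        raise ValueError("mono ruby needs exactly one reading per base character")
    return "".join(
        f"<ruby>{escape(ch)}<rt>{escape(rt)}</rt></ruby>"
        for ch, rt in zip(base, readings)
    )


def group_ruby(base: str, reading: str) -> str:
    """Annotate the whole phrase with a single reading (group ruby)."""
    return f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>"


phrase = "一見如故"
readings = ["ㄧ", "ㄐㄧㄢˋ", "ㄖㄨˊ", "ㄍㄨˋ"]  # illustrative readings

print(mono_ruby(phrase, readings))               # four <ruby> elements
print(group_ruby(phrase, " ".join(readings)))    # one <ruby> element
```

Bopomofo annotation needs the first form, where every Han character carries its own `<rt>`; the second form is what a phrase-based (group ruby) design produces.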
  10. Asian Phonetic Guide (Ruby)
    • Horizontal Ruby
    Reference: https://bobbytung.github.io/BopomofoLayoutTest/case01/index.html
    Upper picture source: https://speakerdeck.com/bobbytung/du-2017-liao-zhu-yin-huan-mei-gao-ding-ma
    Author: Bobby Tung, CC-by-SA 4.0 International
    Demo by Bobby Tung of the HTML5 ruby module implementation, tweaked with OpenType features
    Directly typed with LibreOffice Phonetic Guide
  11. Asian Phonetic Guide (Ruby)
    • Vertical Ruby
    Left picture source: https://speakerdeck.com/bobbytung/du-2017-liao-zhu-yin-huan-mei-gao-ding-ma
    Author: Bobby Tung, CC-by-SA 4.0 International
    Demo by Bobby Tung of the HTML5 ruby module implementation, tweaked with OpenType features
    Directly typed with LibreOffice Phonetic Guide
  12. Typographic Scale Convention (Taiwan)
    Based on a grid system and multiples between scales
    • 字田活字盒 ZiTien Movable Type Box
    初號 (Primary) 42 pt, 二號 (Two) 21 pt, 五號 (Five) 10.5 pt
  13. Typographic Scale Convention (Japan)
    Columns: Set A, Set B (mostly used scale), Set C (frequently used). Numbers in parentheses are multiples of the set's base size.
    初號 Primary: 42 pt (4) Title
    一號 One: 27.5 pt (2)
    二號 Two: 21 pt (2) Heading
    三號 Three: 16 pt (2)
    四號 Four: 13.75 pt (1)
    五號 Five: 10.5 pt (1) Body text
    六號 Six: 8 pt (1)
    七號 Seven: 5.25 pt (0.5) Ruby
    八號 Eight: 4 pt (0.5)
    This system was invented in Japan and introduced to Taiwan. There are 3 sets of scales.
    Color in yellow: base factor
  14. Typographic Scale Convention (Japan)
    Columns: Set A, Set B (mostly used scale), Set C (frequently used). Numbers in parentheses are multiples of 10.5 pt.
    初號 Primary: 42 pt (4) Title
    一號 One: 27.5 pt
    二號 Two: 21 pt (2) Heading
    三號 Three: 16 pt, if treated as 15.75 pt (1.5)
    四號 Four: 13.75 pt
    五號 Five: 10.5 pt (1) Body text
    六號 Six: 8 pt, if treated as 7.875 pt (0.75)
    七號 Seven: 5.25 pt (0.5) Ruby
    八號 Eight: 4 pt Ruby
    Color in yellow: regularly used in combination
  15. Typographic Scale Convention (China)
    Columns: Set A, Set B (mostly used scale), Set C (frequently used)
    初號 Primary: 42 pt (4); 小初號 (small): 36 pt; Title
    一號 One: 26 pt; 小一號 (small): 24 pt (2)
    二號 Two: 21 pt (2); 小二號 (small): 18 pt; Heading
    三號 Three: 16 pt (2); 小三號 (small): 15 pt
    四號 Four: 14 pt; 小四號 (small): 12 pt (1)
    五號 Five: 10.5 pt (1); 小五號 (small): 9 pt; Body text
    六號 Six: 8 pt (1); 小六號 (small): 6.5 pt
    七號 Seven: 5.5 pt; Ruby
    八號 Eight: 5 pt
    In China, a system comparable to the Japanese one developed.
  16. Typographic Scale Convention (China)
    Columns: Set A, Set B (mostly used scale), Set C (frequently used). Numbers in parentheses are multiples of 9 pt.
    初號 Primary: 42 pt (14/3); 小初號 (small): 36 pt (4); Title
    一號 One: 26 pt; 小一號 (small): 24 pt (8/3)
    二號 Two: 21 pt (7/3); 小二號 (small): 18 pt (2); Heading
    三號 Three: 16 pt; 小三號 (small): 15 pt (5/3)
    四號 Four: 14 pt; 小四號 (small): 12 pt (4/3)
    五號 Five: 10.5 pt; 小五號 (small): 9 pt (1); Body text
    六號 Six: 8 pt; 小六號 (small): 6.5 pt
    七號 Seven: 5.5 pt; Ruby
    八號 Eight: 5 pt
    More of the scales relate to one another in multiples.
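The multiples in parentheses on this slide can be checked with exact fractions. The following Python sketch assumes that every multiple is taken relative to 小五號 (9 pt), which is the size marked (1); the dictionary and function names are made up for the example.

```python
from fractions import Fraction

# Assumption: all multiples on the slide are relative to 小五號 (9 pt).
BASE_PT = Fraction(9)

# Multiples copied from the slide (China convention).
MULTIPLES = {
    "初號": Fraction(14, 3),
    "小初號": Fraction(4),
    "小一號": Fraction(8, 3),
    "二號": Fraction(7, 3),
    "小二號": Fraction(2),
    "小三號": Fraction(5, 3),
    "小四號": Fraction(4, 3),
    "小五號": Fraction(1),
}


def size_in_points(name: str) -> float:
    """Return the point size obtained from the multiple of the 9 pt base."""
    return float(BASE_PT * MULTIPLES[name])


for name in MULTIPLES:
    print(name, size_in_points(name), "pt")
```

Using `Fraction` keeps thirds exact, so 14/3 of 9 pt comes out as exactly 42 pt rather than a rounded float.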
  17. Typographic Scale Convention (Taiwan)
    Columns: Set A, Set B (mostly used scale), Set C (frequently used). Numbers in parentheses are multiples of 9 pt.
    初號 Primary: 42 pt (14/3) Larger Title
    新五號四行 New Five*4: 36 pt (4) Title
    一號 One: 27.5 pt
    二號 Two: 21 pt (7/3) Heading
    三號 Three: 16 pt
    四號 Four: 13.75 pt
    五號 Five: 10.5 pt (7/6) Larger Body Text
    六號 Six: 8 pt
    新四號 New Four: 12 pt (4/3) Section
    新五號 New Five: 9 pt (1) Body Text
    七號 Seven: 5.25 pt Ruby
    八號 Eight: 4 pt
    Green ones: fonts imported from China
    Cells with yellow color: regularly used in combination
    In Taiwan, new fonts were imported from China and then adapted to the original Japanese system.
  18. Typographic Scale Convention (Taiwan)
    The sizes listed in the LibreOffice font size list are: 6, 7, 8, 9, 10, 10.5, 11, 12, 13, 14, 15, 16, 18, 20, 22, 24, 26, 28, 32, 36, 40, 44, 48, 54, 60, 66, 72, 80, 88, 96 pt
    • Leaving the small sizes aside, 21 pt and 42 pt, the most used sizes in Taiwan, Japan and China, are missing from the size list
    • The typographic scale convention has become widely known in Taiwan in recent years, due to the popularity of the movable type preserved by 日星鑄字行 RiXing Type Foundry and other projects such as 字田活印盒.
    • It would be better to implement a toggle that switches to the typographic scale convention, for ease of use by professional typographic designers.
    https://bugs.documentfoundation.org/show_bug.cgi?id=113191
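To see which conventional sizes the list lacks, the two sets can be compared directly. This is a small Python sketch using the size list quoted on this slide and the Taiwan scale from the previous slide; the variable and function names are assumptions for the example.

```python
# Size list quoted on the slide (LibreOffice font size drop-down).
LIBREOFFICE_SIZES = {
    6, 7, 8, 9, 10, 10.5, 11, 12, 13, 14, 15, 16, 18, 20, 22, 24,
    26, 28, 32, 36, 40, 44, 48, 54, 60, 66, 72, 80, 88, 96,
}

# Taiwan typographic scale as listed on the previous slide, in points.
TAIWAN_SCALE = {
    "初號": 42, "新五號四行": 36, "一號": 27.5, "二號": 21,
    "三號": 16, "四號": 13.75, "五號": 10.5, "六號": 8,
    "新四號": 12, "新五號": 9, "七號": 5.25, "八號": 4,
}


def missing_sizes(scale: dict, available: set) -> dict:
    """Return the entries of `scale` whose point size is not in `available`."""
    return {name: pt for name, pt in scale.items() if pt not in available}


for name, pt in missing_sizes(TAIWAN_SCALE, LIBREOFFICE_SIZES).items():
    print(f"{name}: {pt} pt is not in the size list")
```

The comparison reports 42 pt and 21 pt among the missing sizes, together with 27.5 pt, 13.75 pt and the two small ruby sizes.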
  19. Typography: first line indentation
    Typically, Chinese paragraphs are indented by 2 characters.
    • If you indent the first line by 2 characters, the indentation is fixed at 21 pt, because the default font size is 10.5 pt.
    • However, when you change the font size of the paragraph to 12 pt, the indentation is still 21 pt.
    https://bugs.documentfoundation.org/show_bug.cgi?id=36709
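The expected behavior can be sketched by storing the indentation in characters and deriving the point value from the current font size. This Python sketch only illustrates the arithmetic; the `Paragraph` class is hypothetical and says nothing about how LibreOffice stores indentation.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical paragraph model with a character-based indent."""

    font_size_pt: float
    indent_chars: int = 2  # indentation is stored in characters, not points

    @property
    def first_line_indent_pt(self) -> float:
        # The point value follows the font size instead of being frozen.
        return self.indent_chars * self.font_size_pt


paragraph = Paragraph(font_size_pt=10.5)
print(paragraph.first_line_indent_pt)  # 2 characters at 10.5 pt

paragraph.font_size_pt = 12
print(paragraph.first_line_indent_pt)  # 2 characters at 12 pt
```

With an absolute value, the second case would stay at 21 pt, which is less than two 12 pt characters (24 pt).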
  20. Advanced CJK typography
    Line breaking and word wrapping problems (not yet reported):
    1. In the Asian Layout settings, the “Not to be broken on either side” or 分離禁止文字 (inseparable characters) rule is not supported in LibreOffice, e.g. —— and ……
    2. There are 3 fundamental methods used to line-break or word-wrap CJK text.
      – Push-in-first
      – Push-out-first
      – Push-out-only, or hanging punctuation (LibreOffice behavior)
  21. Advanced CJK typography
    Line breaking and word wrapping
    • The red circles at the beginning and end of the lines mark punctuation that is forbidden in those positions.
    Figure from Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde, Fair Use
  22. Advanced CJK typography
    Line breaking and word wrapping
    • Push-in-first
    Move characters that are prohibited from beginning a line back to the end of the previous line, or shift up a character from the following line when the line would end with a character that is prohibited from ending it.
    • It would be great if LibreOffice could support this strategy.
    Figure from Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde, Fair Use
  23. Advanced CJK typography
    Line breaking and word wrapping
    • Push-out-first (causes the line to end prematurely)
    Figure from Chapter 7: Typography, CJKV Information Processing, 1st Edition by Ken Lunde, Fair Use
  24. Advanced CJK typography
    Line breaking and word wrapping
    • Push-out-only, or hanging punctuation (adopted in LibreOffice)
    A punctuation mark is left hanging in the right margin (or the bottom margin in vertical mode).
    Figure from Chapter 7: Typography, CJKV Information Processing, 2nd Edition by Ken Lunde, Fair Use
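The push-in and push-out strategies from the preceding slides can be contrasted with a simplified Python sketch on a fixed-width character grid. It is not LibreOffice's algorithm: the prohibited-character sets are small illustrative subsets, the function names are hypothetical, and real layout also adjusts spacing, which is ignored here. In this plain-text model, hanging punctuation yields the same line contents as push-in; the difference is only where the extra mark is drawn.

```python
# Illustrative subsets only; real rule sets are much longer.
NO_LINE_START = set("，。、；：？！）」』》")
NO_LINE_END = set("（「『《")


def break_allowed(text: str, pos: int) -> bool:
    """True if a line may end just before text[pos]."""
    return text[pos] not in NO_LINE_START and text[pos - 1] not in NO_LINE_END


def break_lines(text: str, width: int, strategy: str = "push_in") -> list[str]:
    """Split text into lines of about `width` characters.

    push_in:  pull offending characters up into the current line.
    push_out: end the current line early and push characters down.
    """
    if strategy not in ("push_in", "push_out"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        if strategy == "push_in":
            while end < len(text) and not break_allowed(text, end):
                end += 1
        else:
            while start + 1 < end < len(text) and not break_allowed(text, end):
                end -= 1
        lines.append(text[start:end])
        start = end
    return lines


sample = "今天天氣很好，我們出去玩。"
print(break_lines(sample, 6, "push_in"))
print(break_lines(sample, 6, "push_out"))
```

With a width of 6, push-in keeps the comma on the first line (making it 7 characters long), while push-out ends the first line after 5 characters, which is the premature line end mentioned above.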
  25. Possibility of Unicode IDS Support
    New Han characters are added in every Unicode version
    • Unicode 10 Standard (2017): 136,690 characters in total, of which 87,882 are CJK unified ideographs
    • Max glyphs in an OpenType font: 65,535 glyphs
    • Two problems of missing Han glyphs:
      – not encoded in the Unicode standard
      – not included in the font although encoded
    • Use Unicode IDS (Ideographic Description Sequence) to describe the missing Han characters and compose the glyphs dynamically in 2D
    • The service at http://組字.意傳.台灣/ returns a rendered picture of the described character. It is written in Java, its source code is licensed under the Affero General Public License, and the GitHub project is han3_ji7_tsoo1_kian3.
  26. Possibility of Unicode IDS Support
    IDS combination syntax:
    • ⿰ left to right, e.g. 話 vs ⿰言舌
    • ⿱ above to below, e.g. 果 vs ⿱田木 / 曌 vs ⿱明空
    • ⿲ left to middle and right, e.g. 湖 vs ⿲氵古月
    • ⿳ above to middle and below, e.g. 舅 vs ⿳臼田力
    • ⿴ full surround, e.g. 囚 vs ⿴囗人
    • ⿺ surround from lower left, e.g. 翅 vs ⿺支羽 / 過 vs ⿺辶咼
    • ⿴⿵⿶⿷⿸⿹ ⿻ etc.
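Because every operator has a fixed number of operands, an IDS is a prefix expression and can be parsed recursively. The following Python sketch covers only the twelve operators U+2FF0 to U+2FFB and does no rendering; `parse_ids` is a name made up for this example and is unrelated to the han3_ji7_tsoo1_kian3 project.

```python
# Ideographic Description Characters U+2FF0..U+2FFB.
# ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands, the others take two.
TERNARY = {chr(0x2FF2), chr(0x2FF3)}
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def parse_ids(sequence: str):
    """Parse an IDS into a nested tuple (operator, operand, operand, ...).

    A plain character is returned as a one-character string.
    """

    def parse(pos: int):
        if pos >= len(sequence):
            raise ValueError("IDS ended before all operands were supplied")
        ch = sequence[pos]
        if ch in BINARY or ch in TERNARY:
            arity = 3 if ch in TERNARY else 2
            node = [ch]
            pos += 1
            for _ in range(arity):
                child, pos = parse(pos)
                node.append(child)
            return tuple(node), pos
        return ch, pos + 1

    tree, end = parse(0)
    if end != len(sequence):
        raise ValueError("unexpected characters after a complete IDS")
    return tree


print(parse_ids("⿰言舌"))
print(parse_ids("⿱⿰日月空"))  # 曌 with 明 further decomposed as ⿰日月
```

A renderer would walk such a tree and place each component in the region its operator defines, for example the left and right halves for ⿰.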
  27. Possibility of Unicode IDS Support
    • ⿰ =>
    • =>
    • ⿺辶⿳穴⿰月⿰⿲⿱幺長⿱言馬⿱幺長刂 =>
    • (new unknown character) =>
    • It would be great to have extensions that support IDS and its rendering, to deal with missing Han glyphs in the digitization of ancient books and to display new characters in documents.
    This glyph is implemented as a “ligature” feature of Source Han Serif and can be shown before being encoded in the Unicode standard.
    Reference site: http://組字.意傳.台灣/
  28. Special thanks to:
    • Dr. Ken Lunde, for his great work on CJKV information processing
    • Bobby Tung, for his talk on Bopomofo ruby
    • Shoichi Chou, for his talk on Unicode IDS Support
    • And the whole LibreOffice community!
  29. Questions?
    All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 4.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. Their use is therefore subject to the trademark policy.