Recent Status of the TeX Community in Japan / bachotex2025

In this talk, I will discuss the recent state of the TeX community in Japan. First, as background, I will introduce some distinctive characteristics of Japanese typesetting compared to Western typesetting, and briefly explain how TeX systems tailored for Japanese have addressed these differences. Following this, I will describe recent trends in communication within the Japanese TeX community, as well as how it is adapting to the active development of LaTeX.

Watson

May 03, 2025

Transcript

  1. Recent Status of the TeX Community in Japan

    Takuto Asakura
    BachoTeX 2025
  2. Today’s Topics

    Part I: Key Differences in Japanese and English Typesetting
      Providing the background that shapes the Japanese TeX community
    Part II: (La)TeX Systems for Japanese Typesetting
      A snapshot of today’s TeX ecosystem in Japan
    Part III: Recent Trends in Development and Communication
      Current challenges facing us and how we are addressing them
  3. Part I: Key Differences in Japanese and English Typesetting
  4. Key Differences in Japanese and English Typesetting

    Character-Set & Space Model Divergence
      ▶ English: uses variable-width letters plus inter-word spaces
      ▶ Japanese: relies on fixed-width glyphs and virtually no word spaces
      ▶ Line-breaking logic starts from fundamentally different premises
    Strict Line-Breaking Prohibitions
      ▶ Large sets of characters prohibited at line start (e.g., closing punctuation) and at line end (e.g., opening brackets)
    Mixed-Script Composition Is the Norm
      ▶ Numerals, English words, formulae, and symbols appear routinely inside Japanese prose
      ▶ Automatic inter-script spacing (e.g., \xkanjiskip), kerning, and baseline alignment are required at every boundary
  5. Character-Set & Space Model Divergence

    Fixed-width (full-width) vs. variable width
      ▶ English: builds rhythm with variable-width glyphs + inter-word spaces
      ▶ Japanese: kanji and kana occupy a nominal 1-em square grid
    Lack of inter-word spaces means different line-breaking logic
      ▶ English: candidates occur mainly at space positions and hyphenation points
      ▶ Japanese: line-break candidates = virtually every character boundary
    [Figure: the sample strings 日本語テキスト (“Japanese text”), “English text”, and “Characteristic text” set for comparison]
  6. Strict Line-Breaking Prohibitions

    The rule set of “where you may not break a line”
      ▶ No inter-word spaces = every character boundary is a candidate break; the prohibitions exclude a large fraction of these candidates
      ▶ Readability hinges on the positions of punctuation marks: commas, periods, brackets, repeated dashes, etc.
    Core categories: “no break before” vs. “no break after”
      ▶ Forbidden at line start: period, comma, closing bracket, etc.
      ▶ Forbidden at line end: opening bracket, long dash, etc.
  7. Mixed-Script Composition Is the Norm

    Latin letters, numerals, math formulae, and code chunks appear frequently in Japanese texts (at least in scientific documents)
    Japanese typesetting is designed on the premise that several scripts share a single line:
      ▶ Automatic insertion of appropriate spacing at script boundaries
      ▶ Handling conflicts in line-breaking logic, i.e., Western hyphenation and Japanese line-breaking prohibition tables are evaluated in parallel
      ▶ Baseline alignment & glyph-size harmonization
    Without Kanji-Latin space (\xkanjiskip = 0 em):
      この文書は▲LaTeX▲と▲dvipdfmx▲で作成された。
    With Kanji-Latin space (\xkanjiskip = 0.25 em):
      この文書は LaTeX と dvipdfmx で作成された。
    (Sample sentence: “This document was created with LaTeX and dvipdfmx.”)
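
The comparison on this slide can be reproduced with a short pLaTeX file. The following is a minimal sketch, assuming a TeX Live installation that provides platex and dvipdfmx and a UTF-8 source file. The class (jarticle) and the stretch/shrink amounts are illustrative choices, not values taken from the talk. The unit zw (the width of one full-width character) is specific to pTeX.

```latex
% Assumed workflow: platex sample.tex && dvipdfmx sample.dvi
\documentclass{jarticle}
\begin{document}

% pTeX uses the value of \xkanjiskip that is in effect when the
% paragraph ends, so each setting is grouped together with its \par.
% No spaces are typed between Japanese and Latin text: the engine
% inserts \xkanjiskip at those boundaries by itself.

% Without Kanji-Latin space
{\xkanjiskip=0pt\relax
 この文書は\LaTeX{}とdvipdfmxで作成された。\par}

% With Kanji-Latin space (a quarter of a full-width character,
% with a little stretch and shrink added as an assumption)
{\xkanjiskip=0.25zw plus 1pt minus 1pt\relax
 この文書は\LaTeX{}とdvipdfmxで作成された。\par}

\end{document}
```

Leaving the source free of explicit spaces matters here: a typed space between a kanji and a Latin letter would produce ordinary inter-word glue and hide the effect of \xkanjiskip.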
  8. Other Features Unique to Japanese Typesetting

    Vertical writing (top-to-bottom, right-to-left)
      In recent years, horizontal text has become common in Japanese, but vertical writing is still used in newspapers, literature, etc.
    Multi-layer inline layout (variants of emphasis)
      ▶ Ruby: annotative glosses to show pronunciation
      ▶ Warichu: Japanese variation of footnote or sidenote
      ▶ Kenten: emphasis dot
      ▶ Tate-chu-yoko: 2–4-digit numbers or Latin strings set sideways
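
Two of the inline features above, ruby and kenten, can be tried out with an existing package. The sketch below assumes the pxrubrica package, which the talk does not mention; the class and the sample words are likewise illustrative. Warichu and tate-chu-yoko are left out because they depend on further packages and on vertical mode.

```latex
% Assumed workflow: platex sample.tex && dvipdfmx sample.dvi
\documentclass{jarticle}
\usepackage{pxrubrica} % assumption: not named in the talk
\begin{document}

% Group ruby: one reading attached to the whole word.
\ruby[g]{今日}{きょう}は晴れ。

% Jukugo ruby: readings are given per kanji, separated by "|".
\ruby[j]{漢字}{かん|じ}を読む。

% Kenten: an emphasis dot on each character.
ここが\kenten{重要}である。

\end{document}
```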
  9. Part II: (La)TeX Systems for Japanese Typesetting
  10. Current (La)TeX Workflows for Japanese Typesetting

    Two practical approaches are in everyday use
      ▶ pTeX variants: pLaTeX or upLaTeX + dvipdfmx
        ▶ A Japanese-specific extension maintained since the 1990s
        ▶ Support for JFM, native line-break prohibitions, vertical-typesetting primitives, etc.
      ▶ LuaTeX-ja: LuaLaTeX + luatexja package
        ▶ Implements the Japanese layer in Lua on top of the LuaTeX engine
        ▶ Meets Japanese requirements while keeping feature parity with standard Western LaTeX

    pTeX variants and LuaTeX-ja

      Engine   Speed   Stability   Sustainability   Users
      pTeX     Fast    Very high   Great concern    General / Passively conservative
      LuaTeX   Slow    Moderate    Fine             Somewhat advanced
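
As a concrete illustration of the second workflow, here is a minimal LuaLaTeX + luatexja document. It is a sketch under the assumption of a recent TeX Live with the default Japanese fonts installed; the class choice and the sample sentence are illustrative. The comments at the top note how the first workflow would differ, assuming the jsarticle class with its uplatex option.

```latex
% Workflow 2 (LuaTeX-ja), assumed command: lualatex sample.tex
%
% Workflow 1 (pTeX variants) would instead start with, for example,
%   \documentclass[uplatex,dvipdfmx]{jsarticle}
% and be compiled with: uplatex sample.tex && dvipdfmx sample.dvi
\documentclass{article}
\usepackage{luatexja} % Japanese layer implemented in Lua callbacks
\begin{document}

% Japanese and Latin text share a line; spacing at the script
% boundaries is inserted by luatexja, not typed in the source.
この文書はLua\LaTeX{}とluatexjaパッケージで作成された。

\end{document}
```

The LuaTeX-ja route produces PDF directly, whereas the pTeX route goes through a DVI file that dvipdfmx converts.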
  11. pTeX: Core Extensions for Japanese Typesetting

    Support for JFM (Japanese Font Metrics)
      ▶ Per-character-class settings for width, side bearings, kerns, etc.
      ▶ Auto-inserted glue: \kanjiskip (between Japanese chars), \xkanjiskip
    Line-break prohibitions built into the core
      ▶ Line-head / line-end prohibition is integrated inside the engine core
      ▶ During break search, \prebreakpenalty and \postbreakpenalty are assigned to eliminate illegal candidates
    Native support for vertical writing
      ▶ Direction switches with \yoko (horizontal) and \tate (vertical); the engine rotates the glyph coordinate system accordingly
    LuaTeX-ja handles the same issues via flexible Lua callbacks
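
The two penalty primitives named above can be inspected and set from a document. The following is a minimal sketch, assuming platex with a UTF-8 source; the two characters and the value 10000 (TeX's conventional "never break here") are illustrative choices. In practice these tables are preloaded by the format, so documents rarely set them by hand.

```latex
% Assumed workflow: platex sample.tex && dvipdfmx sample.dvi
\documentclass{jarticle}
\begin{document}

% Print the penalties currently registered for two characters.
Penalty before 。: \the\prebreakpenalty`。\par
Penalty after 「: \the\postbreakpenalty`「\par

% Forbid a line break before the ideographic full stop,
% i.e. the full stop may not start a line.
\prebreakpenalty`。=10000

% Forbid a line break after the opening corner bracket,
% i.e. the bracket may not end a line.
\postbreakpenalty`「=10000

「日本語」の文章を組む。行頭や行末の禁則はエンジンが処理する。

\end{document}
```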
  12. The pTeX Legacy (1): pTeX vs. upTeX

    pTeX
      ▶ 8-bit JIS encoding; kanji are stored as two bytes, making user-defined characters hard to extend
    upTeX
      ▶ Internal Unicode encoding, so that users can use characters outside of the 8-bit JIS encoding
      ▶ It covers all BMP (Basic Multilingual Plane) CJK characters plus IVS (Ideographic Variation Sequence) support and more
    From TL 2024 onward, there is a unified binary: ptex is now upTeX’s pTeX-compatible mode

    [Excerpt from the pTeX guide shown on the slide] 1 pTeX and its variants. The figure below shows the relationship between engines (TeX, ε-TeX, pdfTeX, pTeX, ε-pTeX, upTeX, ε-upTeX). pTeX is an old Japanese-specific extension of TeX82, which aims setting of Japanese text but only supports a limited character set, JIS X … upTeX is developed as an extension of pTeX to support full Un…
  13. The pTeX Legacy (2): Encodings and \kcatcode

    Input vs. internal encoding
      ▶ Input: chosen by auto-detection or by the -kanji option ((s)jis, euc, utf8)
      ▶ Internal: pTeX uses a fixed 8-bit JIS code; upTeX uses Unicode
      ▶ Thus pTeX converts UTF-8 before tokenization
    \kcatcode: Japanese-specific category code
      ▶ Every multi-byte Japanese character receives a \kcatcode, e.g., 15 = Non-CJK, 16 = Kanji, 17 = Kana, 18 = Others
      ▶ Used both for lexical scanning (e.g., judging control sequences) and for line-break/prohibition class look-ups
      ▶ The specification differs noticeably between pTeX and upTeX
    See “Guide to pTeX for developers unfamiliar with Japanese” (ptex-guide-en.pdf)
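
The \kcatcode values listed above can be printed from within a document. This is a minimal sketch, assuming platex and a UTF-8 source file; the two sample characters are illustrative. Going by the slide, one would expect 16 for the kanji and 17 for the kana, but the exact values depend on the engine and version, so the file prints them instead of assuming them.

```latex
% Assumed workflow: platex sample.tex && dvipdfmx sample.dvi
% The input encoding can also be forced instead of auto-detected:
%   platex -kanji=utf8 sample.tex
\documentclass{jarticle}
\begin{document}

% \kcatcode takes a character code, here written as `<char>.
\verb|\kcatcode| of 漢: \the\kcatcode`漢\par
\verb|\kcatcode| of あ: \the\kcatcode`あ\par

\end{document}
```

Running the same file through uplatex (with a upLaTeX-aware class) is one way to observe the differences between the two engines that the slide mentions.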
  14. Part III: Recent Trends in Development and Communication
  15. Brief History of LaTeX2e and Some TeX Engines

    1994  LaTeX2e released
    1995  TeX Live released
    1997  pdfTeX released
    2007  upTeX released
    2010  LuaTeX released
    2015  Reactivation of LaTeX dev
    2019  The PDF Tagging Project starts
    2020  Integration of expl3
    (The period before the reactivation is labeled “Peaceful 20 years”)
  16. pLaTeX Predicament: A Mountain of Technical Debt

    A giant patchwork
      ▶ Decades of incremental extensions mean every upstream LaTeX change triggers a new cascade of local fixes
    Rapid LaTeX-kernel evolution (since 2020)
      ▶ Migration from NFSS 2 to the new NFSS (2020)
      ▶ Introduction of the new hook system and the plug/socket mechanism (2020–)
      ▶ Japan-specific extensions now need frequent “catch-up patch” cycles
    Severely limited manpower
      ▶ All Japanese engines and formats are effectively maintained by only one or two active developers each
  17. Root Cause: pTeX’s Incompatible Token Model

    (u)pTeX = “8-bit engine + special Japanese tokens”
      ▶ Western characters live in the 0–255 range, while Japanese characters are stored as code points ≥ 256
      ▶ Japanese tokens have no \catcode; hence active characters and similar mechanisms cannot be assigned to them
    Examples of practical breakage
      ▶ expl3 string modules (l3regex, l3str-convert, etc.) misbehave or fail
      ▶ \detokenize that mixes 8-bit chars and Japanese tokens yields illegal token lists
    A radical fix would require rewriting both the specification and the C core implementation of pTeX, which is realistically infeasible
  18. It Is Not Just an Engine- or Kernel-Level Issue

    Relatively manageable items
      ▶ Generic Japanese document classes (e.g., jarticle, jsarticle, jlreq)
      ▶ Japan-specific LaTeX packages
    Hard-to-maintain items
      ▶ Patch collections that adapt foreign packages (cf. plautopatch)
      ▶ Journal / society templates and other publisher-specific macros
    Long tail of legacies makes a clean break from pTeX extremely hard
      • doc (latex) → pldocverb (platex-tools)
      • tracefnt (latex) → ptrace/uptrace (platex/uplatex)
      • fltrace (latex) → pfltrace (platex)
      • array (latex-tools) → plarray (platex-tools)
      • array (latex-tools) + plext (platex) → plextarray (platex-tools)
      • delarray (latex-tools) + plext (platex) → plextdelarray (platex-tools)
      • colortbl + plext (platex) → plextcolortbl (platex-tools)
      • arydshln → plarydshln
      • arydshln + plext (platex) → plextarydshln
      • siunitx → plsiunitx
      • collcell → plcollcell
      • everysel (ms) → pxeverysel (platex-tools)
      • everyshi (ms) → pxeveryshi (platex-tools)
      • atbegshi (oberdiek) → pxatbegshi (platex-tools)
      • ftnright (latex-tools) → pxftnright (platex-tools)
      • multicol (latex-tools) → pxmulticol (platex-tools)
      • xspace (latex-tools) → pxxspace (platex-tools)
      • textpos → pxtextpos (gentombow)
      • eso-pic → pxesopic (gentombow)
      • pdfpages → pxpdfpages (gentombow)
      • stfloats (sttools) → pxstfloats (pxsttools)
      • hyperref → pxjahyper
      • pgfrcs (pgf) → pxpgfrcs
      • pgfcore (pgf) → pxpgfmark
  19. Community Status & Possible Paths Forward

    Communication channels
      ▶ Mostly text-based: Japanese TeX Users Slack and GitHub Issues
      ▶ TeXConf: Annual Japanese TeX Users Conference
    npTeX concept: a XeTeX-based, legacy-free successor
      ▶ Specification still undefined; it has been at the prototyping stage for a few years
      ▶ LaTeX News 40 (Nov 2024) announced an end to full XeTeX support
    Where I fit in
      ▶ Not an engine hacker; my focus is documentation and tooling
      ▶ My efforts to stay engaged with the global TeX community include Texdoc development and translating LearnLaTeX.org
      ▶ Talks like this aim to bridge the information gap with the global TeX community
  20. Summary

    Japanese vs. Western typesetting: what really differs
      ▶ Zero inter-word spaces, script mixing, and strict break-prohibition rules demand their own line-break logic, JFM glue, and vertical-writing support
    Two practical Japanese workflows, two philosophies
      ▶ pTeX variants = fast, engine-level C extensions; LuaTeX-ja = flexible Lua callbacks on a modern UTF-8 core; each has clear strengths and growing maintenance costs
    The road ahead: limited hands, rising kernel changes
      ▶ One or two active developers per engine, legacy token issues, and the still-vague npTeX idea mean collaboration and fresh contributors are urgently needed
  21. References

    ▶ Takuto Asakura. The BXghost Package, Version 0.5.1 (2023).
    ▶ Japanese TeX Development Community (texjporg). pTeX Manual, Version p4.1.1 (June 2024).
    ▶ Japanese TeX Development Community (texjporg). Guide to pTeX for developers unfamiliar with Japanese, Version p4.1.1 (June 2024).
    ▶ Hironori Kitagawa. Japanese Typesetting with LuaTeX. EuroTeX 2012. https://github.com/h-kitagawa/presentations/blob/HEAD/eurotex12.pdf.
    ▶ Hironori Kitagawa. About “XeTeX-based npTeX” (in Japanese). https://qiita.com/h-kitagawa/items/ebbf3684b3b1be8f0747 (2023) (accessed on 2025-05-01).
    ▶ The LaTeX Team. LaTeX News Issue 40 (November 2024).
    ▶ The LuaTeX-ja Project Team. The LuaTeX-ja Package, Version 20250401.0 (April 2025).
    ▶ W3C. Requirements for Japanese Text Layout (JLReq). https://www.w3.org/TR/jlreq/ (2020) (accessed on 2025-05-01).
    ▶ Hironobu Yamashita. pLaTeX may be seriously in danger (in Japanese). https://acetaminophen.hatenablog.com/entry/2021/06/18/022108 (2021) (accessed on 2025-05-01).
    ▶ Hironobu Yamashita. Package plautopatch, Version 0.9q (2021).