Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Markus Wein
October 02, 2014
Programming
0
97
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
130
A crash intro to deliberate practice
cypher
0
110
Keeping Your PostgreSQL Data Save
cypher
0
110
Ghost in the State Machine
cypher
2
300
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
230
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
8.1k
How to Become a Better Developer
cypher
1
230
A Very Short Overview of Vagrant
cypher
0
7.8k
Other Decks in Programming
See All in Programming
從零到一:搭建你的第一個 Observability 平台
blueswen
0
220
TypeScript Language Service Plugin で CSS Modules の開発体験を改善する
mizdra
PRO
3
2.4k
當開發遇上包裝:AI 如何讓產品從想法變成商品
clonn
0
2.6k
Spring gRPC で始める gRPC 入門 / Introduction to gRPC with Spring gRPC
mackey0225
0
140
Parallel::Pipesの紹介
skaji
2
870
TypeScript を活かしてデザインシステム MCP を作る / #tskaigi_after_night
izumin5210
4
480
List Unfolding - 'unfold' as the Computational Dual of 'fold', and how 'unfold' relates to 'iterate'"
philipschwarz
PRO
0
140
バランスを見極めよう!実装の意味を明示するための型定義 TSKaigi 2025 Day2 (5/24)
whatasoda
2
780
技術懸念に立ち向かい 法改正を穏便に乗り切った話
pop_cashew
0
840
FastMCPでMCPサーバー/クライアントを構築してみる
ttnyt8701
2
100
がんばりすぎないコーディングルール運用術
tsukakei
1
180
Doma で目指す ORM 最適解
nakamura_to
1
160
Featured
See All Featured
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
123
52k
How to Think Like a Performance Engineer
csswizardry
23
1.6k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
The Invisible Side of Design
smashingmag
299
50k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
47
2.8k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Statistics for Hackers
jakevdp
799
220k
How to train your dragon (web standard)
notwaldorf
92
6k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.4k
KATA
mclloyd
29
14k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
42
2.3k
YesSQL, Process and Tooling at Scale
rocio
172
14k
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is