In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
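A minimal Python sketch of looking up a code point and its official name with the standard `unicodedata` module. The helper name `describe` is made up for this illustration and is not from the talk.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX <char> <OFFICIAL NAME>' for a single character."""
    # ord() gives the code point as an integer; format it as 4+ hex digits.
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"


for ch in ["A", "a", "Ä", "€"]:
    print(describe(ch))
```

The code point is only a number assigned to a character; how that number ends up as bytes is a separate question that the encodings answer.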
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
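A small Python sketch of why the two spellings of é matter in practice: they render the same but compare as different strings until they are normalized. The helper `same_text` is a hypothetical name, and using NFC here is one reasonable choice, not something the slides prescribe.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute accent)
precomposed = "\u00e9"   # U+00E9, the single precomposed code point


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)           # False: different code point sequences
print(len(decomposed), len(precomposed))   # 2 1
print(same_text(decomposed, precomposed))  # True
```

Normalizing at the boundaries of a system (input, storage, comparison) is a common way to avoid surprises with lookups and uniqueness checks.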
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Encoding comparison

Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18

Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8 doesn’t mean it is.
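One way to act on this is to decode strictly and treat a failure as evidence that the label is wrong. A minimal Python sketch, with `is_valid_utf8` as a hypothetical helper name:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


latin1_bytes = "Ä".encode("latin-1")  # b"\xc4", a lone lead byte in UTF-8 terms
utf8_bytes = "Ä".encode("utf-8")      # b"\xc3\x84"

print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

A successful decode only shows that the bytes are well-formed UTF-8. Pure ASCII input, for example, passes this check under many encodings, so the check can disprove a claim but not fully confirm it.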