Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Markus Wein
October 02, 2014
Programming
0
110
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
130
A crash intro to deliberate practice
cypher
0
110
Keeping Your PostgreSQL Data Save
cypher
0
120
Ghost in the State Machine
cypher
2
310
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
240
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
8.1k
How to Become a Better Developer
cypher
1
230
A Very Short Overview of Vagrant
cypher
0
7.9k
Other Decks in Programming
See All in Programming
あのころの iPod を どうにか再生させたい
orumin
2
2.5k
kiroでゲームを作ってみた
iriikeita
0
170
Comparing decimals in Swift Testing
417_72ki
0
170
サイトを作ったらNFCタグキーホルダーを爆速で作れ!
yuukis
0
370
WebAssemblyインタプリタを書く ~Component Modelを添えて~
ruccho
1
860
新世界の理解
koriym
0
140
STUNMESH-go: Wireguard NAT穿隧工具的源起與介紹
tjjh89017
0
380
マイコンでもRustのtestがしたい その2/KernelVM Tokyo 18
tnishinaga
2
2.3k
コーディングは技術者(エンジニア)の嗜みでして / Learning the System Development Mindset from Rock Lady
mackey0225
2
520
Honoアップデート 2025年夏
yusukebe
1
770
「リーダーは意思決定する人」って本当?~ 学びを現場で活かす、リーダー4ヶ月目の試行錯誤 ~
marina1017
0
230
AI時代のドメイン駆動設計-DDD実践におけるAI活用のあり方 / ddd-in-ai-era
minodriven
21
8.2k
Featured
See All Featured
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
30
9.6k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.4k
Typedesign – Prime Four
hannesfritz
42
2.8k
Faster Mobile Websites
deanohume
309
31k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
161
15k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.6k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Building an army of robots
kneath
306
45k
Why Our Code Smells
bkeepers
PRO
338
57k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
1k
Testing 201, or: Great Expectations
jmmastey
45
7.6k
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is