In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
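The two spellings of é can be compared in a few lines. This is a minimal sketch in Python using the standard-library `unicodedata` module; the talk itself shows no code, and the variable names are purely illustrative.

```python
import unicodedata

decomposed = "\u0065\u0301"   # e followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE

# Both render as é, but they are different code point sequences.
print(decomposed == precomposed)
print(len(decomposed), len(precomposed))

# Normalizing to NFC composes the pair into the single code point.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)
```

Normalizing both sides to the same form (NFC or NFD) before comparing is one common way to treat such strings as equal.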
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Encoding comparison
Source: http://perlgeek.de/en/article/encodings-and-unicode
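The rows of the comparison can be regenerated rather than taken on trust. The following is a rough sketch in Python, not something from the talk: `encoded_hex` is a hypothetical helper, and `utf-16-be` (big-endian, no byte order mark) is assumed to be what the UTF-16 column shows.

```python
samples = ["A", "\u00c4", "\u20ac", "\uc218"]   # A, Ä, €, 수
encodings = ["ascii", "latin-1", "iso-8859-15", "utf-8", "utf-16-be"]


def encoded_hex(char, encoding):
    """Return the bytes of char in encoding as hex, or '-' if it cannot be encoded."""
    try:
        return " ".join(f"0x{b:02x}" for b in char.encode(encoding))
    except UnicodeEncodeError:
        return "-"


for char in samples:
    row = [encoded_hex(char, enc) for enc in encodings]
    # One line per character: code point, the character, then one cell per encoding.
    print(f"U+{ord(char):04X}", char, *row, sep="  ")
```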
Remember: Just because someone claims it’s UTF-8, doesn’t mean it is
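One way to act on that warning is to validate bytes before trusting their label. A minimal sketch in Python, assuming the data arrives as raw bytes; `is_valid_utf8` is a hypothetical helper name, not an existing API.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Ä encoded as Latin-1 is the single byte 0xc4, which is not valid UTF-8 on its own.
mislabeled = "\u00c4".encode("latin-1")
genuine = "\u00c4".encode("utf-8")

print(is_valid_utf8(mislabeled))
print(is_valid_utf8(genuine))
```

A successful decode only shows that the bytes are well-formed UTF-8, not that UTF-8 was the encoding the author intended; pure ASCII text, for instance, passes this check under many encodings.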