Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
In the beginning was TXT
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Markus Wein
October 02, 2014
Programming
0
130
In the beginning was TXT
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Markus Wein
October 02, 2014
Tweet
Share
More Decks by Markus Wein
See All by Markus Wein
Command Line Productivity
cypher
1
160
A crash intro to deliberate practice
cypher
0
120
Keeping Your PostgreSQL Data Save
cypher
0
140
Ghost in the State Machine
cypher
2
340
n Things You Didn't Know About PostgreSQL (Rubyslava & PyVo 2014 Edition)
cypher
1
260
How to Become a Better Developer
cypher
2
1.8k
An Introduction to Rust
cypher
1
8.3k
How to Become a Better Developer
cypher
1
250
A Very Short Overview of Vagrant
cypher
0
8k
Other Decks in Programming
See All in Programming
Windows on Ryzen and I
seosoft
0
410
Xdebug と IDE による デバッグ実行の仕組みを見る / Exploring-How-Debugging-Works-with-Xdebug-and-an-IDE
shin1x1
0
230
What Spring Developers Should Know About Jakarta EE
ivargrimstad
0
710
ローカルで稼働するAI エージェントを超えて / beyond-local-ai-agents
gawa
0
150
Nuxt Server Components
wattanx
0
140
Laravel Nightwatchの裏側 - Laravel公式Observabilityツールを支える設計と実装
avosalmon
1
250
Redox OS でのネームスペース管理と chroot の実現
isanethen
0
460
それはエンジニアリングの糧である:AI開発のためにAIのOSSを開発する現場より / It serves as fuel for engineering: insights from the field of developing open-source AI for AI development.
nrslib
1
620
20260320登壇資料
pharct
0
130
S3ストレージクラスの「見える」「ある」「使える」は全部違う ─ 体験から見た、仕様の深淵を覗く
ya_ma23
0
1.2k
「効かない!」依存性注入(DI)を活用したAPI Platformのエラーハンドリング奮闘記
mkmk884
0
260
Geminiをパートナーに神社DXシステムを個人開発した話(いなめぐDX 開発振り返り)
fujiba
0
120
Featured
See All Featured
Being A Developer After 40
akosma
91
590k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Faster Mobile Websites
deanohume
310
31k
AI in Enterprises - Java and Open Source to the Rescue
ivargrimstad
0
1.2k
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
700
Lightning Talk: Beautiful Slides for Beginners
inesmontani
PRO
1
490
SEO for Brand Visibility & Recognition
aleyda
0
4.4k
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
650
B2B Lead Gen: Tactics, Traps & Triumph
marketingsoph
0
86
How to Think Like a Performance Engineer
csswizardry
28
2.5k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.5k
Transcript
In the beginning was TXT
!
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
"#$%&
None
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
(
None
Shift-JIS
This sucks
Unicode!
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN SMALL LETTER A)
Source: http://codepoints.net/U+0041
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
What does it look like?
Codepoint Char ASCII Latin-1 ISO-8859-15 UTF-8 UTF-16 U+0041 A 0x41
0x41 0x41 0x41 0x00 0x41 U+00C4 Ä - 0xc4 0xc4 0xc3 0x84 0x00 0xc4 U+20AC € - - 0xa4 0xe3 0x82 0xac 0x20 0xac U+C218 ࣻ - - - 0xec 0x88 0x98 0xc2 0x18 Encoding comparison Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8, doesn’t mean it
is