In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
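The euro problem can be seen at the byte level. A minimal sketch, assuming Python's built-in codecs: the same byte 0xA4 means different things in Latin-1 and ISO-8859-15, and Windows-1252 puts the euro somewhere else again. The variable names are only illustrative.

```python
# One byte, two meanings, depending on which Latin-* table is assumed.
raw = b"\xa4"

latin1 = raw.decode("latin-1")       # U+00A4 CURRENCY SIGN
latin9 = raw.decode("iso-8859-15")   # U+20AC EURO SIGN

# Windows-1252 encodes the euro sign as 0x80 instead.
cp1252_euro = "\u20ac".encode("cp1252")

print(f"U+{ord(latin1):04X}", f"U+{ord(latin9):04X}", cp1252_euro)
```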
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
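Code points and their official names can be looked up programmatically. A small sketch, assuming Python's standard `unicodedata` module; `label` is just an illustrative helper name.

```python
import unicodedata

def label(ch):
    # Format a character's code point in the usual U+XXXX notation.
    return f"U+{ord(ch):04X}"

upper = chr(0x41)   # code point -> character
lower = chr(0x61)

print(label(upper), unicodedata.name(upper))
print(label(lower), unicodedata.name(lower))
```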
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
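The two spellings of é compare as different strings until they are normalized. A minimal sketch, assuming Python's `unicodedata.normalize` with the NFC (composed) and NFD (decomposed) forms:

```python
import unicodedata

decomposed = "\u0065\u0301"   # e + COMBINING ACUTE ACCENT
precomposed = "\u00e9"        # single code point for é

# Both render as é, but the raw code point sequences differ.
same_raw = decomposed == precomposed

# Normalizing both sides to one form makes the comparison meaningful.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_raw, same_nfc, same_nfd, len(decomposed), len(precomposed))
```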
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
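How the three encodings differ in size shows up as soon as a character leaves the Basic Multilingual Plane. A rough sketch, assuming Python's big-endian codecs without a byte order mark; `sizes` and `plane` are hypothetical helper names, and U+1F600 is picked only as an example of a character outside the BMP.

```python
def sizes(ch):
    # Number of bytes the character needs in each encoding.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }

def plane(ch):
    # Each plane holds 0x10000 code points; plane 0 is the BMP.
    return ord(ch) >> 16

for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", "plane", plane(ch), sizes(ch))

# Outside the BMP, UTF-16 needs a surrogate pair (two 16-bit units).
surrogates = "\U0001F600".encode("utf-16-be")
```

UTF-32 is always four bytes, UTF-8 grows from one to four, and UTF-16 is two bytes inside the BMP and four outside it, which is where UCS-2 falls short.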
What does it look like?
Encoding comparison
Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18
Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8 doesn’t mean it is.
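One way to check such a claim is to decode strictly and see whether it fails. A minimal sketch, assuming Python's default strict error handling; `is_valid_utf8` is a hypothetical helper, and passing the check only shows the bytes are well-formed UTF-8, not that UTF-8 was what the sender intended.

```python
def is_valid_utf8(data: bytes) -> bool:
    # bytes.decode is strict by default and raises on malformed input.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

labelled_utf8 = "Ä".encode("latin-1")   # b"\xc4", wrongly labelled as UTF-8
actual_utf8 = "Ä".encode("utf-8")       # b"\xc3\x84"

print(is_valid_utf8(labelled_utf8), is_valid_utf8(actual_utf8))

# The opposite mistake does not raise; it silently yields mojibake.
mojibake = actual_utf8.decode("latin-1")
```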