In the beginning was TXT
Markus Wein
October 02, 2014
A very short overview of the history of encodings, given at Vienna.rb on 2014-10-02
Transcript
In the beginning was TXT
EBCDIC
Source: http://en.wikipedia.org/wiki/EBCDIC
ASCII
ä, ö, or å and Ø?
Latin-1 ISO/IEC 8859-1
Latin-*
Windows code pages
Then came the €
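To illustrate why the euro sign was a problem for the 8-bit encodings, here is a minimal Python sketch (Python is an assumption; the slides themselves show no code). It decodes the same byte under Latin-1 and ISO-8859-15, then checks where Windows-1252 put the euro sign. The variable names are illustrative only.

```python
# The same byte, 0xA4, decodes to different characters depending on
# which 8-bit encoding the reader assumes.
raw = b"\xa4"
as_latin1 = raw.decode("iso-8859-1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso-8859-15")   # U+20AC EURO SIGN
print(repr(as_latin1), repr(as_latin9))

# Windows-1252 placed the euro sign at byte 0x80 instead.
euro_cp1252 = "\u20ac".encode("cp1252")
print(euro_cp1252)

# Latin-1 has no slot for the euro sign at all.
try:
    "\u20ac".encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)
```

Nothing in the byte itself says which interpretation is intended; that information has to travel alongside the data.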
Shift-JIS
This sucks
Unicode!
✈️ (planes!)
Basic Multilingual Plane
Code Points
U+0041 (LATIN CAPITAL LETTER A)
Source: http://codepoints.net/U+0041
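A code point is just a number with a name attached. As a small sketch (again assuming Python, which is not part of the talk), the standard `unicodedata` module can look up both:

```python
import unicodedata

ch = "A"
code_point = ord(ch)                 # integer value of the code point
label = f"U+{code_point:04X}"        # conventional U+XXXX notation
name = unicodedata.name(ch)          # official Unicode character name
print(label, name)

# The lowercase letter is a different code point with a different name.
lower_name = unicodedata.name(chr(0x61))
print("U+0061", lower_name)
```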
Grapheme
a a a a a a a
Composite characters
U+0065 U+0301 or U+00E9
e+´ => é é
´ != ´
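The two spellings of é render the same but do not compare equal as raw code point sequences. A minimal sketch, assuming Python and its standard `unicodedata.normalize`, of how normalization reconciles them:

```python
import unicodedata

decomposed = "e\u0301"    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"    # U+00E9, the single precomposed code point

# Without normalization the strings differ, even in length.
same_raw = decomposed == precomposed
lengths = (len(decomposed), len(precomposed))

# NFC composes, NFD decomposes; after either, the two forms agree.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
print(same_raw, lengths, nfc_equal, nfd_equal)
```

Normalizing both sides to the same form before comparing or hashing is one common way to avoid surprises here.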
Unicode… is not an encoding
UTF-32
UCS-2/UTF-16
UTF-8
Source: http://en.wikipedia.org/wiki/File:UnicodeGrow2b.png
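To make the difference between the three encoding forms concrete, the following sketch (Python assumed; `encoded_sizes` is a hypothetical helper, not something from the talk) counts the bytes each one needs for a few code points, including one outside the Basic Multilingual Plane:

```python
def encoded_sizes(ch):
    """Return the number of bytes needed for one character per encoding."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {enc: len(ch.encode(enc)) for enc in encodings}

# U+0041, U+00C4, U+20AC, and U+1F600 (outside the BMP).
samples = ["A", "\u00c4", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Outside the BMP, UTF-16 needs a surrogate pair: two 16-bit code units.
surrogates = "\U0001F600".encode("utf-16-be")
print(surrogates.hex())
```

UTF-32 is always four bytes, UTF-8 ranges from one to four, and UTF-16 uses two or four. UCS-2 has no surrogate pairs, so it cannot represent the last sample at all.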
What does it look like?
Encoding comparison

Codepoint  Char  ASCII  Latin-1  ISO-8859-15  UTF-8           UTF-16
U+0041     A     0x41   0x41     0x41         0x41            0x00 0x41
U+00C4     Ä     -      0xc4     0xc4         0xc3 0x84       0x00 0xc4
U+20AC     €     -      -        0xa4         0xe2 0x82 0xac  0x20 0xac
U+C218     수    -      -        -            0xec 0x88 0x98  0xc2 0x18

Source: http://perlgeek.de/en/article/encodings-and-unicode
Remember: Just because someone claims it’s UTF-8 doesn’t mean it is
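One way to act on that warning is to check the bytes instead of trusting the label. A minimal sketch, assuming Python (`is_valid_utf8` is a hypothetical helper name):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True

really_utf8 = "\u00c4".encode("utf-8")       # b"\xc3\x84"
actually_latin1 = "\u00c4".encode("latin-1")  # b"\xc4"
print(is_valid_utf8(really_utf8), is_valid_utf8(actually_latin1))

# The reverse mistake raises no error; it silently yields mojibake.
mojibake = really_utf8.decode("latin-1")
print(repr(mojibake))
```

A successful decode only shows that the bytes are well-formed UTF-8, not that UTF-8 was what the sender meant. Pure ASCII input, for example, passes this check under almost any label.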