Unicode, JavaScript and the Emoji family
stefan judis
November 07, 2016
Technology
Transcript
Unicode, JavaScript and the Emoji family
@stefanjudis
[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'] = ['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}']
Stefan Judis Frontend Developer, Occasional Teacher, Meetup Organizer ❤ Open
Source, Performance and Accessibility ❤ @stefanjudis
cssclass.es
'👨'.length // 2
'\u{1F466}\u{1F3FD}'.length // 4
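'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'.length // 8
[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'] // ['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}'], length = 5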
Okay! What's going on here?
It's all about Unicode
UNICODE ...
01 is an international encoding standard
02 is a mapping from each letter, digit or symbol to a numeric value
03 works across different platforms and programs
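The mapping between characters and numeric values can be inspected directly in JavaScript. The following is a minimal sketch; the helper name `toUPlus` is hypothetical and only formats a number in the usual `U+XXXX` notation.

```javascript
// Hypothetical helper: format a code point as "U+" plus at least four hex digits.
function toUPlus(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// Character -> numeric value
const letterA = 'A'.codePointAt(0);
console.log(letterA, toUPlus(letterA)); // 65 "U+0041"

// Numeric value -> character
const skier = String.fromCodePoint(0x26F7);
console.log(skier === '\u26F7'); // true
```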
UNICODE - overview -
1,114,112 code points, from U+0000 to U+10FFFF
usually formatted as hexadecimal numbers
UNICODE - overview -
1,114,112 code points in 17 planes

Basic Multilingual Plane                     U+0000 to U+FFFF      1 plane
Supplementary Planes                         U+10000 to U+10FFFF   16 planes
  Supplementary Multilingual Plane           U+10000 to U+1FFFF    1 plane
  Supplementary Ideographic Plane            U+20000 to U+2FFFF    1 plane
  unassigned                                 U+30000 to U+DFFFF    11 planes
  Supplementary Special-purpose Plane        U+E0000 to U+EFFFF    1 plane
  Supplementary Private Use Area Planes      U+F0000 to U+10FFFF   2 planes
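The plane arithmetic can be checked with a few lines of JavaScript. This is a small sketch; `planeOf` and the constant names are hypothetical and not part of the talk.

```javascript
const MAX_CODE_POINT = 0x10FFFF;
const PLANE_SIZE = 0x10000; // 65,536 code points per plane

// Hypothetical helper: index of the plane (0 to 16) a code point belongs to.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > MAX_CODE_POINT) {
    throw new RangeError('not a Unicode code point: ' + codePoint);
  }
  return Math.floor(codePoint / PLANE_SIZE);
}

console.log(MAX_CODE_POINT + 1);                // 1114112 code points
console.log((MAX_CODE_POINT + 1) / PLANE_SIZE); // 17 planes
console.log(planeOf(0x26F7));  // 0 (Basic Multilingual Plane)
console.log(planeOf(0x1F468)); // 1 (Supplementary Multilingual Plane)
```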
UNICODE - Basic Multilingual Plane -
U+0000 to U+FFFF: characters for almost all modern languages + a lot of symbols
UNICODE - Supplementary Planes -
U+10000 to U+10FFFF: everything else
Emojis
EMOJIS ...
01 were initially used by Japanese mobile operators
02 were added to Unicode v6 in October 2010
03 are supported since OS X 10.7 (Lion) and Windows 8
EMOJIS - overview -
Emojis such as U+1F468 and U+1F466 are in the Supplementary Multilingual Plane (U+10000 to U+1FFFF)
How many Emojis are out there? EMOJIS - overview -
It depends how you count.
Modifier Sequences
Five modifiers for diversity: U+1F3FB, U+1F3FC, U+1F3FD, U+1F3FE, U+1F3FF
U+1F466 + U+1F3FD = one modified emoji ( 2 code points )
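A modifier sequence is just the base emoji followed by the modifier. The sketch below uses the two code points named on the slide; the variable names are illustrative only.

```javascript
// The five skin tone modifiers, U+1F3FB to U+1F3FF.
const modifiers = Array.from({ length: 5 }, (_, i) => String.fromCodePoint(0x1F3FB + i));

const boy = '\u{1F466}';
const mediumSkinTone = '\u{1F3FD}';
const boyWithModifier = boy + mediumSkinTone;

console.log(modifiers.length);            // 5
console.log([...boyWithModifier].length); // 2 code points
console.log(boyWithModifier.length);      // 4 code units
```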
EMOJIS - ZWJ sequences -
ZERO WIDTH JOINER U+200D
Indicator that a single glyph should be presented for a sequence of characters
EMOJIS - ZWJ sequences -
👪 U+1F46A ( 1 code point )
EMOJIS - ZWJ sequences -
U+1F468 + ZWJ U+200D + U+1F468 + ZWJ U+200D + U+1F467 ( 5 code points )
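A ZWJ sequence can be assembled by joining its parts with U+200D. The following sketch assumes the man, woman, boy family (U+1F468, U+1F469, U+1F466) that the deck uses in its closing examples; `joinWithZwj` is a hypothetical helper.

```javascript
const ZWJ = '\u{200D}';

// Hypothetical helper: glue emojis together with ZERO WIDTH JOINER.
function joinWithZwj(...parts) {
  return parts.join(ZWJ);
}

const family = joinWithZwj('\u{1F468}', '\u{1F469}', '\u{1F466}');

console.log([...family].length); // 5 code points: 3 emojis + 2 ZWJ
console.log(family.length);      // 8 code units
```

Whether the result is drawn as one glyph depends on the font and platform.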
EMOJIS - ZWJ sequences -
woman astronaut ( 4 code points )
man artist ( 4 code points )
man getting hair cut ( 4 code points )
woman mountain biking ( 4 code points )
"David Bowie" - Singer - (Apple, Google)
"David Bowie" Emoji is not yet supported.
Sequences degrade gracefully!
'\u{1F468}\u{200D}\u{1F3A4}' "👨🎤"
'\u{1F469}\u{200D}\u{1F3A4}' "👩🎤"
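The graceful degradation can be approximated in code: when no combined glyph exists, what remains visible are the parts between the joiners. This is only a sketch of that idea, not how any particular renderer works; `fallbackParts` is a hypothetical helper.

```javascript
const ZWJ = '\u{200D}';

// Hypothetical helper: the individual emojis a ZWJ sequence is built from.
function fallbackParts(sequence) {
  return sequence.split(ZWJ);
}

const manSinger = '\u{1F468}\u{200D}\u{1F3A4}';
const parts = fallbackParts(manSinger);

console.log(parts.length);         // 2
console.log(parts[0] === '\u{1F468}'); // true (man)
console.log(parts[1] === '\u{1F3A4}'); // true (microphone)
```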
EMOJIS - flags -
26 regional indicators (U+1F1E6 to U+1F1FF) used in pairs to represent regions
U+1F1E9 U+1F1EA (DE) ( 2 code points )
U+1F1EC U+1F1E7 (GB) ( 2 code points )
U+1F1E8 U+1F1FD (CX) ( 2 code points )
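Because the 26 regional indicators run in alphabetical order from U+1F1E6 (A), a two-letter region code can be turned into a flag sequence with simple offset arithmetic. A minimal sketch, with `flagFor` as a hypothetical helper name:

```javascript
const REGIONAL_INDICATOR_A = 0x1F1E6;

// Hypothetical helper: map a two-letter uppercase region code to its indicator pair.
function flagFor(regionCode) {
  if (!/^[A-Z]{2}$/.test(regionCode)) {
    throw new RangeError('expected two uppercase letters, got: ' + regionCode);
  }
  const codePoints = [...regionCode].map(
    (letter) => REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 'A'.charCodeAt(0)
  );
  return String.fromCodePoint(...codePoints);
}

console.log(flagFor('DE') === '\u{1F1E9}\u{1F1EA}'); // true
console.log(flagFor('GB') === '\u{1F1EC}\u{1F1E7}'); // true
console.log([...flagFor('CX')].length);              // 2 code points
```

Whether a pair is drawn as a flag or as two letters is up to the platform.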
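EMOJIS - flags -
www.dwitter.net/d/2708 (Dweet by @veubeke)

function u(t) {
  // x (canvas 2D context) and t (elapsed seconds) are supplied by Dwitter
  x.font = '96px a';
  S = String.fromCodePoint;
  W = e => x.measureText(e).width;
  i = t * 4 % 257 | 0;
  // Draw the pair only if it measures narrower than the non-flag pair "AA",
  // i.e. if it is rendered as a single flag glyph.
  W(S(F = 0x1F1E6, F)) > W(_ = S(F + i % 26, F + i / 26 | 0)) && x.fillText(_, 9, 99);
}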
How many Emojis are out there? EMOJIS - overview -
2198 (excluding incomplete singletons, excluding duplicates, including all combined sequences)
unicode.org/reports/tr51/#Identification
What about Unicode in JavaScript?
JAVASCRIPT - string representation -
UTF-16, the string format used by JavaScript, uses a single 16-bit code unit to represent the most common characters.
JAVASCRIPT - string representation -
16-bit code unit: 65536 code points
JAVASCRIPT - characters with one code unit -
\u0000 - \uFFFF can fit into 16 bits
ﾂ ('\uFF82'), '\uF8FF', '\u9731', ⛷ ('\u26F7')
JAVASCRIPT - characters with one code unit -
\u0000 - \uFFFF can fit into 16 bits
'ﾂ'.length, '\uF8FF'.length, '\u9731'.length, '⛷'.length // 1
JAVASCRIPT - surrogate pairs -
How can we use code points outside the 16-bit range?
JAVASCRIPT - surrogate pairs -
Surrogate Pairs
2048 surrogate code points included in the Basic Multilingual Plane
Leading/High Surrogates: U+D800 to U+DBFF
Trailing/Low Surrogates: U+DC00 to U+DFFF
Formula to get code point:
C = (H - 0xD800) * 0x400 + L - 0xDC00 + 0x10000
C = (H - 55296) * 1024 + L - 56320 + 65536
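The formula translates directly into code. The sketch below implements it and its inverse; the function names are hypothetical, and the sample values are the surrogates of U+1F468, the code point the deck uses as its running example.

```javascript
// Code point from a high (leading) and low (trailing) surrogate, per the formula above.
function fromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Inverse direction; assumes a supplementary code point (U+10000 to U+10FFFF).
function toSurrogates(codePoint) {
  const offset = codePoint - 0x10000;
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

console.log(fromSurrogates(0xD83D, 0xDC68)); // 128104, i.e. 0x1F468
console.log(toSurrogates(0x1F468));          // [55357, 56424], i.e. [0xD83D, 0xDC68]
```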
JAVASCRIPT - surrogate pairs -
Surrogate Pairs
'👨'.length // 2
👨 U+1F468 128104
'👨'.charCodeAt(0) U+D83D 55357
'👨'.charCodeAt(1) U+DC68 56424
0x1F468 = (0xD83D - 0xD800) * 0x400 + 0xDC68 - 0xDC00 + 0x10000
128104 = (55357 - 55296) * 1024 + 56424 - 56320 + 65536
JAVASCRIPT - surrogate pairs -
charCodeAt() vs codePointAt()
👨 U+1F468 128104
'👨'.charCodeAt(0) U+D83D 55357
'👨'.charCodeAt(1) U+DC68 56424
'👨'.codePointAt(0) U+1F468 128104
'👨'.codePointAt(1) U+DC68 56424
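The four calls can be reproduced as follows. Both methods take an index in code units, which is why `codePointAt(1)` lands in the middle of the pair and returns only the low surrogate.

```javascript
const man = '\u{1F468}';

console.log(man.charCodeAt(0));  // 55357 (0xD83D, high surrogate)
console.log(man.charCodeAt(1));  // 56424 (0xDC68, low surrogate)
console.log(man.codePointAt(0)); // 128104 (0x1F468, the full code point)
console.log(man.codePointAt(1)); // 56424 (0xDC68, low surrogate only)
```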
JAVASCRIPT - surrogate pairs -
👨 U+1F468 128104
simple Unicode escapes: '\uD83D\uDC68'
Unicode code point escapes: '\u{1F468}'
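Both escape styles produce the same string, as a quick check shows. The same holds for the two factory functions `String.fromCharCode` (code units) and `String.fromCodePoint` (code points).

```javascript
const viaSurrogateEscapes = '\uD83D\uDC68'; // two simple Unicode escapes
const viaCodePointEscape = '\u{1F468}';     // one code point escape (ES2015)

console.log(viaSurrogateEscapes === viaCodePointEscape); // true
console.log(
  String.fromCharCode(0xD83D, 0xDC68) === String.fromCodePoint(0x1F468)
); // true
```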
Okay, what's the deal?
JAVASCRIPT - String.prototype.length -
String.prototype.length
This property returns the number of code units in the string.
JAVASCRIPT - the spread operator -
The spread operator works for every iterable object.
[...'ABC']
> ''[Symbol.iterator]
function [Symbol.iterator]() { [native code] }
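A short sketch of what "every iterable object" means in practice: anything that exposes a `Symbol.iterator` method can be spread, strings included. The variable names are illustrative.

```javascript
const letters = [...'ABC'];                // spread a string
const unique = [...new Set([1, 2, 2, 3])]; // spread a Set

console.log(letters.join(','));                // "A,B,C"
console.log(unique.join(','));                 // "1,2,3"
console.log(typeof ''[Symbol.iterator]);       // "function"

// The iterator can also be driven by hand.
const iterator = 'AB'[Symbol.iterator]();
const first = iterator.next();
console.log(first.value, first.done);          // "A" false
```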
JAVASCRIPT - the spread operator -
String.prototype [ @@iterator ]( )
[...] iterates over the code points of a String value, returning each code point as a String value.
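Since the string iterator walks code points while `length` counts code units, the two can be compared side by side. A minimal sketch; both function names are hypothetical.

```javascript
// Number of UTF-16 code units (what String.prototype.length reports).
function countCodeUnits(str) {
  return str.length;
}

// Number of code points, counted via the string iterator.
function countCodePoints(str) {
  let count = 0;
  for (const _codePoint of str) {
    count += 1;
  }
  return count;
}

console.log(countCodeUnits('ABC'), countCodePoints('ABC'));             // 3 3
console.log(countCodeUnits('\u{1F468}'), countCodePoints('\u{1F468}')); // 2 1
```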
Let's go back to the examples
'👨'.length // 2
1 code point but 2 code units (surrogate pair)
'\u{1F466}\u{1F3FD}'.length // 4
2 code points but 4 code units (2 surrogate pairs)
'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}'.length // 8
5 code points but 8 code units (3 surrogate pairs + 2 ZWJ)
[...'\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}']
['\u{1F468}', '\u{200D}', '\u{1F469}', '\u{200D}', '\u{1F466}']
U+1F468, U+200D (ZWJ), U+1F469, U+200D (ZWJ), U+1F466
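To see where the 8 code units come from, each code point of the family sequence can be listed with its own length. This is a sketch; `describe` is a hypothetical helper.

```javascript
// Hypothetical helper: one entry per code point, with its label and UTF-16 length.
function describe(str) {
  return Array.from(str, (ch) => ({
    codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
    codeUnits: ch.length,
  }));
}

const familySequence = '\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}';
const breakdown = describe(familySequence);
const totalUnits = breakdown.reduce((sum, part) => sum + part.codeUnits, 0);

console.log(breakdown.map((part) => part.codePoint + ':' + part.codeUnits).join(' '));
// "U+1F468:2 U+200D:1 U+1F469:2 U+200D:1 U+1F466:2"
console.log(totalUnits); // 8
```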
Thanks! @stefanjudis
Slides: ctfl.io/javascript-emoji-family
Article: ctfl.io/emoji-prototype-dot-length