the world of characters
orisano
September 13, 2018
Transcript
The World of a Single Character @orisano
Everyone can count characters, right?
a
a => 1
あ
あ => 1
佛
佛 => 1
🤔
🤔 => 1
👨‍👩‍👦
👨‍👩‍👦 => 1
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 6
Can everyone count bytes, too? (UTF-8)
a
a => 1
あ
あ => 3
佛
佛 => 4
🤔
🤔 => 4
👨‍👩‍👦
👨‍👩‍👦 => 18
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 143
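The UTF-8 byte counts on these slides can be reproduced with a short sketch. It assumes a runtime that provides the standard `TextEncoder` global (modern browsers, Node.js); the helper name `utf8ByteLength` is made up for illustration, and the two emoji are built from the code point values the deck lists further on.

```javascript
// Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

const hiraganaA = "\u3042"; // U+3042
const thinking = String.fromCodePoint(0x1f914);
const family = String.fromCodePoint(0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f466);

console.log(utf8ByteLength("a"));       // 1
console.log(utf8ByteLength(hiraganaA)); // 3
console.log(utf8ByteLength(thinking));  // 4
console.log(utf8ByteLength(family));    // 18 (4 + 3 + 4 + 3 + 4)
```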
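How should the "1 character" you have in mind be counted?
It cannot be counted in bytes
The Unicode character set maps each character to a number
あ => 3042
🤔 => 1F914
This number is called a code point
A method for representing code points as a byte sequence is called an encoding
UTF-8 and UTF-16 are examples of encodings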
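To make the split between code point and encoding concrete, here is a minimal sketch that prints one code point next to its UTF-8 bytes and its UTF-16 code units. The helper names are hypothetical. It relies only on standard JavaScript (`codePointAt`, `charCodeAt`, `TextEncoder`) and on the fact that JavaScript strings are stored as UTF-16 code units.

```javascript
// Code point of the first character, as uppercase hex.
function codePointHex(text) {
  return text.codePointAt(0).toString(16).toUpperCase();
}

// UTF-8 encoding of the string, one hex string per byte.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).padStart(2, "0")
  );
}

// UTF-16 encoding of the string, one hex string per 16-bit code unit.
function utf16Hex(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units;
}

const hiraganaA = "\u3042";
const thinking = String.fromCodePoint(0x1f914);

// Code point 3042: UTF-8 bytes e3 81 82, UTF-16 unit 3042.
console.log(codePointHex(hiraganaA), utf8Hex(hiraganaA), utf16Hex(hiraganaA));
// Code point 1F914: UTF-8 bytes f0 9f a4 94, UTF-16 units d83e dd14.
console.log(codePointHex(thinking), utf8Hex(thinking), utf16Hex(thinking));
```

The same code point yields a different byte sequence under each encoding, which is why a byte count alone says little about the number of characters.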
So does counting code points solve the problem?
No
👨‍👩‍👦 => 1F468 + 200D + 1F469 + 200D + 1F466
In fact, several code points can combine into a single character
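A small sketch of why counting code points is still not enough. Iterating a JavaScript string (here via `Array.from`) walks it one code point at a time, while `.length` counts UTF-16 code units. The function names are hypothetical.

```javascript
// Count code points: string iteration yields one element per code point.
function countCodePoints(text) {
  return Array.from(text).length;
}

// List the code points of a string as uppercase hex.
function listCodePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0).toString(16).toUpperCase());
}

const family = String.fromCodePoint(0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f466);

console.log(family.length);           // 8 UTF-16 code units
console.log(countCodePoints(family)); // 5 code points
console.log(listCodePoints(family));  // 1F468, 200D, 1F469, 200D, 1F466
```

The sequence renders as one family emoji, yet neither 8 nor 5 matches the "1" a reader would expect.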
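The single character that a human perceives is called a grapheme cluster
How can grapheme clusters be extracted from a sequence of code points?
Strict rules define whether the position between two code points is a grapheme cluster boundary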
UAX #29 Unicode Text Segmentation
I implemented this in JS: github.com/orisano/graphemesplit
For details, see UAX #29: http://unicode.org/reports/tr29/
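As a dependency-free illustration of grapheme cluster counting, the sketch below uses the built-in `Intl.Segmenter` rather than the graphemesplit library linked above, so it is a swapped-in technique and not the author's implementation. It assumes a runtime that ships `Intl.Segmenter` (recent browsers, Node.js 16 or later), whose grapheme granularity is in practice based on the UAX #29 rules. `countGraphemes` is a hypothetical name.

```javascript
// Count user-perceived characters (grapheme clusters) in a string.
// Assumption: Intl.Segmenter is available in the runtime.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

const family = String.fromCodePoint(0x1f468, 0x200d, 0x1f469, 0x200d, 0x1f466);
const eAcute = "e\u0301"; // "e" followed by a combining acute accent

console.log(countGraphemes("a"));    // 1
console.log(countGraphemes(family)); // 1 (5 code points joined by ZWJ)
console.log(countGraphemes(eAcute)); // 1 (base letter plus combining mark)
console.log(countGraphemes("abc"));  // 3
```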
When someone you don't know says "N characters", be sure to check what they mean!
1 byte? 1 codepoint? 1 grapheme cluster?