the world of characters
orisano
September 13, 2018
Transcript
The World of One Character @orisano
Everyone, you can count characters, right?
a => 1
あ => 1
佛 => 1
🤔 => 1
👨‍👩‍👦 => 1
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 6
Everyone, can you count bytes? (UTF-8)
a => 1
あ => 3
佛 => 4
🤔 => 4
👨‍👩‍👦 => 18
Z͑ͫ̓ͪ̂ͫ̽ ̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ ͫ͗ ͢ L̠ͨͧͩ͘ G̴̻͈͍͔̹ ̑͗̎̅͛ ́ Ǫ̵̹̻̝̳ ͂̌
̌͘! ͖̬̰̙̗ ̿̋ ͥ ͥ̂ͣ̐́́͜͞ => 143
How should the "one character" you have in mind be counted?
It cannot be counted in bytes.
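The UTF-8 byte counts listed earlier can be reproduced with a minimal sketch like the one below. It assumes a runtime that provides the standard `TextEncoder` global (modern browsers and Node.js); `utf8ByteLength`, `thinkingFace`, and `family` are hypothetical names chosen for illustration, not taken from the slides.

```javascript
// Count the bytes a string occupies once it is encoded as UTF-8.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

const thinkingFace = "\u{1F914}"; // 🤔
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}"; // 👨‍👩‍👦

console.log(utf8ByteLength("a"));          // 1
console.log(utf8ByteLength("あ"));         // 3
console.log(utf8ByteLength(thinkingFace)); // 4
console.log(utf8ByteLength(family));       // 18
```

All four strings look like one character, yet their byte lengths range from 1 to 18, which is why a byte count says little about how many characters a reader sees.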
The Unicode character set maps each character to a number.
あ => 3042
🤔 => 1F914
This number is called a code point.
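As a small illustration of the character-to-number mapping, the sketch below looks up code points with the standard `String.prototype.codePointAt`. The helper name `codePointHex` is an assumption made for this example.

```javascript
// Return the code point of the first character as uppercase hexadecimal.
function codePointHex(char) {
  return char.codePointAt(0).toString(16).toUpperCase();
}

console.log(codePointHex("あ"));        // 3042
console.log(codePointHex("\u{1F914}")); // 1F914 (🤔)
```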
A method for representing these code points as a sequence of bytes is called an encoding.
UTF-8 and UTF-16 are examples of encodings.
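To make the difference between encodings concrete, here is a sketch that prints the same code points as UTF-8 bytes and as UTF-16 code units. It relies on `TextEncoder` for UTF-8 and on the fact that JavaScript strings are sequences of UTF-16 code units; the helper names `toHex`, `utf8Hex`, and `utf16Hex` are hypothetical.

```javascript
// Format a list of numbers as space-separated, zero-padded hexadecimal.
function toHex(values, width) {
  return Array.from(values, (v) => v.toString(16).padStart(width, "0")).join(" ");
}

// UTF-8 bytes of a string (TextEncoder always produces UTF-8).
function utf8Hex(text) {
  return toHex(new TextEncoder().encode(text), 2);
}

// UTF-16 code units of a string, read directly from the JavaScript string.
function utf16Hex(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i));
  }
  return toHex(units, 4);
}

console.log(utf8Hex("あ"));         // e3 81 82
console.log(utf16Hex("あ"));        // 3042
console.log(utf8Hex("\u{1F914}"));  // f0 9f a4 94
console.log(utf16Hex("\u{1F914}")); // d83e dd14
```

The code point stays the same in both cases; only its byte-level representation changes with the encoding.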
So is the problem solved if we just count code points?
No.
👨‍👩‍👦 => 1F468 + 200D + 1F469 + 200D + 1F466
In fact, several code points can combine into a single character.
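The family emoji can be taken apart with a short sketch like this one. It uses the fact that iterating a JavaScript string (here via `Array.from`) walks code points rather than UTF-16 code units; `codePoints` and `familyEmoji` are names invented for the example.

```javascript
// List the code points of a string as uppercase hexadecimal.
// Array.from iterates a string by code point, not by UTF-16 code unit.
function codePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0).toString(16).toUpperCase());
}

const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}"; // 👨‍👩‍👦

console.log(codePoints(familyEmoji).join(" + ")); // 1F468 + 200D + 1F469 + 200D + 1F466
console.log(codePoints(familyEmoji).length);      // 5
```

Counting code points therefore reports 5 for something a reader sees as a single character.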
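The "one character" that humans perceive is called a grapheme cluster.
How can grapheme clusters be extracted from a sequence of code points?
There are strict rules that decide whether the position between two code points is a grapheme cluster boundary.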
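One way to try grapheme cluster segmentation without any library is the built-in `Intl.Segmenter`, sketched below. This is a substitute technique for illustration, not the implementation presented in this talk, and it assumes a runtime that ships `Intl.Segmenter` (for example Node.js 16 or later), whose grapheme segmentation is typically based on UAX #29. The names `graphemes`, `familySample`, and `zalgoSample` are hypothetical.

```javascript
// Split text into grapheme clusters using the built-in Intl.Segmenter.
function graphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const familySample = "\u{1F468}\u200D\u{1F469}\u200D\u{1F466}"; // 👨‍👩‍👦, 5 code points
const zalgoSample = "Z\u0351\u036BA\u0334"; // two letters, each followed by combining marks

console.log(graphemes("あ").length);         // 1
console.log(graphemes(familySample).length); // 1
console.log(graphemes(zalgoSample).length);  // 2
```

Code points joined by a zero-width joiner (200D) and combining marks attached to a base letter stay inside one cluster, so the count matches what a reader perceives.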
UAX #29 Unicode Text Segmentation
I implemented this in JS: github.com/orisano/graphemesplit
For details, see UAX #29: http://unicode.org/reports/tr29/
When a stranger tells you "N characters", make sure to check what they mean!
1 byte? 1 codepoint? 1 grapheme cluster?