all possible characters into a “code point”
• A Unicode string is a sequence of code points
Glyph: A
Code point: U+0041
Character: Latin capital letter A
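A minimal sketch of the glyph / code point / character relationship, assuming Python 3 and the standard `unicodedata` module. The variable names (`text`, `labels`, `names`) are illustrative only and do not come from the slides.

```python
import unicodedata

# "A" followed by the two characters of 汉语, written as escapes so the
# example does not depend on the encoding of the source file.
text = u"A\u6c49\u8bed"

# Each element of a Unicode string is one code point.
code_points = [ord(ch) for ch in text]

# Conventional "U+XXXX" notation for each code point.
labels = ["U+%04X" % cp for cp in code_points]

# The character name registered in the Unicode database.
names = [unicodedata.name(ch) for ch in text]

for ch, label, name in zip(text, labels, names):
    print(ch, label, name)
```

The first row printed pairs the glyph `A` with `U+0041` and `LATIN CAPITAL LETTER A`, matching the slide.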
translate a Unicode string to bytes is called encoding. In the same way that we encode letters into sounds and decode them back, the Unicode encodings let us encode code points into bytes and decode bytes back into code points.
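The round trip can be sketched as follows, assuming Python 3, where `str` holds code points and `bytes` holds encoded data. The names `text`, `utf8_bytes`, `utf16_bytes` and `decoded` are illustrative.

```python
# Two code points: U+6C49 and U+8BED (汉语).
text = u"\u6c49\u8bed"

# Encoding: code points -> bytes. The result depends on the encoding chosen.
utf8_bytes = text.encode("utf-8")       # three bytes per character here
utf16_bytes = text.encode("utf-16-le")  # two bytes per character here

# Decoding: bytes -> code points. The same encoding must be used.
decoded = utf8_bytes.decode("utf-8")

print(len(text), len(utf8_bytes), len(utf16_bytes))
print(decoded == text)
```

The same two code points become six bytes in UTF-8 and four in UTF-16, which is why bytes mean nothing without knowing their encoding.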
                                    False  u'"\u6c49\u8bed"'
u"汉语".encode("utf-8")             True   '"\\u6c49\\u8bed"'
u"汉语".encode("utf-8")             False  '"\xe6\xb1\x89\xe8\xaf\xad"'
[u"汉语", u"汉语".encode("utf-8")]  True   '["\\u6c49\\u8bed", "\\u6c49\\u8bed"]'
[u"汉语", u"汉语".encode("utf-8")]  False  UnicodeDecodeError

There is no binary in the JSON format!
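The results above have the shape of Python 2 output. As a hedged comparison, here is a small sketch assuming Python 3 and the standard `json` module, reading the True/False column as the `ensure_ascii` flag of `json.dumps` (an assumption, since the column header is not shown). In Python 3, bytes are rejected outright instead of being decoded implicitly.

```python
import json

text = u"\u6c49\u8bed"  # 汉语

# ensure_ascii=True is the default: non-ASCII code points are escaped.
escaped = json.dumps(text)

# ensure_ascii=False keeps the code points as they are in the output string.
raw = json.dumps(text, ensure_ascii=False)

# JSON has no binary type, so Python 3 refuses to serialize bytes.
try:
    json.dumps(text.encode("utf-8"))
    bytes_error = None
except TypeError as exc:
    bytes_error = exc

# Decoding the bytes first makes the value serializable again.
fixed = json.dumps(text.encode("utf-8").decode("utf-8"))

print(escaped, raw, type(bytes_error).__name__)
```

Decoding at the boundary, before the data reaches `json.dumps`, avoids both the Python 2 `UnicodeDecodeError` and the Python 3 `TypeError`.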