Mailing List Archive
Re: UTF-8: each character is one byte . . . . . . (was: Re: Learn a Variety of Languages) [tlug]
- Date: Sat, 20 Jan 2007 13:31:23 +0900
- From: "Guillaume Proux" <gproux@example.com>
- Subject: Re: UTF-8: each character is one byte . . . . . . (was: Re: Learn a Variety of Languages) [tlug]
- References: <45AAFDA9.90504@example.com> <19dd68ba0701160122i1b813c10jf34c0210d53fbbdd@example.com> <op.tl8roo02rtshzt@example.com> <19dd68ba0701160412y2eb95062r6235fed92b752784@example.com> <Pine.NEB.4.64.0701162139360.10912@example.com> <3156339d0701161820lb684aeubcd51914b19a87bf@example.com> <Pine.NEB.4.64.0701171657080.1515@example.com> <3156339d0701180035k2a4f2b70o3bbf00612501470@example.com> <Pine.NEB.4.64.0701201123230.1314@example.com> <20070119230346.6435923f.jep200404@example.com>
> In UTF-8, all characters contain exactly one byte without the high bit set.
uh?
The Wikipedia page that was linked shows one example.
""" For example, the character aleph (א), which is Unicode U+05D0, is encoded into UTF-8 in this way:
* It falls into the range of U+0080 to U+07FF. The table shows it will be encoded using two bytes, 110yyyyy 10zzzzzz.
* Hexadecimal 0x05D0 is equivalent to binary 101-1101-0000.
* The eleven bits are put in their order into the positions marked by "y"-s and "z"-s: 11010111 10010000.
* The final result is the two bytes, more conveniently expressed as the two hexadecimal bytes 0xD7 0x90. That is the encoding of the character aleph (א) in UTF-8.
"""
The U+05D0 code point is turned into 11010111 10010000. Both bytes have the high bit set.
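For what it's worth, the bit-packing steps quoted above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `utf8_encode_two_byte` is my own invention, and it handles only the two-byte range (U+0080 to U+07FF) that the example falls into. The result is compared against Python's built-in encoder.

```python
def utf8_encode_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF as 110yyyyy 10zzzzzz.

    Hypothetical helper for illustration only; other ranges are rejected.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    first = 0b11000000 | (code_point >> 6)         # 110yyyyy: upper five bits
    second = 0b10000000 | (code_point & 0b111111)  # 10zzzzzz: lower six bits
    return bytes([first, second])


encoded = utf8_encode_two_byte(0x05D0)  # aleph

# Show each byte in binary, as in the quoted example.
print([format(b, "08b") for b in encoded])

# Cross-check against the built-in UTF-8 encoder.
print(encoded == "\u05d0".encode("utf-8"))

# Does every byte have the high bit (0x80) set?
print(all(b & 0x80 for b in encoded))
```

If the quoted claim held for every character, the last check would have to find at least one byte with the high bit clear.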
Am I misunderstanding something, or can we check this again?
Guillaume