Mailing List Archive



Re: [tlug] Re: Unicode (Was: apache2...)



Shimpei Yamashita <shimpei@example.com> wrote:
>> A few questions, from a complete amateur....
>> 
>> On Sat, Jul 12, 2003 at 12:45:28AM +1000,
>> Jim Breen wrote:
>> > Things don't "look" like anything in Unicode. The look comes from the
>> > font. You choose the font. You buy a Chinese-style Unicode font where 
>> > the hanzi look Chinese, or you buy a Japanese-style font. The codes
>> > stay the same.
>> 
>> Does that mean that a multilingual text document, rendered with a single
>> Unicode font, may only "look" correct in one Asian language at a time? 

Depending on the font, yes.

>> If so,
>> does it not mean that Unicode only *pretends* to be context-independent, and
>> actually depends on the user (which could be the application or the human
>> being) to provide that context because it fails to provide a context-
>> presentation mechanism internally?

Not at all. There are language codes in Unicode, and if the document has
been prepared with them, a smart application can do things like
selecting fonts according to them, or invoking spell-checkers according
to the language, or all the other language-dependent things. It's the
same with A,a,B,b, etc. Different European cultures actually have their
preferred fonts and think others look foreign, but no-one has accused
ISO-8859-* of pretense or cultural hegemony on this score.
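
As a rough illustration of "selecting fonts according to them", here is a
minimal Python sketch. It assumes the document carries the Plane 14 tag
characters (U+E0001 LANGUAGE TAG followed by tag letters, U+E007F CANCEL TAG),
which Unicode has since deprecated in favour of tagging at the markup level.
The font names and the helper functions (tag, split_runs, choose_font) are
made up for the example and are not part of any real library.

```python
# Sketch only: split text into language runs using Plane 14 tag characters,
# then pick a font per run. Font names below are hypothetical placeholders.
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG
TAG_BASE = 0xE0000       # tag character = TAG_BASE + ASCII code point

FONT_FOR_LANGUAGE = {    # assumed mapping, for illustration
    "ja": "ExampleMincho-JP",
    "zh": "ExampleSong-CN",
    "ko": "ExampleBatang-KR",
}
DEFAULT_FONT = "ExampleSans"


def tag(lang):
    """Build the tag-character sequence announcing language `lang`."""
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def split_runs(text):
    """Return a list of (language or None, visible text) runs."""
    runs, buf, lang = [], [], None
    i, n = 0, len(text)
    while i < n:
        cp = ord(text[i])
        if cp == LANGUAGE_TAG:
            if buf:                      # close the run in progress
                runs.append((lang, "".join(buf)))
                buf = []
            i += 1
            letters = []
            # Tag letters mirror printable ASCII at U+E0020..U+E007E.
            while i < n and 0xE0020 <= ord(text[i]) <= 0xE007E:
                letters.append(chr(ord(text[i]) - TAG_BASE))
                i += 1
            lang = "".join(letters) or None
            continue
        if cp == CANCEL_TAG:
            if buf:
                runs.append((lang, "".join(buf)))
                buf = []
            lang = None
        else:
            buf.append(text[i])
        i += 1
    if buf:
        runs.append((lang, "".join(buf)))
    return runs


def choose_font(lang):
    """Pick a font from the primary language subtag, else the default."""
    primary = (lang or "").split("-")[0].lower()
    return FONT_FOR_LANGUAGE.get(primary, DEFAULT_FONT)


# The same code point (U+76F4) appears twice, tagged Japanese then Chinese.
doc = tag("ja") + "\u76f4" + tag("zh") + "\u76f4" + chr(CANCEL_TAG) + " end"
for lang, run in split_runs(doc):
    print(lang, repr(run), choose_font(lang))
```

The code point for the character is identical in both runs; only the
language attached to the run, and therefore the font chosen, differs.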

>> > Be that as it may, EVERY kanji in JIS X 0208 and JIS X 0212 ended up in
>> > Unicode 1.0. What is called the "source separation rule" meant that if 
>> > a kanji/hanzi/hanja pair that would otherwise be unified occurs
>> > multiply in one of the national standards, then it appears multiply in
>> > Unicode. Thus all six versions of the "ken" kanji, which blind Freddie
>> > could tell are really the same, are dutifully replicated in Unicode,
>> > because that's the way they are in JIS X 0208.
>> 
>> That doesn't seem to solve the above problem at all, which involves
>> *different* countries using different glyphs for the "same" character.

No, I mentioned that because people still say Unicode is "missing some
kanji", and "was prepared ignoring national wishes", which is where
this thread started.

>> Jim, what I don't quite understand is this: exactly what problem is Unicode
>> meant to solve anyway? 

The key problem was the inability of the pre-Unicode codes to mix
languages in a usable way. Have you ever tried to mix Japanese with 
French or German? It was only possible before Unicode by using ISO-2022
escaping which is a truly horrible way to handle text. In the case of
the "CJK" languages it was worse. At least with ordinary alphabetics an
"a" or a "b" tended to be the same regardless of language, but with the 
CJK languages, something like [...]

http://www.csse.monash.edu.au/~jwb/
Computer Science & Software Engineering,                Tel: +61 3 9905 3298
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
