Re: [tlug] Unicode



I'll try and pick up several sets of comments.

simon colston <simon@example.com> wrote:

>> I have to agree with this.  If you want to create a Japanese-Chinese
>> dictionary in Unicode then there need to be separate codes for each
>> character that looks the same to a Westerner's eyes but very different to
>> the Japanese and Chinese.  

No you don't. There are ways of flagging language, either in markup
(which is how I'd prefer it), or by embedded language-codes. There is
no need to have different code-points for what are very minor glyph
differences. In any case, the characters that actually differ in glyph
are a very small proportion, and involve things like whether in
characters like 藤 the kusakanmuri covers the whole character or
is slid a fraction to the right. The differences between Courier and
Helvetica are more substantial than this.
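
As a rough sketch of the markup approach (an illustration only, assuming
HTML's lang attribute as the markup and taking 藤, U+85E4, as the example
character; the helper name tag_with_language is made up for this sketch),
the same code point can be carried in both languages and the language
flagged outside the character data:

```python
import html
import unicodedata

# A single unified code point, used for both the Japanese and Chinese text.
FUJI = "\u85e4"  # 藤, chosen here purely as an example character


def tag_with_language(text: str, lang: str) -> str:
    """Wrap text in an HTML span whose lang attribute flags the language.

    Hypothetical helper for illustration; the language lives in the markup,
    not in the code point.
    """
    return f'<span lang="{html.escape(lang, quote=True)}">{html.escape(text)}</span>'


japanese = tag_with_language(FUJI, "ja")
chinese = tag_with_language(FUJI, "zh")

# Both strings carry the same code point; only the markup differs.
print(japanese)
print(chinese)
print(unicodedata.name(FUJI))
```

A renderer that honours the lang attribute is then free to pick a Japanese
or a Chinese font for the span, which is where the glyph difference belongs.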

>> I think a lack of sensitivity to these types of
>> problem are a bigger problem than a "nationalistic" desire to have one's
>> own language look like one's own language.

There is no "lack of sensitivity". That argument is quite bogus. The
people who claim that for Unicode are either under the misapprehension
that Unicode mandates glyphs (it doesn't), or are aware that it doesn't
and are saying it on the FUD principle.

Charles Muller <acmuller@example.com> wrote:
 
>> As far as I can tell, it is because the grievances are largely based on
>> misunderstandings of what Unicode is supposed to do. Almost all of the
>> grievances that I have heard from anti-Unicode people have been quibbles
>> about small, idiosyncratic differences in glyph representation, which can
>> very easily be handled at the level of font, and thus there is no problem
>> assigning a single code point.

This is it in a nutshell. Unfortunately the first edition of Unicode
was published using Chinese fonts for some of the more obscure
characters. That set the xenophobes running with a "foisting foreign
characters on us" argument. The JSA committee did the right thing by
putting multiple glyphs in JIS X 0221, but you still hear the "Unicode
looks Chinese" argument, which is a total furphy.

>> There are of course a very small percentage of _bimyou_ cases where
>> expert-level debate needs to take place to determine whether or not a
>> character is a variant of another (and if so, what kind of variant). But the
>> fact that more of these did not get hashed out at the early stages is again,
>> from what I understand, due more to the problems of non-cooperation rather
>> than unawareness or arbitrary forcing on the part of the Unicode consortium.
>> 
>> The other thing that I would like to stress is that from the early days up
>> to the present, the Unicode consortium has been quite open to suggestions
>> and reasonable proposals set forth by properly accredited groups and
>> individuals, and therefore the Unicode character set continues to grow and
>> be refined.

Quite. I strongly recommend that people track down a copy of the
overview of Han unification in the Unicode documents. 

Shimpei Yamashita <shimpei@example.com> wrote:
 
>> But that, and combining kanji glyphs, seem to be orthogonal problems to me.
>> In different CJK nations, they don't necessarily look the same, they aren't
>> read the same, and they don't even always mean the same. 

In which case they won't have been unified. Please read up on the
unification process. It was done very carefully. There is a "semantic
axis" which had to be satisfied as well as shape before unification took
place.

>> If all you wanted to
>> do was to create a coding standard in which no two languages ever clashed with
>> each other, you could have given each language's glyphs different coding
>> points. 

Yes, a French "A" and a German "A" and an English "A".

What do you mean by "no two languages ever clashed"? The process was
about codesets; not languages.

>> So why was this not done? I'm sure there were good rationales behind
>> it--coding point economy? ease of lookup?--but it doesn't lead automatically
>> from Unicode's goal as you stated it.

I really don't understand the point you are trying to make. Are you
saying because [...]

[...] http://www.csse.monash.edu.au/~jwb/)
Computer Science & Software Engineering,                Tel: +61 3 9905 3298
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
