Re: tlug: A couple of questions about Unicode




>>>>> "Taro" == Taro Yamamoto <tyamamot@example.com> writes:

    Taro> Jon Babcock wrote:
    >> >> that there is a Japanese book out about how bad unicode is
    >> >> for the Japanese.  Evidently, it was a best seller in Japan.
    >> 
    >> First, does anyone have the title or any bibliographic info on
    >> this book?

    Taro> I found the following book at a book shop today:

    Taro> Title: いま日本語が危ない―文字コードの誤った国際化
    Taro> Author: 太田 昌孝
    Taro> Publisher: 丸山学芸図書
    Taro> ISBN: 4895421465

Thank you very much for this information. I'll have a friend send me a
copy. I won't get it or have time to read it before my LJ article is
finished, but I may mention the existence of いま日本語が危ない―文字コードの
誤った国際化 to illustrate the degree to which opposition to
Unicode has been taken in Japan.

Does anyone know the correct reading (not just a guess) of the
author's name? 太田 昌孝 ?

The Unicode issue plays only a small part in my article, but it plays
a big part in my repertoire of current interests.

Thanks again.


    Taro>  On the other hand, I am
    Taro> skeptical about the possibility of defining a
    Taro> contradiction-free character set, however good and rational
    Taro> its unification and categorization method is (one such
    Taro> successful example is the editing of JIS X 0208:1997),
    Taro> because the reality of kanji (its history and its usages in
    Taro> society) looks to be more complicated and self-contradicting
    Taro> than one industry standard can successfully trace.

You are probably right. I do believe that, difficult though it would
be, such a set could be successfully compiled by a small, independent
team of two or three researchers. But precisely because it would have
been developed without the involvement of official or semi-official
representatives from the kanji-using governments, no consensus to
actually use this wonderful, pure set would ever be achieved, I'm
afraid.

The contradiction runs like this: There is the problem of producing a
truly unified version of something like the Unicode Han character
set. But even if this were to be achieved, unless the set were four or
more times the size of the Unicode (Version 2.0) Han sub-set of 20,902
characters, it would still be too small to offer a prototype kanji
reference for every single kanji instance to be found in the world.

In addition to a better unified Han character set of approximately
the same size as the current Unicode version, one that would cover,
say, 99% of the needs of people using kanji on computers today, there
should also exist an encoding for the < 3000 hemigrams that, either
alone as independent graphs or in combination, form the entire
repertoire of all 50k+ kanji. (There are a handful of rare kanji
monstrosities that even then would have to be excluded!) Any kanji
could be composed from one or two of them. It might take three or
four, depending on whether the list is really a list of hemigrams,
i.e., 'half-graphs', or a list of graphemes, i.e., the lowest-level,
non-reducible elements of the Chinese script as determined by looking
at a specific font or font family, in which case only about 500 would
be needed. Writing the composition software that could access and
properly combine these hemigrams and then render them in a pleasing
fashion on screen and on a PS or PCL printer presents a serious
challenge to developers, but the result would be that *any* kanji
could be included in one's text. Use the common 'UniHan' repertoire
for most kanji, but have the hemigrams available for the rest: that's
the idea.
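
To make the lookup logic concrete, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the tiny
tables stand in for the real common set and hemigram inventory, and the
names `COMMON_SET`, `HEMIGRAMS`, `DECOMPOSITIONS`, `Composed`, and
`encode_kanji` are hypothetical. It shows only the fallback from the
common repertoire to a hemigram composition, not the rendering, which
is the hard part.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical stand-in for the common unified set (a real one would hold
# roughly 20,000 characters). Python's ord() plays the role of its code.
COMMON_SET = {"語", "字", "書"}

# Hypothetical hemigram inventory: component -> index in a separate table.
HEMIGRAMS = {"木": 1, "皆": 2, "言": 3, "吾": 4}

# Hypothetical decomposition table for kanji outside the common set.
# 楷 is used purely as an example; it is not actually a rare character.
DECOMPOSITIONS = {"楷": ("left-right", ("木", "皆"))}


@dataclass(frozen=True)
class Composed:
    """A kanji expressed as a layout plus hemigram indices."""
    layout: str
    parts: Tuple[int, ...]


def encode_kanji(ch: str) -> Union[int, Composed]:
    """Return the common code if one exists, else a hemigram composition."""
    if ch in COMMON_SET:
        return ord(ch)
    if ch in DECOMPOSITIONS:
        layout, parts = DECOMPOSITIONS[ch]
        return Composed(layout, tuple(HEMIGRAMS[p] for p in parts))
    raise KeyError(f"no code and no decomposition for {ch!r}")


print(encode_kanji("語"))  # a single code from the common set
print(encode_kanji("楷"))  # a layout plus two hemigram indices
```

The design choice here is that text carries either one code or a short
composed sequence, so software that knows only the common set can still
pass the composed form through untouched.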

    Taro> only fact that we had to separate prototypic "characters"
    Taro> from their "representation" instances, however reasonable
    Taro> and purposeful it was, implies possible self-contradictory
    Taro> nature of the concept of "character set" in computing, since
    Taro> they should be inherently inseparable.

I need more time to ponder this.  I don't understand why a character
in the character set should be inseparable from the form of its
representation. It seems that it MUST be separable from a specific
form in order to allow it to move through time and space. But
basically, you are right, of course. Whatever the form is, however
"distorted", it must somehow be recognizable as an instance of the
pattern that is referenced by a specific character in the set. Looking
back at examples of kanji from the oracle bones 甲骨文字 <koukotsu
moji> and bronzes 鐘鼎文 <shouteibun>, and later at the Large Seal 大篆
<daiten> and Small Seal 小篆 <shouten> scripts, and then comparing
these with old 隷書 <reisho> and modern 楷書 <kaisho> in current fonts
occurring in Chinese (traditional and simplified), Korean, and
Japanese (old and new) text, great variations in the form do occur,
even though it is one and the same character undergoing those
transformations.  When looking at these 'great variations' it seems as
if the character in the set is not only *that particular form*, *that
particular instance*; it is *more than that alone*; it includes and
transcends *that specific scriptual embodiment 書体 <shotai>*. But it
IS inseparable from that *pattern*, and in that sense you're right,
Yamamoto-san.
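
One way to picture this separation is the following rough Python
sketch. The glyph identifiers and the names `GLYPHS`, `CHARACTER_OF`,
`glyph_for`, and `same_character` are hypothetical, made up for
illustration. A single character in the set maps to many shotai
instances, and every instance maps back to exactly one character.

```python
from typing import Dict, Tuple

# Hypothetical glyph table: (character, shotai) -> glyph identifier.
GLYPHS: Dict[Tuple[str, str], str] = {
    ("書", "shouten"): "sho-small-seal",
    ("書", "reisho"): "sho-clerical",
    ("書", "kaisho"): "sho-standard",
    ("体", "kaisho"): "tai-standard",
}

# Reverse index: each glyph instance points back to one character.
CHARACTER_OF: Dict[str, str] = {
    glyph: ch for (ch, _shotai), glyph in GLYPHS.items()
}


def glyph_for(ch: str, shotai: str) -> str:
    """Pick the concrete form of a character in a given script style."""
    return GLYPHS[(ch, shotai)]


def same_character(glyph_a: str, glyph_b: str) -> bool:
    """True if two different-looking forms are instances of one character."""
    return CHARACTER_OF[glyph_a] == CHARACTER_OF[glyph_b]


seal = glyph_for("書", "shouten")
standard = glyph_for("書", "kaisho")
print(same_character(seal, standard))
```

The character is the key and the forms are the values: the code is
separable from any one form, but not from the family of forms that
point back to it.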

Stephen, I think some of the issues you've raised about
Unicode may have been partly addressed in these messages. As for more details
on the hemigram approach to encoding kanji, I must ask for more time
to respond. I have several translation jobs to finish as well as the
LJ article before I can return my attention to this. Actually, my
problem is how to keep my attention away from this and focused on
those other things <g>.

Jon Babcock
jon@example.com

PS Apologies to tlugers for whom this is not a Linux issue. But I
think, given its incorporation into several, perhaps most, new OSs,
that now is a good time to become more familiar with Unicode, both on
the theoretical level (this thread) and on the implementation level
(touched on by Stephen Turnbull, but still to be thoroughly
addressed).



---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15  Tokyo station
Yaesu Chuo ticket gate.  Or go directly to Tengu TokyoEkiMae 19:30
Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station
Yaesu Chuo ticket gate.
---------------------------------------------------------------