Re: tlug: A couple of questions about Unicode
- To: tlug@example.com
- Subject: Re: tlug: A couple of questions about Unicode
- From: Jon Babcock <jon@example.com>
- Date: 11 Jan 1998 07:48:50 -0700
- Cc: michael@example.com
- In-Reply-To: Taro Yamamoto's message of Sat, 10 Jan 1998 21:25:33 +0900
- References: <Pine.LNX.3.96LJ1.1b7.980110093817.18865A-100000@example.com> <34B71D4C.1684ACAD@example.com> <86btxk39kw.fsf@example.com> <34B768BD.ADD3D5D3@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "Taro" == Taro Yamamoto <tyamamot@example.com> writes:

Taro> Jon Babcock wrote:
>> >> that there is a Japanese book out about how bad unicode is
>> >> for the Japanese. Evidently, it was a best seller in Japan.
>>
>> First, does anyone have the title or any bibliographic info on
>> this book?

Taro> I found the following book at a book shop today:

Taro> Title: いま日本語が危ない―文字コードの誤った国際化
Taro> Author: 太田 昌孝
Taro> Publisher: 丸山学芸図書
Taro> ISBN: 4895421465

Thank you very much for this information. I'll have a friend send me a copy. I won't get it or have time to read it before my LJ article is finished, but I may mention the existence of いま日本語が危ない―文字コードの誤った国際化 to illustrate how far opposition to Unicode has gone in Japan.

Does anyone know the correct reading (not just a guess) of the author's name? 太田 昌孝 ?

The Unicode issue plays only a small part in my article, but it plays a big part in my repertoire of current interests. Thanks again.

Taro> On the other hand, I am
Taro> skeptical about the possibility of defining a
Taro> contradiction-free character set, however good and rational
Taro> its unification and categorization method is (one such
Taro> successful example is the editing of JIS X 0208:1997),
Taro> because the reality of kanji (its history and its usages in
Taro> society) looks to be more complicated and self-contradicting
Taro> than one industry standard can successfully trace.

You are probably right. I do believe that such a set, difficult though it would be, could be successfully compiled by a small, independent team of two or three researchers. But precisely because it would have been developed without the involvement of official or semi-official representatives from the kanji-using governments, no consensus to actually use this wonderful, pure set would ever be achieved, I'm afraid.

The contradiction runs like this: There is the problem of producing a truly unified version of something like the Unicode Han character set. But even if this were achieved, unless the set were four or more times the size of the Unicode (Version 2.0) Han subset of 20,902 characters, it would still be too small to offer a prototype kanji reference for every single kanji instance to be found in the world.

In addition to a better unified Han character set of approximately the same size as the current Unicode version, one that would cover, say, 99% of the needs of people using kanji on computers today, there should also exist a unicode for the fewer than 3,000 hemigrams that, either alone as independent graphs or in combination, form the entire repertoire of all 50,000+ kanji. (There are a handful of rare kanji monstrosities that even then would have to be excluded!)

Any kanji could be composed from one or two elements, or from three or four, depending on whether the list is really a list of hemigrams, i.e. 'half-graphs', or a list of graphemes, i.e. the lowest-level, non-reducible elements of the Chinese script as determined by looking at a specific font or font family, in which case only about 500 would be needed.

Although writing the composition software that could access and properly combine these hemigrams, and then render them in a pleasing fashion on screen and on the PS or PCL printer, presents a serious challenge to developers, the result would be that *any* kanji could be included in one's text. Use the common 'UniHan' repertoire for most kanji, but have the hemigrams available for the rest; that's the idea.

Taro> only fact that we had to separate prototypic "characters"
Taro> from their "representation" instances, however reasonable
Taro> and purposeful it was, implies possible self-contradictory
Taro> nature of the concept of "character set" in computing, since
Taro> they should be inherently inseparable.

I need more time to ponder this.
I don't understand why a character in the character set should be inseparable from the form of its representation. It seems that it MUST be separable from a specific form in order to allow it to move through time and space.

But basically, you are right, of course. Whatever the form is, however "distorted", it must somehow be recognizable as an instance of the pattern that is referenced by a specific character in the set.

Looking back at examples of kanji from the oracle bones 甲骨文字 <koukotsu moji> and bronzes 鐘鼎文 <shouteibun>, and later at the Large Seal 大篆 <daiten> and Small Seal 小篆 <shouten> scripts, and then comparing these with old 隷書 <reisho> and modern 楷書 <kaisho> in current fonts occurring in Chinese (traditional and simplified), Korean, and Japanese (old and new) text, one finds great variations in form, even though it is one and the same character undergoing those transformations.

When looking at these 'great variations' it seems as if the character in the set is not only *that particular form*, *that particular instance*; it is *more than that alone*; it includes and transcends *that specific scriptual embodiment 書体 <shotai>*. But it IS inseparable from that *pattern*, and in that sense you're right, Yamamoto-san.

Stephen, I think some of the issues you've raised about Unicode may have been addressed, at least in part, in these messages.

As for more details on the hemigram approach to encoding kanji, I must ask for more time to respond. I have several translation jobs to finish as well as the LJ article before I can return my attention to this. Actually, my problem is how to keep my attention away from this and focused on those other things <g>.

Jon Babcock
jon@example.com

PS Apologies to tlugers to whom this is not a Linux issue.
But I think, given its incorporation into several, perhaps most, new OSs, that now is a good time to become more familiar with Unicode, both on the theoretical level (this thread) and on the implementation level (touched on by Stephen Turnbull, but still to be thoroughly addressed).

---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15 Tokyo station
Yaesu Chuo ticket gate. Or go directly to Tengu TokyoEkiMae 19:30
Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station
Yaesu Chuo ticket gate.
---------------------------------------------------------------
a word from the sponsor:
TWICS - Japan's First Public-Access Internet System
www.twics.com  info@example.com  Tel:03-3351-5977  Fax:03-3353-6096
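
The mixed scheme proposed in the message (a common unified repertoire for most kanji, with hemigram composition as the fallback for the rest) might be sketched roughly as follows. This is only an illustration under stated assumptions: the names COMMON_REPERTOIRE, HEMIGRAM_COMPOSITIONS and encode, the layout labels, and the three sample decompositions are all hypothetical, and the sample kanji are merely pretended to be missing from the common set. None of this is taken from an actual hemigram list or from the Unicode standard.

```python
# Size figure from the message: Unicode 2.0 unifies 20,902 Han characters,
# and full coverage is estimated to need at least four times that many.
UNICODE_2_0_HAN = 20902
MIN_FULL_COVERAGE = 4 * UNICODE_2_0_HAN  # 83608

# Hypothetical stand-in for the common unified ('UniHan') repertoire.
COMMON_REPERTOIRE = {"明", "休", "語"}

# Hypothetical hemigram table: kanji we pretend are absent from the common
# repertoire, each described by a layout label and its component graphs.
HEMIGRAM_COMPOSITIONS = {
    "鱈": ("left-right", ("魚", "雪")),
    "畑": ("left-right", ("火", "田")),
    "雫": ("top-bottom", ("雨", "下")),
}


def encode(text):
    """Encode each kanji directly if possible, else as a hemigram composition."""
    tokens = []
    for ch in text:
        if ch in COMMON_REPERTOIRE:
            # Common case: the character has its own code.
            tokens.append(ch)
        elif ch in HEMIGRAM_COMPOSITIONS:
            # Fallback: describe the character by layout and components.
            layout, parts = HEMIGRAM_COMPOSITIONS[ch]
            tokens.append(("compose", layout, parts))
        else:
            raise KeyError(f"no code or composition known for {ch!r}")
    return tokens


if __name__ == "__main__":
    print(encode("明鱈雫"))
    print(MIN_FULL_COVERAGE)
```

The sketch covers only the lookup step. The part the message singles out as the serious challenge, combining the components and rendering them acceptably on screen and on a printer, is not attempted here.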