Re: tlug: Mule-begotten problems for Emacs and Gnus
- To: tlug@example.com
- Subject: Re: tlug: Mule-begotten problems for Emacs and Gnus
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Fri, 9 Jan 1998 17:31:35 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <199801090655.GAA00847@example.com>
- References: <m0xpm7D-00012bC@example.com><199801090655.GAA00847@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "KM" == Karl-Max Wagner <karlmax@example.com> writes:

    KM> Stephen Turnbull writes:

    >> The Unicode issue, as such, is a red herring.  Unicode is a
    >> Western imperialist plot in the minds of many Orientals.
    >> Europeans are not

    KM> Imperialist plot?  Hmmm.  Imperialist plot appears a bit
    KM> farfetched to me.

Me too.  So what?  It's not our language.  Looks to me from your
return address like you would be mighty annoyed if there were no
multibyte languages and the Russkis used their position on the UN
Security Council to demand that the top half of the 8-bit code space
be reserved for Cyrillic....  I'd not hesitate to call that
imperialism.  It's a stretch to get from Han Unification to "all
umlauts to /dev/null," but at least it's the same dimension.

    KM> If it is inadequate it should be pointed out where, and
    KM> proposals for how to fix it should be made.

Been done.  The proposal is a standard; it's called UCS-4 and is
written out in ISO 10646.

    KM> Well, obviously.  It IS a technical advantage to have a
    KM> character set with only very few characters.

_Western_ Europe doesn't have one.  ISO-8859-1 leaves out the single
densest Internet domain in Europe: Iceland.  Where's the advantage?
And how 'bout them Russkis and Israelis (not quite Europe, but
close)?

    KM> I have to admit that I don't know that problem, but 16 bits
    KM> yield a character space of 65536 characters.  The Chinese use
    KM> about 10000 Kanji or so, the Japanese 4000 or

Everyday use, you're right.  Classical scholars and corporations use
quite a few more.  In fact, the Japanese national standard character
set (you have to include JIS X 0212) alone is more like
16000---and that doesn't include all the corporate character sets,
AFAIK.

    KM> So where is the problem?

Not my job to count the ways.  However, the Chinese National Standard
(Taiwan) includes something on the order of 75,000 code points.  And
the Koreans use both Chinese characters _and_ their own set of 6500
"composed Hangul"; look at the Unicode Standard.
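[A note for the archive reader: the code-space arithmetic in this exchange can be checked today with a short Python sketch.  The specific ideograph below, U+20000 from CJK Unified Ideographs Extension B, was assigned in a later Unicode version than the one under discussion and is used purely to illustrate a character that does not fit in 16 bits.]

```python
# A character outside the 16-bit code space, and what each encoding
# form does with it.  U+20000 is illustrative; any Extension B
# ideograph behaves the same way.
han = chr(0x20000)

assert ord(han) > 0xFFFF                   # does not fit in 16 bits
assert len(han.encode("utf-32-be")) == 4   # UCS-4/UTF-32: one fixed 32-bit unit
assert len(han.encode("utf-16-be")) == 4   # 16-bit form needs a surrogate pair
assert len(han.encode("utf-8")) == 4       # multibyte form: four variable-width bytes
```

The same sketch previews the multibyte-versus-wide-character distinction raised later in the message: UTF-8 lengths vary per character, while UCS-4 spends a fixed four bytes on everything.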
On top of that, there's a lot of private and reserved space, and
suddenly 65535 characters (0xFFFF is illegal) aren't so many.

    KM> It may well be that the character numbering in such a case
    KM> lacks a bit of systematics, but this shouldn't be much of a
    KM> problem: table-driven libraries etc. could be made available
    KM> to hide that fact from the programmer (actually, it's only a
    KM> problem with sorts.  In Japanese, due to ON/KUN reading, a
    KM> dictionary-driven sort is required anyway)

Mostly not used, though.  Depends on whether you need dictionary
order or efficient searches.

    >> The real issue is not Unicode qua Unicode; while the Europeans
    >> in general would love to use Unicode to handle Oriental
    >> character sets, many Orientals are adamantly opposed.  Books
    >> have been written about

    KM> But what do they propose instead?

The Tower of Babel.  They prefer the current state to one in which
their national languages are subordinated to the needs of machines.
Do you really want to give up your hard S?  Seems to me "ss"
functions just as well.

    KM> This sounds VERY MUCH like a Tower-of-Babel story to me.  It
    KM> is obvious that this is SERIOUSLY hampering global networking
    KM> and global software development.

Not really.  This stuff is actually pretty trivial; it was
complicated by Stallman's decision to go with a multibyte rather
than a wide-character representation.

    >> how Unicode will be the demise of the Japanese language, for
    >> example.

....(Very interesting description of Emacs internals)

    KM> To put it bluntly: Emacs development is seriously suffering
    KM> from the fact that there is no global unified encoding scheme
    KM> in general use by now.  Lots of time is actually wasted in
    KM> order to customize the software to the individual encoding
    KM> schemes, as far as I understand.

So?  Maybe a lot of them prefer working on this problem to
implementing nicer frames or a Word-a-like (which I hear is
Stallman's direction of the future for Emacs).
I don't have a problem with that.

    KM> To make things even worse, a glance at the names of
    KM> implementers of free software shows that the vast majority of
    KM> them are of euro-american origin.  It is safe

Are you reading the Japanese equivalent of comp.*?  If not, I don't
think you have the right to make a call on that.

    KM> to assume that most of them don't have a background in
    KM> non-european linguistics.  Thus their work will always
    KM> implicitly be western-centered.  However, if a universal
    KM> unified character encoding scheme were in general use, those
    KM> people would use that and the problems would vanish for the
    KM> most part (specific entry methods would still be necessary,
    KM> but the rest would be the same anywhere).

That happens to be a misconception.  Look up "kinsoku" in your
Japanese-English (or Japanese-German, if my guess is right about
your native language) dictionary.  The character set solves only a
very small subset of the problems of localized, internationalized,
or multilingual programming.

    KM> It is easy to see that this would solve a LOT of

Not to say that a LOT isn't a LOT.  Just that a LOT is a small
fraction of ALL, and some of the rest is HARD.

    KM> problems.  To maintain that an attempt to create a unified
    KM> character encoding scheme like Unicode is an "imperialistic
    KM> plot" appears unfair against the implementers and
    KM> unconstructive as well to me.  It doesn't

If the implementers aren't the users, it's imperialism.

    KM> BTW, the Linux 2.0.x kernels all have Unicode built

Which means what?

    KM> in.  So, is the Linux community planning an imperialistic
    KM> plot against Asians (an interesting question as some of those
    KM> are Asians in fact....) ?????

Yup.  It's arguable.  However, systems programming is one thing; I
don't expect even to be allowed to use spaces in my file names; I
think the Japanese and Chinese and Koreans will be happy enough with
Unicode for systems programming.
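[A note for the archive reader: the "kinsoku" point is worth making concrete.  Kinsoku shori are Japanese line-breaking prohibition rules, typesetting knowledge that no character encoding carries.  A minimal sketch follows; the character classes are a tiny illustrative subset, not the full rule set specified in JIS X 4051.]

```python
# Kinsoku shori: certain characters (closing punctuation, small kana,
# the long-vowel mark) must not begin a line; opening brackets must
# not end one.  Illustrative subsets only.
KINSOKU_NOT_AT_START = set("、。，．）」』？！ー")
KINSOKU_NOT_AT_END = set("（「『")

def break_allowed(prev_ch: str, next_ch: str) -> bool:
    """True if a line may be broken between prev_ch and next_ch."""
    return (next_ch not in KINSOKU_NOT_AT_START
            and prev_ch not in KINSOKU_NOT_AT_END)

assert not break_allowed("す", "。")   # 。 may not start a line
assert not break_allowed("「", "日")   # 「 may not end a line
assert break_allowed("。", "日")       # breaking after a full stop is fine
```

Knowing every code point in a text tells a program nothing about rules like these, which is the sense in which the character set solves only a small subset of the localization problem.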
But Mule is the ONLY software in which comparative classical
Chinese/Japanese scholarship can be written properly.  A Unicode
"Mule" would defeat this.  Internally, Unicode is the way to go for
Mule and for systems programming.  The former probably won't happen,
though, because the vested interests (Handa and RMS) are against it.

---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15 Tokyo station Yaesu Chuo
ticket gate.  Or go directly to Tengu TokyoEkiMae 19:30
Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station Yaesu
Chuo ticket gate.
---------------------------------------------------------------
a word from the sponsor: TWICS - Japan's First Public-Access
Internet System
www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096
- Follow-Ups:
  - Re: tlug: Mule-begotten problems for Emacs and Gnus
    - From: Jon Babcock <jon@example.com>
- References:
  - tlug: Mule-begotten problems for Emacs and Gnus
    - From: "Stephen J. Turnbull" <turnbull@example.com>
  - Re: tlug: Mule-begotten problems for Emacs and Gnus
    - From: Karl-Max Wagner <karlmax@example.com>