Re: tlug: Mule-begotten problems for Emacs and Gnus
- To: tlug@example.com
- Subject: Re: tlug: Mule-begotten problems for Emacs and Gnus
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Fri, 9 Jan 1998 17:31:35 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <199801090655.GAA00847@example.com>
- References: <m0xpm7D-00012bC@example.com><199801090655.GAA00847@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "KM" == Karl-Max Wagner <karlmax@example.com> writes:

    KM> Stephen Turnbull writes:

    >> The Unicode issue, as such, is a red herring.  Unicode is a
    >> Western imperialist plot in the minds of many Orientals.
    >> Europeans are not

    KM> Imperialist plot?  Hmmm.  Imperialist plot appears a bit
    KM> farfetched to me.

Me too.  So what?  It's not our language.  Looks to me from your
return address like you would be mighty annoyed if there were no
multibyte languages and the Russkis used their position on the UN
Security Council to demand that the top half of the 8-bit code space
be reserved for Cyrillic....  I'd not hesitate to call that
imperialism.  It's a stretch to get from Han Unification to "all
umlauts to /dev/null," but at least it's the same dimension.

    KM> If it is inadequate it should be pointed out where, and
    KM> proposals for how to fix it should be made.

Been done.  The proposal is a standard; it's called UCS-4 and is
written out in ISO 10646.

    KM> Well, obviously.  It IS a technical advantage to have a
    KM> character set with only very few characters.

_Western_ Europe doesn't have one.  ISO-8859-1 leaves out the single
densest Internet domain in Europe: Iceland.  Where's the advantage?
And how 'bout them Russkis and Israelis (not quite Europe, but
close)?

    KM> I have to admit that I don't know that problem, but 16 bits
    KM> yield a character space of 65536 characters.  The Chinese use
    KM> about 10000 Kanji or so, the Japanese 4000 or

Everyday use, you're right.  Classical scholars and corporations use
quite a few more.  In fact, the Japanese national standard character
set (you have to include JIS X 0212) alone is more like
16000---and that doesn't include all the corporate character sets,
AFAIK.

    KM> So where is the problem?

Not my job to count the ways.  However, the Chinese National Standard
(Taiwan) includes something on the order of 75,000 code points.  And
the Koreans use both Chinese characters _and_ their own set of 6500
"composed Hangul"; look at the Unicode Standard.
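[A note for the archive reader: the code-space arithmetic in this exchange can be checked today with a short Python sketch.  The specific ideograph below, U+20000 from CJK Unified Ideographs Extension B, was assigned in a later Unicode version than the one under discussion and is used purely to illustrate a character that does not fit in 16 bits.]

```python
# A character outside the 16-bit code space, and what each encoding
# form does with it.  U+20000 is illustrative; any Extension B
# ideograph behaves the same way.
han = chr(0x20000)

assert ord(han) > 0xFFFF                   # does not fit in 16 bits
assert len(han.encode("utf-32-be")) == 4   # UCS-4/UTF-32: one fixed 32-bit unit
assert len(han.encode("utf-16-be")) == 4   # 16-bit form needs a surrogate pair
assert len(han.encode("utf-8")) == 4       # multibyte form: four variable-width bytes
```

The same sketch previews the multibyte-versus-wide-character distinction raised later in the message: UTF-8 lengths vary per character, while UCS-4 spends a fixed four bytes on everything.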
On top of that, there's a lot of private and reserved space, and
suddenly 65535 characters (0xFFFF is illegal) aren't so many.

    KM> It may well be that the character numbering in such a case
    KM> lacks a bit of systematics, but this shouldn't be much of a
    KM> problem: table-driven libraries etc. could be made available
    KM> to hide that fact from the programmer (actually, it's only a
    KM> problem with sorts.  In Japanese, due to ON/KUN reading, a
    KM> dictionary-driven sort is required anyway)

Mostly not used, though.  Depends on whether you need dictionary
order or efficient searches.

    >> The real issue is not Unicode qua Unicode; while the Europeans
    >> in general would love to use Unicode to handle Oriental
    >> character sets, many Orientals are adamantly opposed.  Books
    >> have been written about

    KM> But what do they propose instead?

The Tower of Babel.  They prefer the current state to one in which
their national languages are subordinated to the needs of machines.
Do you really want to give up your hard S?  Seems to me "ss"
functions just as well.

    KM> This sounds VERY MUCH like a Tower-of-Babel story to me.  It
    KM> is obvious that this is SERIOUSLY hampering global networking
    KM> and global software development.

Not really.  This stuff is actually pretty trivial; it was
complicated by Stallman's decision to go with a multibyte rather
than a wide-character representation.

    >> how Unicode will be the demise of the Japanese language, for
    >> example.

....(Very interesting description of Emacs internals)

    KM> To put it bluntly: Emacs development is seriously suffering
    KM> from the fact that there is no global unified encoding scheme
    KM> in general use by now.  Lots of time is actually wasted in
    KM> order to customize the software to the individual encoding
    KM> schemes, as far as I understand.

So?  Maybe a lot of them prefer working on this problem to
implementing nicer frames or a Word-a-like (which I hear is
Stallman's direction of the future for Emacs).
I don't have a problem with that.

    KM> To make things even worse, a glance at the names of
    KM> implementers of free software shows that the vast majority of
    KM> them are of euro-american origin.  It is safe

Are you reading the Japanese equivalent of comp.*?  If not, I don't
think you have the right to make a call on that.

    KM> to assume that most of them don't have a background in
    KM> non-european linguistics.  Thus their work will always
    KM> implicitly be western-centered.  However, if a universal
    KM> unified character encoding scheme were in general use, those
    KM> people would use that and the problems would vanish for the
    KM> most part (specific entry methods would still be necessary,
    KM> but the rest would be the same anywhere).

That happens to be a misconception.  Look up "kinsoku" in your
Japanese-English (or Japanese-German, if my guess is right about
your native language) dictionary.  The character set solves only a
very small subset of the problems of localized, internationalized,
or multilingual programming.

    KM> It is easy to see that this would solve a LOT of

Not to say that a LOT isn't a LOT.  Just that a LOT is a small
fraction of ALL, and some of the rest is HARD.

    KM> problems.  To maintain that an attempt to create a unified
    KM> character encoding scheme like Unicode is an "imperialistic
    KM> plot" appears unfair against the implementers and
    KM> unconstructive as well to me.  It doesn't

If the implementers aren't the users, it's imperialism.

    KM> BTW, the Linux 2.0.x kernels all have Unicode built

Which means what?

    KM> in.  So, is the Linux community planning an imperialistic
    KM> plot against Asians (an interesting question as some of those
    KM> are Asians in fact....) ?????

Yup.  It's arguable.  However, systems programming is one thing; I
don't expect even to be allowed to use spaces in my file names; I
think the Japanese and Chinese and Koreans will be happy enough with
Unicode for systems programming.
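[A note for the archive reader: the "kinsoku" point is worth making concrete.  Kinsoku shori are Japanese line-breaking prohibition rules, typesetting knowledge that no character encoding carries.  A minimal sketch follows; the character classes are a tiny illustrative subset, not the full rule set specified in JIS X 4051.]

```python
# Kinsoku shori: certain characters (closing punctuation, small kana,
# the long-vowel mark) must not begin a line; opening brackets must
# not end one.  Illustrative subsets only.
KINSOKU_NOT_AT_START = set("、。，．）」』？！ー")
KINSOKU_NOT_AT_END = set("（「『")

def break_allowed(prev_ch: str, next_ch: str) -> bool:
    """True if a line may be broken between prev_ch and next_ch."""
    return (next_ch not in KINSOKU_NOT_AT_START
            and prev_ch not in KINSOKU_NOT_AT_END)

assert not break_allowed("す", "。")   # 。 may not start a line
assert not break_allowed("「", "日")   # 「 may not end a line
assert break_allowed("。", "日")       # breaking after a full stop is fine
```

Knowing every code point in a text tells a program nothing about rules like these, which is the sense in which the character set solves only a small subset of the localization problem.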
But Mule is the ONLY software in which comparative classical
Chinese/Japanese scholarship can be written properly.  A Unicode
"Mule" would defeat this.  Internally, Unicode is the way to go for
Mule and for systems programming.  The former probably won't happen,
though, because the vested interests (Handa and RMS) are against it.

---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15 Tokyo station Yaesu Chuo
ticket gate.  Or go directly to Tengu TokyoEkiMae 19:30
Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station Yaesu
Chuo ticket gate.
---------------------------------------------------------------
a word from the sponsor: TWICS - Japan's First Public-Access
Internet System
www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096
- Follow-Ups:
  - Re: tlug: Mule-begotten problems for Emacs and Gnus
    - From: Jon Babcock <jon@example.com>
- References:
  - tlug: Mule-begotten problems for Emacs and Gnus
    - From: "Stephen J. Turnbull" <turnbull@example.com>
  - Re: tlug: Mule-begotten problems for Emacs and Gnus
    - From: Karl-Max Wagner <karlmax@example.com>