Re: Japanese input (was RE: tlug: Japanese)

To: tlug@example.com
Subject: Re: Japanese input (was RE: tlug: Japanese)
From: "Stephen J. Turnbull" <turnbull@example.com>
Date: Wed, 10 Jun 1998 13:09:02 +0900 (JST)
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <XFMail.980609182835.asbel@example.com>
References: <13693.15797.695722.166749@example.com> <XFMail.980609182835.asbel@example.com>
Reply-To: tlug@example.com
Sender: owner-tlug@example.com

>>>>> "Matt" == Matthew J Francis <asbel@example.com> writes:

    Matt> On 09-Jun-98 Stephen J. Turnbull wrote:

    Matt> [UCS-4]

    >> You're missing my point then.  Those wide open spaces mean it
    >> needs to be infinitely flexible.  (That's an exaggeration, of
    >> course.)
    >> 
    >> But for the near future that I see we will need to handle the
    >> entire Babel of character sets, including some that don't exist
    >> yet.  More flexibility....

    Matt> Hmm, have you looked at Gaspar's Yudit code? I don't see

No.  One of these days....

    Matt> that what you're talking about is anything more than it can
    Matt> do already; to add support for a new codepage, you merely
    Matt> tell it how to map the new characters to a font, and (if you
    Matt> wish to) how to convert input to those characters. All
    Matt> external configuration.

Brother, you are in for some unpleasant surprises :-)  Check out
locale(5), o-negai-shimasu.  No silver bullet here.

    Gaspar> o Languages to be added dynamically to the input
    Gaspar> conversion server.

    >> Why?  Why not run multiple backends, as today where many
    >> systems run Wnn and Canna concurrently?  Or are you talking
    >> about the input manager (like kinput2)?

    Matt> Running Wnn *and* Canna sounds unnecessarily memory-hungry,
    Matt> and symptomatic of design nastiness somewhere. Would it not

Nope.  Symptomatic of the fundamental fact that "tastes differ."  In
any case, that was an example to demonstrate feasibility.  I think it
would be insane to try to overload a Japanese server with algorithms
for Devanagari or Arabic.  Multiple servers.

    Matt> be better to have one server with enough flexibility therein
    Matt> to support both methods, and if really necessary, both
    Matt> protocols?

Such a server would be just as big as the two put together.
Memory-hungry?  For dictionaries, yes.  But the dictionaries are
incompatible, so they can't be combined.  No silver bullet here.

    >> Go ahead.  If it's good, I'll jump in.  But I know better than
    >> to try to design one from scratch, myself.  I can serve much
    >> better elsewhere.  There are plenty of attempts out there to
    >> try to improve on.

    Matt> Hmm. Let's step back a moment; the words "design one from
    Matt> scratch" are not ones I care to use often. Failure to re-use
    Matt> code where appropriate is a (virtual) sin, and I'm not

As Gaspar pointed out, most of the code sucks EGG.  Don't reuse the
code (most of it).

Please do reuse protocols.

    Matt> denying that there is a lot of relevant code out there. We
    Matt> seem to be talking about slightly different things, so I
    Matt> will try to set out a little more clearly what I (think I'm)
    Matt> getting at:

    Matt>  - Many new programs are continually being developed,
    Matt> especially these days for projects like GNOME and KDE.

    Matt>  - Most of these programs do not support MBCS, and
    Matt> international input and display properly.

    Matt>  - It could be made easy, or at least much easier, for them

I dispute this.  But see the disclaimer below.

    Matt> to do so.  Exactly how is open to debate, but is also crying
    Matt> out for standardisation of some sort.

POSIX locales + XIM are a start.  I have read the XIM documents, but
do not claim to understand them.  I have been unable to locate
documents on POSIX locales.  I have read the ISO-2022 standard (not
entirely relevant here), the Unicode v2.0 book, and the ISO-10646
book.  I just got the X/Open Guide to transition to UCS.

There are standards aplenty.  I'm the only one talking about them,
except when people denigrate Unicode as "Microsoft-biased," thus
implicitly discounting them.  Doesn't that give you pause?

Notwithstanding, I am not an expert; I don't know crap about this.
Many people in this discussion admit to having read none of the
relevant standards (I don't know where you stand).  Some people are
ready to jump in and start coding when they don't know the difference
between wide characters and multibyte characters, or the difference
between LANG, LC_ALL, and LC_CTYPE.
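
For anyone unsure of those two distinctions, here is a minimal sketch
in Python.  It is not taken from any of the standards; the function
name effective_locale and the "C" fallback are my own assumptions for
illustration.  It assumes the usual POSIX precedence, where a
non-empty LC_ALL overrides the category variable, which in turn
overrides LANG.  Python's str and bytes stand in for a wide-character
string and a multibyte string respectively.

```python
def effective_locale(category, env):
    """Return the locale name that applies to `category` (e.g. "LC_CTYPE").

    Assumed POSIX precedence: LC_ALL, then the category variable, then LANG.
    Unset or empty variables are skipped.  Falling back to "C" is an
    assumption of this sketch, not something the original message states.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


# Wide vs. multibyte: one element per character vs. a variable number of
# bytes per character, depending on the encoding.
wide = "日本語"                     # three characters, like a wchar_t string
multibyte = wide.encode("euc_jp")   # the same text as EUC-JP bytes

n_chars = len(wide)        # counts characters
n_bytes = len(multibyte)   # counts bytes; each of these kanji takes two

print(n_chars, n_bytes)
print(effective_locale("LC_CTYPE", {"LANG": "ja_JP.eucJP", "LC_ALL": "C"}))
```

The point of the second half is that counting or indexing bytes in a
multibyte string is not the same as counting or indexing characters,
which is exactly where code written without the distinction breaks.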

    Matt>  - Therefore, even if the current libraries and protocols do
    Matt> do everything necessary, there's something missing
    Matt> somewhere, even if only glue and public awareness.

    Matt> What to do about it?

    Matt>  - Working/improving the necessary support into the widget
    Matt> library text-entry and text-display level would help
    Matt> enormously. There's no reason this couldn't be done while
    Matt> retaining support for current input methods.

Uh-huh.  I'll believe it when I see it; there's nobody here offering
support for such a heavy effort.  Programmers will work on what's fun.

    Matt>  - Making things Unicode internally would additionally allow
    Matt> reliable multiple-language-in-one-place support.

No.  ISO-2022 does as much as Unicode can, just not very efficiently.
Neither does very much.  No silver bullet here.

    Matt>  - Make sure the fundamental services are provided
    Matt> separately, and other things will also have one way to do

People will disagree about the granularity (cf. the above discussion of
Wnn and Canna as separate conversion engines).  Fine granularity, more
protocols.  Coarse granularity, less flexibility.

    Matt> every globalised thing they need to. If current libraries,
    Matt> backends, protocols and so forth really are up to the job
    Matt> already, then we just use them. If there's a genuine need
    Matt> for improvement, we can adapt and alter as necessary.

    Matt>  - Abstracting Unicode text support would, as Gaspar said,
    Matt> be a good place to start.

Yes.

    Matt>  - Then some widgets, adding other support as and if
    Matt> necessary. If the GTK/GNOME/KDE/etc. people find them worthy
    Matt> of integrating, they can.

No.  Some is not enough.  Partial support will _not_ take over the
world.  Viz. the Japanese on Netscape on Linux madness.  w3.el under
Mule has full multilingual support, input and output.  Why doesn't
anybody use it?  Because it doesn't support tables, plugins and the
like well, if at all.

Specializing to the input issue, people will use a complete widget
set; using an add-on widget set where some capabilities are supported,
and others not, will drive both developers and users away.  No silver
bullet here.

    Matt> If you think that's mad, I'll gladly suffer the label. When
    Matt> I can send mail and write text in any program of my choice,
    Matt> entering text in Japanese and English, quoting someone else
    Matt> in Greek, and with menus and messages in Tengwar, without
    Matt> jumping through hoops, I will rest happy...

May we both live so long!

--------------------------------------------------------------
Next TLUG Meeting: 13 June Sat, Tokyo Station Yaesu gate 12:30
Featuring Stone and Turnbull on .rpm and .deb packages
Next Nomikai: 17 July, 19:30 Tengu TokyoEkiMae 03-3275-3691
After June 13, the next meeting is 8 August at Tokyo Station
--------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp
