Re: tlug: unicode

To: tlug@example.com
Subject: Re: tlug: unicode
From: jwb@example.com (Jim Breen)
Date: Tue, 27 May 1997 17:25:26 -0500
In-Reply-To: "Stephen J. Turnbull" <turnbull@example.com> "Re: tlug: unicode" (May 27, 3:22pm)
Reply-To: tlug@example.com
Sender: owner-tlug

--------------------------------------------------------
tlug note from jwb@example.com (Jim Breen)
--------------------------------------------------------
On May 27,  3:22pm, "Stephen J. Turnbull" wrote:
} Subject: Re: tlug: unicode
>> 
>>  For example, suppose I'm grepping for all the
>> Japanese words in a Chinese-language nihongo textbook.  Note that
>> properly done, that text book probably has separate Chinese and
>> Japanese fonts for the same character depending on which language it
>> occurs in.  Given the stylistic difference mentioned above, the human
>> eye can pick these things out immediately.  Given a 31-bit code space,
>> a UCS-4 grep can too.

I hope it never has to. It would be a disaster of the first order if
Chinese and Japanese ended up as distinct sets.

The issue of the appearance of different characters is one of mark-up, not
of character set. When I want something in italics, I wrap it in {\it...},
being a LaTeX person. I don't expect a different character set. I expect
that eventually national font styles will be handled in a similar fashion.
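
A minimal sketch of that idea in Python, assuming the language is carried
as a tag on each run of text (the tag names, the sample sentence and the
helper names are all made up for illustration). The code points are the
same in both languages; only the markup differs, and the "grep for the
Japanese words" works off the markup:

```python
from typing import List, Tuple

# (language tag, text). The tags are illustrative, not from any standard.
Run = Tuple[str, str]

# One line of a hypothetical Chinese-language Japanese textbook.
textbook_line: List[Run] = [
    ("zh", "日语的"),
    ("ja", "直接"),
    ("zh", "和汉语的"),
    ("zh", "直接"),
    ("zh", "写法相同"),
]


def grep_by_language(runs: List[Run], lang: str) -> List[str]:
    """Return the text of every run tagged with the given language."""
    return [text for tag, text in runs if tag == lang]


def plain_text(runs: List[Run]) -> str:
    """Drop the markup; what is left is one uniform stream of code points."""
    return "".join(text for _, text in runs)


# Selecting by language uses the tags, not the characters.
japanese_words = grep_by_language(textbook_line, "ja")

# With the markup stripped, an ordinary search finds both occurrences,
# because the Chinese and Japanese runs use identical code points.
occurrences = plain_text(textbook_line).count("直接")
```

A font engine could pick a national glyph style per run from the same tag,
in the way {\it...} picks italics.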

>> There are other ways to do this, of course.  For example, you can put
>> in language tags (escape sequences).  So now Unicode, for this
>> purpose, looks like ISO-2022.  Excuse me for not being thrilled :-)

This is really presentation markup. It doesn't thrill me, but I prefer
it to the alternative.

>> This kind of multilingual issue is not a huge deal for most people, of
>> course.  But then go a little farther: for most people, Shift-JIS does
>> just fine.  I think we've just reached Jim Breen's limit of tolerance.
>> No JIS X 0212.  :-)  

Wait for it! There's an extension planned for JIS X 0208 to soak up the
"spare space". I just got Masayuki Toyoshima's latest announcement; you
can read an earlier version at http://www.tiu.ac.jp/JCS/

>>  (According to what I
>> heard at the M17N conference, the Chinese National Standard is likely
>> to end up with 80,000 characters in it!  Some of them created
>> specially for the purpose, apparently ;-)

Just like JSA did for the 1983 version of JIS X 0208   8-)}

>>   Can't they spare 2 or 3 code
>> points for the Spanish?

My copy of Vol 1 of Unicode is at home, but I think they added in the
"ll", etc. BTW, the Spanish-speaking world no longer collates "ll"
differently, to the eternal relief of programmers.
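
To show what programmers were relieved of, here is a rough Python sketch
of the two sort orders. It assumes the traditional alphabet treated "ch"
and "ll" as single letters placed after "c" and "l"; accented vowels are
ignored to keep it short, and the function names are my own:

```python
import re

# Traditional Spanish alphabet, with "ch" and "ll" as letters of their own.
TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL)}

# Match the digraphs first, then fall back to single characters.
TOKEN = re.compile("ch|ll|.")


def traditional_key(word):
    """Sort key that treats "ch" and "ll" as single letters."""
    return [RANK.get(tok, len(RANK)) for tok in TOKEN.findall(word.lower())]


def modern_key(word):
    """Sort key that treats every character separately (ñ still follows n)."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


words = ["luz", "llama", "lomo", "chico", "cuna", "cama"]
old_order = sorted(words, key=traditional_key)
new_order = sorted(words, key=modern_key)
```

Under the old rules "chico" sorts after "cuna" and "llama" after "luz";
under the new ones they fall where a plain letter-by-letter comparison
puts them.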

>> The point is that display routines have to be complicated with all
>> sorts of special processing anyway.  Think about hyphenation,
>> proportional spacing, sizes, faces, colors, etc.  This example is
>> trivial with proportional fonts, and assuming a monospace font, you
>> only need to use a peephole filter to catch the few characters that
>> require two glyphs for printing), anyway.  _Text in RAM_ should be
>> uniform, to facilitate grepping and stream processing in general.

Not to mention problems like joining up of characters in Arabic, or mixing
L->R text with R->L. (You heard the joke about the advert for a business
machine in a Tel Aviv newspaper which mentioned the "ENIL NO" button.)

>> This would then have an XFontList-like user interface, so that
>> Japanese users of Unicode would use generic "Unicode -> JIS 208" and
>> "Unicode -> JIS 212" CMaps, backed up by a secondary "Unicode -> Big
>> 5" CMap, finally backed up by a default "Unicode -> Unicode" CMap,
>> corresponding to fonts like Utsukushii-JISX0208, Mama-JISX0212,
>> Mou-ii-BIG5, and KimochiWarui-UCS2.  Since these tables are generic,
>> they don't need to be recreated for every new font.  This provides
>> backward compatibility with old fonts and programs at the cost of
>> supplying CMaps (since it would be built into the font engine).

I think I'll give up....
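
For anyone who wants to follow it through, the fallback chain Stephen
describes can be sketched in Python roughly as below. The font names are
his; the table contents and glyph indexes are placeholders, not real
charset data, and pick_font is a hypothetical helper:

```python
from typing import Dict, List, Tuple

# A generic "Unicode -> charset" CMap, reduced to a toy dict that maps a
# character to a placeholder glyph index.
CMap = Dict[str, int]

# Ordered as in the description: JIS X 0208 first, then JIS X 0212,
# then Big 5. The entries are illustrative only.
FONT_CHAIN: List[Tuple[str, CMap]] = [
    ("Utsukushii-JISX0208", {"日": 1, "本": 2}),
    ("Mama-JISX0212", {"丂": 1}),
    ("Mou-ii-BIG5", {"你": 1}),
]

# Last resort: a Unicode-encoded font indexed directly by code point.
DEFAULT_FONT = "KimochiWarui-UCS2"


def pick_font(ch: str) -> Tuple[str, int]:
    """Return (font name, glyph index) from the first CMap that has ch."""
    for font, cmap in FONT_CHAIN:
        if ch in cmap:
            return font, cmap[ch]
    return DEFAULT_FONT, ord(ch)


# Each character of a mixed string is resolved independently.
choices = [pick_font(ch) for ch in "日本你한"]
```

Because the tables depend only on the charset and not on the font, they
would be built once and shared by every font with that encoding.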

Cheers

Jim

-- 
Jim Breen          [ジム・ブリーン@モナシュ大学]
Department of Digital Systems.                  Monash University, 
Clayton VIC 3168 Australia (p) +61 3 9905 3298 (f) +61 3 9905 3574  
j.breen@example.com   [http://www.dgs.monash.edu.au/~jwb/]