Re: tlug: unicode
- To: tlug@example.com
- Subject: Re: tlug: unicode
- From: jwb@example.com (Jim Breen)
- Date: Tue, 27 May 1997 17:25:26 -0500
- In-Reply-To: "Stephen J. Turnbull" <turnbull@example.com> "Re: tlug: unicode" (May 27, 3:22pm)
- Reply-To: tlug@example.com
- Sender: owner-tlug
--------------------------------------------------------
tlug note from jwb@example.com (Jim Breen)
--------------------------------------------------------
On May 27, 3:22pm, "Stephen J. Turnbull" wrote:
} Subject: Re: tlug: unicode
>>
>> For example, suppose I'm grepping for all the
>> Japanese words in a Chinese-language nihongo textbook. Note that
>> properly done, that text book probably has separate Chinese and
>> Japanese fonts for the same character depending on which language it
>> occurs in. Given the stylistic difference mentioned above, the human
>> eye can pick these things out immediately. Given a 31-bit code space,
>> a UCS-4 grep can too.

I hope it never has to - It would be a disaster of the first order
if Chinese and Japanese ended up as distinct sets. The issue of the
appearance of different characters is one of mark-up, not of
character set. When I want something in italics, I wrap it in
{\it...}, being a LaTeX person. I don't expect a different character
set. I expect that eventually national font styles will be handled in
a similar fashion.

>> There are other ways to do this, of course. For example, you can put
>> in language tags (escape sequences). So now Unicode, for this
>> purpose, looks like ISO-2022. Excuse me for not being thrilled :-)

This is really a presentation markup. It doesn't thrill me, but I
prefer it to the alternative.

>> This kind of multilingual issue is not a huge deal for most people, of
>> course. But then go a little farther: for most people, Shift-JIS does
>> just fine. I think we've just reached Jim Breen's limit of tolerance.
>> No JIS X 0212. :-)

Wait for it! There's an extension planned for JIS X 0208 to soak up
the "spare space". I just got Masayuki Toyoshima's latest
announcement, which you can read in an earlier version on
http://www.tiu.ac.jp/JCS/

>> (According to what I
>> heard at the M17N conference, the Chinese National Standard is likely
>> to end up with 80,000 characters in it! Some of them created
>> specially for the purpose, apparently ;-)

Just like JSA did for the 1983 version of JIS X 0208 8-)}

>> Can't they spare 2 or 3 code
>> points for the Spanish?

My copy of Vol 1 of Unicode is at home, but I think they added in
the "ll", etc. BTW, the Spanish speaking world no longer collates
"ll" differently, to the eternal relief of programmers.

>> The point is that display routines have to be complicated with all
>> sorts of special processing anyway. Think about hyphenation,
>> proportional spacing, sizes, faces, colors, etc. This example is
>> trivial with proportional fonts, and assuming a monospace font, you
>> only need to use a peephole filter to catch the few characters that
>> require two glyphs for printing), anyway. _Text in RAM_ should be
>> uniform, to facilitate grepping and stream processing in general.

Not to mention problems like joining up of characters in Arabic, or
mixing L->R text with R-L. (You heard the joke about the advert. for
a business machine in a Tel Aviv newspaper which mentioned the
"ENIL NO" button.)

>> This would then have an XFontList-like user interface, so that
>> Japanese users of Unicode would use generic "Unicode -> JIS 208" and
>> "Unicode -> JIS 212" CMaps, backed up by a secondary "Unicode -> Big
>> 5" CMap, finally backed up by a default "Unicode -> Unicode" CMap,
>> corresponding to fonts like Utsukushii-JISX0208, Mama-JISX0212,
>> Mou-ii-BIG5, and KimochiWarui-UCS2. Since these tables are generic,
>> they don't need to be recreated for every new font. This provides
>> backward compatibility with old fonts and programs at the cost of
>> supplying CMaps (since it would be built into the font engine).

I think I'll give up....

Cheers

Jim

--
Jim Breen [ジム・ブリーン@モナシュ大学]
Department of Digital Systems. Monash University,
Clayton VIC 3168 Australia
(p) +61 3 9905 3298 (f) +61 3 9905 3574
j.breen@example.com [http://www.dgs.monash.edu.au/~jwb/]

-----------------------------------------------------------------
a word from the sponsor will appear below
-----------------------------------------------------------------
The TLUG mailing list is proudly sponsored by TWICS - Japan's First
Public-Access Internet System. Now offering 20,000 yen/year flat rate
Internet access with no time charges. Full line of corporate Internet
and intranet products are available. info@example.com
Tel: 03-3351-5977 Fax: 03-3353-6096
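
The font fallback chain quoted in the message above ("Unicode -> JIS
208", then "Unicode -> JIS 212", then "Unicode -> Big 5", then a
default "Unicode -> Unicode" table) could be sketched roughly as
below. This is only an illustration under assumptions, not how any
font engine discussed in the thread works: Python's built-in codecs
stand in for the CMap tables, whose format the thread does not
specify, and FALLBACK_CHAIN, covers, pick_font and font_runs are
hypothetical names. The font names are the joke names from the quoted
text.

```python
# Sketch of a per-character font fallback chain.
# Assumption: "a codec can encode the character" stands in for
# "the CMap has an entry for the character".
FALLBACK_CHAIN = [
    ("Utsukushii-JISX0208", "iso2022_jp"),   # ASCII + JIS X 0208
    ("Mama-JISX0212", "iso2022_jp_1"),       # additionally JIS X 0212
    ("Mou-ii-BIG5", "big5"),                 # Big 5
]
DEFAULT_FONT = "KimochiWarui-UCS2"           # catch-all Unicode font


def covers(codec, ch):
    """Return True if the codec can represent the character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def pick_font(ch):
    """Walk the chain in priority order; fall back to the default."""
    for font, codec in FALLBACK_CHAIN:
        if covers(codec, ch):
            return font
    return DEFAULT_FONT


def font_runs(text):
    """Split text into (font, substring) runs of consecutive characters."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


if __name__ == "__main__":
    for font, chunk in font_runs("日本語 and ㄅ"):
        print(font, repr(chunk))
```

Note that the text itself stays uniform Unicode throughout; only the
display step consults the chain, which is the separation between
character set and presentation argued for in the message.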
- Follow-Ups:
- Re: tlug: unicode
- From: "Stephen J. Turnbull" <turnbull@example.com>