Re: tlug: unicode
- To: tlug@example.com
- Subject: Re: tlug: unicode
- From: jwb@example.com (Jim Breen)
- Date: Wed, 28 May 1997 09:46:51 -0500
- In-Reply-To: "Stephen J. Turnbull" <turnbull@example.com> "Re: tlug: unicode" (May 27, 5:52pm)
- Reply-To: tlug@example.com
- Sender: owner-tlug
On May 27, 5:52pm, "Stephen J. Turnbull" wrote:

>>     Jim> I hope it never has to - It would be a disaster of the first
>>
>> Too late; this is what Mule does already.  I think it's unlikely to
>> change, since it's an efficient way to handle multilingual input and
>> editing.

I regard Mule as a pre-Unicode system, and its ISO-2022-style internal
coding as an interim technique. Frankly, I don't think it will persist
if Unicode gains general acceptance.

>> But I gave a practical, if relatively contrived and
>> trivial, example of when the language tag has real semantic meaning.

True, but it is a bit of an isolated case. If I were searching a
document containing a mix of French and English, I'd have little chance
of restricting the search to one language. In a few cases I could
identify words as belonging to only one language, but where they were
common to both (perhaps faux amis such as `manifestation') I'd be
stuck.

The mixed Chinese/Japanese example is a pretty unusual one: in the
world of computerized text processing, it is probably the only case
where you find languages using essentially the same characters in
different encodings.

>> Also (despite the unification philosophy) the identical character can
>> have different meaning in the different languages.  That would mean
>> that a content-indexing program would want to carry language along
>> with characters.  You can argue that it's not important, that you can
>> handle it otherwise.  I'd like to give the programmers the flexibility
>> to implement it with wider characters in a standard way.

Well, I wouldn't. I like to separate things. It's only a short step
from there to saying that `y' should have different codings for English
and French, because it can be a consonant in one but only a vowel (of
sorts) in the other. Remember we are not doing this for the
programmers. I think we'll have to agree to disagree on that.

>> From the user perspective, what will happen, I think, is what you
>> would want: Mule will convert to Unicode before writing a file.  The
>> 4-byte representation will rarely be seen outside of RAM owned by Mule
>> and similar tools.  I just think it's good to standardize an internal
>> code for things like Mule; we have a good framework for doing it.

But without round-trip capability. Going from a Unicode text into
Mule_internal_format will be fun, unless you hack in language markers
to tell Mule to map into quasi-GB internally.

Jim

-- 
Jim Breen  [ジム・ブリーン@モナシュ大学]
Department of Digital Systems. Monash University,
Clayton VIC 3168 Australia
(p) +61 3 9905 3298  (f) +61 3 9905 3574
j.breen@example.com  [http://www.dgs.monash.edu.au/~jwb/]
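
The round-trip point in the message above can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the original post: Python's standard `gb2312` and `euc_jp` codecs stand in for the Chinese and Japanese legacy encodings, Mule's internal format is not modelled at all, and the helper names `to_unicode` and `from_unicode` and the `"zh"`/`"ja"` tags are hypothetical.

```python
# Illustrative sketch only. The standard gb2312 and euc_jp codecs stand in
# for the legacy Chinese and Japanese encodings; Mule's internal format is
# not modelled here.


def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Decode legacy bytes to Unicode; the source encoding is not recorded."""
    return raw.decode(source_encoding)


def from_unicode(text: str, language_tag: str) -> bytes:
    """Re-encode using a caller-supplied tag (hypothetical 'zh'/'ja' tags)."""
    target = {"zh": "gb2312", "ja": "euc_jp"}[language_tag]
    return text.encode(target)


han = "\u4e2d"  # U+4E2D, a character present in both GB 2312 and JIS X 0208

gb_bytes = han.encode("gb2312")
jis_bytes = han.encode("euc_jp")

# The same unified character has a different byte sequence in each encoding.
print(gb_bytes.hex(), jis_bytes.hex())

# Both decode to the identical code point, so the Unicode text alone no
# longer says which legacy encoding (or language) it came from.
from_gb = to_unicode(gb_bytes, "gb2312")
from_jis = to_unicode(jis_bytes, "euc_jp")
print(from_gb == from_jis)

# Recovering the original bytes needs an out-of-band language marker.
print(from_unicode(from_gb, "zh") == gb_bytes)
print(from_unicode(from_jis, "ja") == jis_bytes)
```

In this sketch the tag passed to `from_unicode` plays the role of the "language markers" mentioned in the message: without it, nothing in the decoded string indicates which legacy encoding to map back into.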