Mailing List Archive
Re: [tlug] Re: Piping stderr?
- Date: 27 Jun 2002 19:09:21 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Re: Piping stderr?
- References: <3D109EC0.4070703@example.com><87n0trz38q.fsf@example.com><s3t3cvj6oad.fsf@example.com><87r8j1w8ol.fsf@example.com><877kktb463.wl@example.com><87u1nxuguj.fsf@example.com><87n0tlmn5r.wl@example.com><87bsa1nsvr.fsf@example.com><87vg895g7k.wl@example.com><871yaxnjei.fsf@example.com><87adpkj70k.wl@example.com><87u1nrao03.fsf@example.com><87znxjhlw4.wl@example.com><87ptyf91b3.fsf@example.com><87wusnhfg8.wl@example.com><876607wlpj.fsf@example.com><87eleuz973.wl@example.com><87d6ues4tn.fsf@example.com><87n0thf6m2.wl@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Informed Management (RC2))
>>>>> "Jiro" == Jiro SEKIBA <jir@example.com> writes:

    Jiro> Unicode is not THE codeset, but ONE OF CODESETS.

I think you have been listening to Ohta-san's propaganda for too long.

Unicode doesn't have "lots" of characters.  CNS 11643 has "lots" of
characters, more than Unicode 2.1 (and probably more than 3.2, but I
haven't checked).

Unicode _is_ THE Universal Character Set (UCS).  Plus a whole bunch of
essential algorithms for handling text.  And if you really really have
to have some character (or character set) that Unicode doesn't provide,
there are a hundred thousand private space code points reserved for
_you personally_.  What's the problem?

    Jiro> I don't understand what you mean 'when', but it will just
    Jiro> automatically fallback into 'C' locale, and continue
    Jiro> working.  If it can't fallback into "C", C library is broken,
    Jiro> it means whole system already downed ;-).

Exactly.  But one of the reasons it doesn't fall back into C may very
well be that the I18N library _thinks_ it's OK, but it's broken.  This
is not sufficient reason for my system to crash; dunno how you feel
about that....

    Jiro> If so, this IS the UTF-8 hard coded programs issue.

Who said "hard code" UTF-8?  In fact, I don't need the programs I'm
talking about to _ever_ interpret UTF-8.  They interpret ASCII;
anything containing non-ASCII is part of a string or a comment, and
will be passed on verbatim or ignored.  Validation, if necessary,
should be done by other programs or the library functions called.
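As a rough illustration of "interpret ASCII, pass everything else on
verbatim", here is a minimal sketch of a scanner that finds the end of a
double-quoted string.  The function name find_closing_quote and the
backslash-escape convention are assumptions made for this example, not
anything taken from the programs under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Given s[0] == '"', return the index of the matching closing quote,
   or -1 if the string is unterminated.  Only the ASCII bytes '"' and
   '\\' are interpreted; every other byte, including any byte >= 0x80,
   is stepped over without being decoded. */
static long find_closing_quote(const unsigned char *s, size_t len)
{
    size_t i;

    assert(s != NULL);
    if (len == 0 || s[0] != '"')
        return -1;
    for (i = 1; i < len; i++) {
        if (s[i] == '\\')
            i++;                /* skip the escaped byte */
        else if (s[i] == '"')
            return (long)i;
    }
    return -1;
}
```

This is safe on UTF-8 input because every byte of a multibyte sequence
has the high bit set, so it can never be mistaken for '"' or '\'.  A
Shift JIS trail byte can be 0x5C, so the same scanner would misread such
text unless it first learned to decode Shift JIS.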
All that needs to be hard-coded is recognition of a character:

    /* yes, I know there are much faster table-driven ways to do this */
    /* the masks are parenthesized because == binds tighter than & in C */
    if ((*p & 0x80) == 0x00)        /* ASCII */
      length = 1;
    else if ((*p & 0xE0) == 0xC0)   /* multibyte */
      length = 2;
    else if ((*p & 0xF0) == 0xE0)
      length = 3;
    else if ((*p & 0xF8) == 0xF0)
      length = 4;
    else if ((*p & 0xFC) == 0xF8)
      length = 5;
    else if ((*p & 0xFE) == 0xFC)
      length = 6;
    else                            /* illegal first byte, including 10xxxxxx */
      abort();

This is not rocket science, and it is not going to change, ever.

    Jiro> If you have ten UTF-8 hard coded programs, you have to fix
    Jiro> each programs.  On the other hand, on CSI design just fix
    Jiro> library.  Programs don't need to be modified anything.

Wrong.  Dangerous, ugly stuff like Shift JIS will be wandering around
_inside_ my program.  To handle it correctly, I will need extra code.
In _all_ my CSI programs.

    Jiro> Even if this is not what you mentioned, it shows the bad
    Jiro> thing of UTF-8 hard code programs.

I'm not advocating doing _anything_ by hard-coding in each program.
I'm advocating that simple applications that need to be robust should
restrict themselves to a single small library intended to do just one
well-defined thing well: process Unicode character streams, character
by character.  No bidi, no composed characters, no interpretation of
surrogates (illegal in UTF-8 but I don't need to care).  And no
steenkin' Shift JIS, Big Five, or NEC kanji.

    Jiro> ##http://support.microsoft.com/default.aspx?scid=%2Fisapi%2Fgomscom%2Easp%3Ftarget%3D%2Fjapan%2Fsupport%2Fkb%2Farticles%2Fjp170%2F5%2F59%2Easp&LN=JA
    Jiro> ###BTW I do not much care about this Win issue ;-p, it's
    Jiro> just a example.

Oh, _that_.  Of course you can't round trip when the coded character
set _intentionally_ provides multiple code points for the same
character.  Unless you go out of your way to cater to the brain-damage
(cf. the full/half-width compatibility characters in the FF row of the
BMP).
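To make the round-trip failure concrete, here is a small sketch of a
many-to-one conversion table.  The four entries are assumed to match the
Microsoft code page 932 mapping and are for illustration only; check
them against the published table before relying on them.  The names
to_ucs, from_ucs and round_trips are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative many-to-one entries (legacy code, Unicode code point).
   ASSUMPTION: the values follow the CP932 table; verify before use. */
struct mapping {
    unsigned legacy;
    unsigned ucs;
};

static const struct mapping table[] = {
    { 0x81E0, 0x2252 },   /* standard position, listed first */
    { 0x8790, 0x2252 },   /* vendor duplicate of the same character */
    { 0x81CA, 0xFFE2 },   /* standard position, listed first */
    { 0xFA54, 0xFFE2 },   /* vendor duplicate of the same character */
};

#define TABLE_LEN (sizeof table / sizeof table[0])

/* Legacy code -> Unicode code point, or 0 if the code is not listed. */
static unsigned to_ucs(unsigned legacy)
{
    size_t i;

    for (i = 0; i < TABLE_LEN; i++)
        if (table[i].legacy == legacy)
            return table[i].ucs;
    return 0;
}

/* Unicode code point -> legacy code; the first listed entry wins.
   Returns 0 if the code point is not listed. */
static unsigned from_ucs(unsigned ucs)
{
    size_t i;

    for (i = 0; i < TABLE_LEN; i++)
        if (table[i].ucs == ucs)
            return table[i].legacy;
    return 0;
}

/* True if legacy -> Unicode -> legacy gives back the starting code. */
static int round_trips(unsigned legacy)
{
    unsigned ucs = to_ucs(legacy);

    assert(ucs != 0);     /* caller must pass a code that is listed */
    return from_ucs(ucs) == legacy;
}
```

With this table, round_trips(0x81E0) is true and round_trips(0x8790) is
false: both codes reach the same Unicode code point, and only one of
them can be chosen on the way back.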
This is _exactly_ the kind of junk you don't have to worry about if
you restrict internal text to Unicode.

    >> Other than that, there are no efforts I know of.  Again, be
    >> concrete.

    Jiro> I'm NOT talking about the problem of NOW.  Who knows it's
    Jiro> never happen?

Are you talking about SETI?[1]  No sane earthling will design a
character set to be incompatible with Unicode ever again.

    Jiro> What seems to be the problem?

CSI means that arbitrarily stupid character encodings (Shift JIS is a
leading example) can get inside my program.  This means that _my_
program needs to deal with _their_ brain damage.

I don't want my program to ever deal with Shift JIS.  If my users want
to see Shift JIS, I'll translate at the program boundary.  I don't
have a problem with that.

Fewer lines of code, fewer libraries, means fewer things to go wrong.

Footnotes:
[1]  Search for Extra-Terrestrial Intelligence.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 My nostalgia for Icon makes me forget about any of the bad things.  I don't
have much nostalgia for Perl, so its faults I remember.  Scott Gilbert c.l.py
- Follow-Ups:
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- References:
- [tlug] Piping stderr?
- From: Josh Glover
- Re: [tlug] Piping stderr?
- From: Stephen J. Turnbull
- [tlug] Re: Piping stderr?
- From: Mike Fabian
- [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA
- Re: [tlug] Re: Piping stderr?
- From: Stephen J. Turnbull
- Re: [tlug] Re: Piping stderr?
- From: Jiro SEKIBA