Mailing List Archive
Re: [tlug] Re: Piping stderr?
- Date: Thu, 27 Jun 2002 17:50:13 +0900
- From: Jiro SEKIBA <jir@example.com>
- Subject: Re: [tlug] Re: Piping stderr?
- References: <3D109EC0.4070703@example.com><87n0trz38q.fsf@example.com><s3t3cvj6oad.fsf@example.com><87r8j1w8ol.fsf@example.com><877kktb463.wl@example.com><87u1nxuguj.fsf@example.com><87n0tlmn5r.wl@example.com><87bsa1nsvr.fsf@example.com><87vg895g7k.wl@example.com><871yaxnjei.fsf@example.com><87adpkj70k.wl@example.com><87u1nrao03.fsf@example.com><87znxjhlw4.wl@example.com><87ptyf91b3.fsf@example.com><87wusnhfg8.wl@example.com><876607wlpj.fsf@example.com><87eleuz973.wl@example.com><87d6ues4tn.fsf@example.com>
- User-agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya) FLIM/1.14.3(Unebigoryōmae) APEL/10.3 Emacs/20.7(i386-debian-linux-gnu) MULE/4.1 (AOI)
At 26 Jun 2002 19:38:12 +0900,
Stephen J. Turnbull <stephen@example.com> wrote:

>> UTF-8 supports lots of characters used worldwide, but it is not
>> perfect at all.
>
> Be concrete. I don't know of any major missing character sets or
> characters (that aren't scheduled or proposed for addition).
> Admittedly there are political problems (such as the influential
> Nikkei minorities in Canada, Mexico, and Finland whose national
> character sets look remarkably like IBM kanji; and the "Ukrainian
> problem" where the Russians on the USSR standards committee didn't see
> fit to submit Cyrillic characters only used in Ukrainian).

I'm not saying that it is missing major character sets or
characters; I'm saying it's not perfect, which you agreed with.

> In any case, either way effort has to be made to support those
> characters internally. Why not devote that effort to getting them
> into Unicode, then subclassing Unicode to handle any special
> properties they have?

Why do we have to devote that effort to getting them into Unicode?
Unicode is not THE codeset, but ONE OF the codesets. It just happens
to have lots of characters, that's all.

>> Less burden I think.
>
> For the programmer, when it works. Consoles, shells, and scripts
> should not depend on such complexity, because when (_not if_) it
> breaks, it can take the whole system down.

Which complexity? About scripts, I agree: since a script has its own
environment, it could be a good idea for it to be free from the
system locale. I don't understand what you mean by 'when', but it
will just automatically fall back to the 'C' locale and continue
working. If it can't fall back to "C", the C library is broken, which
means the whole system is already down ;-).

> Also, in case you haven't noticed, the Internet and information
> systems generally have become a decidedly more hostile environment.
> Did you know that UTF-8 was respecified in Unicode 3.1 _for security
> reasons_?
> How does CSI I18N handle the security issues involved in
> delegating text handling to user-provided routines, etc? My bet is
> "not at all".

??? What are you talking about regarding the security of Unicode 3.1??
Did you mean this?

  SECURITY
       The Unicode and UCS standards require that producers of
       UTF-8 shall use the shortest form possible, e.g., producing
       a two-byte sequence with first byte 0xc0 is non-conforming.
       Unicode 3.1 has added the requirement that conforming
       programs must not accept non-shortest forms in their input.
       This is for security reasons: if user input is checked for
       possible security violations, a program might check only
       for the ASCII version of "/../" or ";" or NUL and overlook
       that there are many non-ASCII ways to represent these
       things in a non-shortest UTF-8 encoding.

If so, this IS an issue of programs that hard-code UTF-8. If you
have ten such programs, you have to fix each one. With a CSI design,
on the other hand, you just fix the library; the programs don't need
to be modified at all.
#Even if this is not what you meant, it shows what is bad about
#programs that hard-code UTF-8.

If that is not what you meant, please give me a pointer ;-).

>> But filter is not always perfect. SJIS can't round trip
>> UTF-8 (e.g 0x5C) as you know. It's like, you get home and
>> take the shoes off, later you try to get out with the same
>> shoes, but left shoe is stolen ;-).
>
> Since when? Since Unicode includes all characters in JIS, that means
> Shift JIS can't round trip JIS, either. Wouldn't surprise me, but as
> far as I know that's not true. You just have to use the right mapping.

Ah, the SJIS handled by glibc can round trip through UCS-4, sorry.
#In other words, glibc only handles that range.
##This is the Windows case, but there it is ;-)
##http://support.microsoft.com/default.aspx?scid=%2Fisapi%2Fgomscom%2Easp%3Ftarget%3D%2Fjapan%2Fsupport%2Fkb%2Farticles%2Fjp170%2F5%2F59%2Easp&LN=JA
###BTW I do not much care about this Win issue ;-p, it's just an example.
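To make the 0x5C point concrete, here is a minimal sketch of how a round trip can break when the two directions use different mapping tables. It does not consult glibc or any real converter: the two one-entry tables and the helper names (`decode_byte`, `encode_char`, `round_trips`) are made up for illustration. The only facts relied on are that JIS X 0201 assigns byte 0x5C to the yen sign (U+00A5), while Microsoft's CP932 mapping treats 0x5C as a backslash (U+005C).

```python
# Illustrative one-entry tables for byte 0x5C (assumption: everything
# else in the 7-bit range is plain ASCII in both mappings).
JISX0201_DECODE = {0x5C: "\u00a5"}  # 0x5C -> YEN SIGN
CP932_DECODE = {0x5C: "\u005c"}     # 0x5C -> REVERSE SOLIDUS (backslash)


def decode_byte(byte, table):
    """Decode one byte: use the table's override, else treat it as ASCII."""
    return table.get(byte, chr(byte))


def encode_char(char, table):
    """Encode one character by inverting the table; None if unmappable."""
    inverse = {v: k for k, v in table.items()}
    if char in inverse:
        return inverse[char]
    code = ord(char)
    # ASCII passes through unless this table has reassigned that byte.
    if code < 0x80 and code not in table:
        return code
    return None


def round_trips(byte, decode_table, encode_table):
    """True if byte -> character -> byte returns the original byte."""
    char = decode_byte(byte, decode_table)
    return encode_char(char, encode_table) == byte


if __name__ == "__main__":
    print(round_trips(0x5C, CP932_DECODE, CP932_DECODE))     # same table
    print(round_trips(0x5C, JISX0201_DECODE, CP932_DECODE))  # mixed tables
```

With the same table in both directions the byte survives; with mixed tables it does not, which is one reading of "you just have to use the right mapping".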
>> And more, in the future it is very possible that there will be a
>> codeset which can't map into UTF-8.
>
> Mojikyo? That's not a character set, that's a glyph set. Not to
> mention that it's nonstandard and nastily proprietary (the UTF-2000
> people were forced to remove mojikyo support from their version of
> XEmacs). And there is plenty of room for a thousand Mojikyos in
> UCS-4. It won't be Unicode-conformant, but upward compatible.

No, I'm just talking about the possibility.

> Other than that, there are no efforts I know of. Again, be concrete.

I'm NOT talking about a problem of NOW. Who knows that it will never
happen? So it's better to strip encoding-dependent code from programs
than to hard-code it. Programs designed for CSI I18N support the UTF-8
codeset, so you can use them as UTF-8 programs if you want. And it's
easier to use an API than to interpret UTF-8 and keep a Unicode
character property database inside the program.

What seems to be the problem?

--
Jiro SEKIBA | Web tools & AP Linux Competency Center, YSL, IBM Japan
            | email: jir@example.com, jir@example.com
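As an illustration of the non-shortest-form issue quoted in the message above, here is a small sketch that contrasts a hand-rolled two-byte decoder (the "hard-coded UTF-8" case) with delegating to a library decoder. The function names are hypothetical, and Python's built-in UTF-8 codec stands in for "the library"; it is not the CSI API being discussed.

```python
def naive_decode_2byte(data):
    """Hand-rolled 2-byte UTF-8 decoder with no shortest-form check."""
    lead, trail = data
    return chr(((lead & 0x1F) << 6) | (trail & 0x3F))


def library_decode(data):
    """Delegate to the runtime's UTF-8 decoder; None on invalid input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Non-shortest (overlong) two-byte encoding of "/" (U+002F).
OVERLONG_SLASH = b"\xc0\xaf"

if __name__ == "__main__":
    # The naive decoder turns the overlong bytes into "/", so a check
    # that only looked for the ASCII byte 0x2F would have missed it.
    print(repr(naive_decode_2byte(OVERLONG_SLASH)))
    # The library decoder refuses the sequence outright.
    print(library_decode(OVERLONG_SLASH))
```

Every program that carries its own copy of something like `naive_decode_2byte` needs its own fix, whereas callers of `library_decode` pick up the stricter behaviour once the shared decoder is corrected.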