tlug.jp Mailing List Archive (tlug)
Re: [tlug] Japanese regex question
- Date: Mon, 29 Aug 2005 16:41:07 +0900
- From: "Ben K. Bullock" <benkasminbullock@example.com>
- Subject: Re: [tlug] Japanese regex question
- References: <200508241701.55144.jq@example.com><20050825183913.O88704@example.com><200508251253.47083.jq@example.com><20050826113217.J88704@example.com><87zmr2me23.fsf@example.com><30ce843605082808003eac8faa@example.com><87y86mkrrg.fsf@example.com><20050828173528.796c3073@example.com> <87u0h9l14p.fsf@example.com>
----- Original Message -----
From: "Stephen J. Turnbull" <stephen@example.com>
To: <tlug@example.com>
Sent: Monday, August 29, 2005 3:15 PM
Subject: Re: [tlug] Japanese regex question

>>>>>> "Botond" == Botond Botyanszki <tlug@example.com> writes:
>
> Botond> I had the impression while coding in perl that it was
> Botond> handling text in unicode. And it seems to be the case
> Botond> according to the FAQ at
> Botond> http://rf.net/~james/perli18n.html#Q4
>
> Could very well be. I haven't done anything in Perl since
> Hanshin-Awajishima Daishinsai (ie, about Feb 1 1995), so I don't
> know. However, perusing that FAQ suggests to me that the default is
> unspecified unibyte ASCII superset, not UTF-8. If you want to treat
> the strings as Unicode you need to use special functions. It looked
> like you need to enable locale support rather than having it done
> automatically. Etc, etc.

I've been reading this discussion and wondering whether to reply. Yesterday I wrote a reply to another message of yours but decided not to send it; now I've changed my mind, and I'll send another response as well.

Since this might just be useful for someone, let's point out how "hard" it is to use UTF-8 in Perl. To get Perl to use UTF-8, try

    use utf8;

Then each Unicode character in the program text is treated as a single character, just as an ASCII character is. That's all you need to make, for example, "." in a regular expression match any single Unicode character, to use UTF-8 variable names in your code, or to make

    length("馬鹿") == 2;

rather than 4 (Shift_JIS or EUC-JP bytes) or 6 (UTF-8 bytes), etc. etc. In future versions of Perl, "use utf8;" is going to become a no-op and UTF-8 will be switched on by default.

The only thing this does not do is switch file input and output to UTF-8. To get Perl to understand that a file is in UTF-8 format, one has to state

    binmode FILE, ":utf8";

Note that "binmode" is the Perl function that turns the "text" mode of a filehandle on or off, for input as well as output.
The "text" mode is needed for things like getting the right newline/carriage-return conversion on text input and output, depending on whether we're on Unix, DOS or something else. One uses

    binmode FILE, ":raw";

to read in raw bytes without this conversion. So having a UTF-8 layer on the filehandle is actually a very sensible compromise, I think; it doesn't break legacy code.

> In other words, it looks to me like by default Perl 5.6 supported I18N
> oblivious programming, with minimal I18N being easy, but not default.

Perl is a nearly 20-year-old programming language, and it stays backward compatible with old versions of itself, including a whole bunch of features that are now more or less superseded. I completely disagree with you; I think the Perl designers have got this issue right, and the Unicode support in Perl is excellent.
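The two points in the reply above, "use utf8" for the program text and a ":utf8" layer for file I/O, can be sketched in a small script. This is a minimal illustration, assuming Perl 5.8 or later and a script file saved as UTF-8; the temporary file from the core File::Temp module stands in for the FILE handle in the message, and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # the program text itself is UTF-8
use File::Temp qw(tempfile);

my $word = "馬鹿";
print length($word), "\n";       # prints 2 (characters, not bytes)

# "." matches one whole character, not one byte
my @chars = $word =~ /(.)/g;
print scalar(@chars), "\n";      # prints 2

# Write the string through a UTF-8 layer, as binmode FILE, ":utf8" does
my ($tmp, $file) = tempfile();
binmode $tmp, ":utf8";
print {$tmp} $word;
close $tmp or die "close $file: $!";

# Read it back with ":raw": no decoding, so we see the bytes on disk
open my $raw, "<", $file or die "open $file: $!";
binmode $raw, ":raw";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read it back with ":utf8": the bytes are decoded into characters again
open my $in, "<", $file or die "open $file: $!";
binmode $in, ":utf8";
my $text = do { local $/; <$in> };
close $in;
unlink $file;

print length($bytes), "\n";      # prints 6 (three UTF-8 bytes per kanji)
print length($text), "\n";       # prints 2
print $text eq $word ? "round trip ok\n" : "mismatch\n";
```

Removing "use utf8;" from this script makes length($word) report 6, because the literal is then read as raw UTF-8 bytes. For input that might not be well-formed UTF-8, the ":encoding(UTF-8)" layer is the stricter alternative to ":utf8".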
- Follow-Ups:
- Re: [tlug] Japanese regex question
- From: Stephen J. Turnbull
- References:
- [tlug] Japanese regex question
- From: Jonathan Byrne
- Re: [tlug] Japanese regex question
- From: Tod McQuillin
- Re: [tlug] Japanese regex question
- From: Jonathan Byrne
- Re: [tlug] Japanese regex question
- From: Tod McQuillin
- Re: [tlug] Japanese regex question
- From: Stephen J. Turnbull
- Re: [tlug] Japanese regex question
- From: Ian Wells
- Re: [tlug] Japanese regex question
- From: Stephen J. Turnbull
- Re: [tlug] Japanese regex question
- From: Botond Botyanszki
- Re: [tlug] Japanese regex question
- From: Stephen J. Turnbull