Mailing List Archive
Re: [tlug] Re: Japanese in URLs?
- Date: Thu, 07 Feb 2008 07:13:03 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Re: Japanese in URLs?
- References: <5634e9210802051929x4bc51a54n6c075baaf2c3ddeb@mail.gmail.com> <78d7dd350802052123g7761aab5s3057f2615100d359@mail.gmail.com>
Nguyen Vu Hung writes:

 > 2008/2/6, Jim Breen <jimbreen@example.com>:

 > > and I don't want the browser to play it back to me as an
 > > expletive in Klingon because it decided it was something in UTF-8.
 > > It's different, of course, if the field has an ACE prefix such as
 > > "xn--".

 > RFC2718[1] says the URL *should* be encoded after the character sequences
 > is translated to UTF-8.

No, it doesn't. First off, RFCs are supposed to be about wire
protocols. How browsers present data received from users or the wire
is basically off-limits to RFCs; that's really more a field for W3C
recommendations. Second, RFC 2718 is an informational companion to RFC
2717 (how to register new URL schemes), and is not standards-track.

The appropriate references here would be to internationalized URLs,
cf. RFCs 3454, 3490-3492, 3743, 4290, 4690, and especially RFC 3987.

As far as I know, browsers which display anything but the hex-encoded
path are strictly speaking in violation of RFC 3987:

    6.2. Software Interfaces and Protocols

    Although an IRI is defined as a sequence of characters, software
    interfaces for URIs typically function on sequences of octets or
    other kinds of code units. Thus, software interfaces and
    protocols MUST define which character encoding is used.

because there is no provision in any URL scheme I know of for defining
the character encoding, with the exception of IDNA's "xn--" ACE prefix,
which implies Punycode-encoded Unicode (RFC 3492). (Note that RFC 3987
does *not* define an ACE for the path portion of an IRI. That means
that there is no in-band way of recognizing the ACE representation of
an IRI.)[1]

Even there, RFC 3490 says:

    6.1 Entry and display in applications

    [...]

    ACE encoding is opaque and ugly, and should thus only be exposed
    to users who absolutely need it. Because name labels encoded as
    ACE name labels can be rendered either as the encoded ASCII
    characters or the proper decoded characters, the application MAY
    have an option for the user to select the preferred method of
    display; if it does, rendering the ACE SHOULD NOT be the default.

 > What Firefox doing is not wrong but personally, I think the browser
 > should be able to display actual Japanese for better readability.

Only at the user's explicit request.

You know, there are three kinds of people (very loosely speaking) who
still put non-ASCII into mail headers (i.e., without encoding as
MIME-words): spammers, Russians, and Japanese. I think it's really
sad that the real humans are classed with those haploid spammers!
That's because Japanese (and Russian) programmers arrogantly decided
that they didn't need I18N and just detect everything according to
whichever of their encodings a string fits. Browsers that detect
encodings in URLs are making the same mistake and are in violation of
the section of RFC 3987 quoted above.

Footnotes:
[1] ACE means "ASCII-compatible encoding" and is defined in RFC 3490,
but probably elsewhere as well since it lists many other ACE prefixes
that have been defined.
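
To make the distinction in the message above concrete, here is a minimal Python sketch, using only the standard library. It percent-encodes the same Japanese path from two different byte encodings to show that the hex-encoded form carries no marker of which encoding produced it, and then encodes a host label with Python's built-in `idna` codec (which implements RFC 3490) to show that the ACE form does carry one, the "xn--" prefix. The sample string and all variable names are illustrative assumptions, not anything taken from the thread.

```python
from urllib.parse import quote, unquote

path = "/日本語"  # illustrative sample path, not from the thread

# Percent-encode the path from its UTF-8 bytes.
utf8_form = quote(path, encoding="utf-8")
# Percent-encode the very same characters from EUC-JP bytes instead.
eucjp_form = quote(path, encoding="euc_jp")

print(utf8_form)   # /%E6%97%A5%E6%9C%AC%E8%AA%9E
print(eucjp_form)  # a different hex string for the same characters

# Neither string says which encoding it came from. Decoding with the
# right one recovers the text; guessing wrong yields garbage.
roundtrip = unquote(eucjp_form, encoding="euc_jp")
misread = unquote(eucjp_form, encoding="utf-8")  # errors="replace" is the default
print(roundtrip == path)  # True
print(ascii(misread))     # contains \ufffd replacement characters

# A host label is different: the ACE form announces itself in-band.
ace_label = "日本語".encode("idna").decode("ascii")
decoded_label = ace_label.encode("ascii").decode("idna")
print(ace_label.startswith("xn--"))  # True
print(decoded_label == "日本語")      # True
```

The path case is the one the message objects to: software that displays decoded Japanese for a hex-encoded path has to guess the encoding, whereas for an "xn--" label no guess is needed.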