tlug.jp Mailing List Archive
Re: [tlug] I hate encodings!
- Date: Sun, 10 Sep 2006 20:57:21 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] I hate encodings!
- References: <mailman.86.1156798909.9509.tlug@example.com> <009c01c6cb18$56336420$0f01a8c0@example.com> <20060829063935.944f37b2.attila@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b27 (linux)
>>>>> "Attila" == Attila Kinali <attila@example.com> writes:

    Attila> On Tue, 29 Aug 2006 12:08:01 +0900
    Attila> "Jeff Madsen" <jeff@example.com> wrote:

    >> Hope that question made sense - you can probably detect my
    >> confusion already!

    Attila> As far as I know there is no such documentation.

Of course there is. Some of the worst introductory stuff on encodings etc. was written by yours truly ;-), so there's an existence proof for you. I know there's better by now, but I don't know where offhand.

- _Linux Journal_, "Alphabet Soup", ca. Mar-Apr 1999 IIRC.
- _Professional Linux Programming_, Ch. 28, "Internationalization", Wrox Press, 2000.
- _Linux Nihongo Kankyo_, O'Reilly Japan, 1999 or so (with Craig Oda, Hiroo Yamagata, and Rob Bickel).

There's also some stuff at debian.org.

Ken Lunde's "Understanding Japanese Information Processing" (O'Reilly, often referenced as "UJIP") is excellent but low-level (it doesn't discuss the web at all). It has been superseded by his "Chinese, Japanese, Korean, and Vietnamese Information Processing" (also O'Reilly, often referenced as "CJKV"), which I haven't actually read. I think both are now out of print in English.

For web stuff, you want to find out about content negotiation in HTTP. You will need to read the MIME RFCs (2045-2049), some of which are only really relevant to mail, but I forget offhand which ones you can skip. Apache's documentation on its negotiation mechanisms is good but assumes a lot of prior knowledge. The Unicode Consortium site has some good but really technical material; TR#17 (the Unicode Character Encoding Model) is worth skimming to get an idea of the issues.
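To give a feel for what charset negotiation in HTTP involves, here is a minimal Python sketch. It is not how Apache or any particular server does it; it assumes a hypothetical AVAILABLE list of charsets the server can produce, reads the client's Accept-Charset header, and picks the available charset with the highest quality value (q), with `*` covering any charset the client didn't name.

```python
from typing import Dict, List, Optional

# Hypothetical: charsets this server can produce, in its own order of preference.
AVAILABLE: List[str] = ["UTF-8", "EUC-JP", "Shift_JIS", "ISO-2022-JP"]


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Parse an Accept-Charset value into {lowercased charset: q-value}."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue  # skip empty list elements such as "a,,b"
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name] = q
    return prefs


def choose_charset(header: Optional[str], available: List[str] = AVAILABLE) -> Optional[str]:
    """Return the best available charset, or None if none is acceptable."""
    if not header:
        return available[0]  # no header: the client accepts any charset
    prefs = parse_accept_charset(header)
    wildcard = prefs.get("*", 0.0)  # "*" covers charsets not named explicitly
    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset.lower(), wildcard)
        if q > best_q:  # strict ">" keeps the server's order on ties
            best, best_q = charset, q
    return best


if __name__ == "__main__":
    print(choose_charset("Shift_JIS, utf-8;q=0.7, *;q=0.1"))  # Shift_JIS
    print(choose_charset("iso-8859-1"))  # None: nothing available is acceptable
```

Breaking ties in the server's favour is just one reasonable policy; a server that returns None here would typically answer with its default charset or a 406 Not Acceptable.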
Dealing with Japanese is a pain in the butt because (1) the Japanese have five major encodings in common use (JIS/ISO-2022-JP, EUC-JP, Shift-JIS, Unicode UTF-8, and romaji, e.g. in domain names), each with many minor variations, and (2) the Japanese mostly don't care about anything else yet. So a lot of Japanese sites (even today) assume that the language is either Japanese or US English; under that assumption the various encodings are easy to tell apart automatically, which means those sites often don't implement charset negotiation.

Attila's general advice is excellent, so I won't repeat it or comment on it here. I will add that if you're going to put everything into UTF-8 as he suggests (though you may run into opposition from Japanese colleagues), you should also attach a language tag. Someday you will need to mix Chinese or Korean with Japanese content, and then you'll be glad you did.

-- 
School of Systems and Information Engineering     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba
Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
        Ask not how you can "do" free software business;
        ask what your business can "do for" free software.
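The claim that the common Japanese encodings are easy to tell apart automatically, as long as the text is assumed to be Japanese or English, can be illustrated with a crude heuristic. This is a sketch under stated assumptions, not a real detector: the function names are hypothetical, it uses only Python's built-in codecs, and it decides by trying strict decodes in a fixed order (7-bit data with ESC means ISO-2022-JP, then UTF-8, EUC-JP, Shift_JIS). Some short Shift_JIS strings are also valid EUC-JP, so real detectors add byte-frequency statistics. The second function follows the UTF-8-plus-language-tag advice; the tag has to come from the caller, since Unicode unifies Han characters shared by Chinese, Japanese, and Korean and the UTF-8 bytes alone don't say which language is meant.

```python
from typing import Dict, Optional, Tuple

ESC = b"\x1b"  # ISO-2022-JP switches character sets with ESC sequences


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Crude guess among ASCII, ISO-2022-JP, UTF-8, EUC-JP and Shift_JIS.

    Hypothetical helper; assumes the text is Japanese or plain English.
    """
    # 7-bit data: ISO-2022-JP if it contains escape sequences, else ASCII.
    if all(b < 0x80 for b in data):
        return "iso-2022-jp" if ESC in data else "ascii"
    # 8-bit data: try strict decodes. UTF-8 goes first because EUC-JP or
    # Shift_JIS text rarely happens to be valid UTF-8; EUC-JP before
    # Shift_JIS is an arbitrary tie-break for the ambiguous cases.
    for codec in ("utf-8", "euc-jp", "shift_jis"):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


def to_utf8_with_language(data: bytes, language: str = "ja") -> Tuple[bytes, Dict[str, str]]:
    """Transcode to UTF-8 and return headers carrying an explicit language tag.

    The language tag must be supplied by the caller (hypothetical default "ja").
    """
    codec = guess_japanese_encoding(data)
    if codec is None:
        raise ValueError("unrecognized encoding")
    body = data.decode(codec).encode("utf-8")
    headers = {
        "Content-Type": "text/plain; charset=UTF-8",
        "Content-Language": language,
    }
    return body, headers


if __name__ == "__main__":
    sample = "日本語のテキスト"
    for codec in ("iso-2022-jp", "euc-jp", "shift_jis", "utf-8"):
        print(codec, "->", guess_japanese_encoding(sample.encode(codec)))
```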