tlug.jp Mailing List Archive
Re: [tlug] JIS X 0212? Any example "mixed charset" pages?
- Date: Fri, 09 Jun 2006 11:59:55 +1000 (EST)
- From: Jim Breen <Jim.Breen@example.com>
- Subject: Re: [tlug] JIS X 0212? Any example "mixed charset" pages?
"Michael(tm) Smith" <smith@example.com> wrote:
>>
>> Does anybody have examples of non-UTF-8 web pages that mix
>> Japanese characters with European accented characters? If so, what
>> encoding do they use?

I have a sample page with some JIS X 0212 kanji and accented
characters at: http://www.csse.monash.edu.au/~jwb/wip.html

>> My (very limited) understanding of Japanese
>> encodings leads me to believe that the way they are likely to be
>> encoded (if there are actually any of them in the wild) is in
>> EUC-JP, and that they would need to assume JIS X 0212 support in
>> whatever browser is used to view the pages.

Not so. Practically no-one on the planet uses JIS X 0212 characters
in EUC-encapsulated text in WWW pages. The reason is
<drumroll>IE doesn't support the full EUC-JP</drumroll>.

>> The set of characters that JIS X 0212 adds support for are shown
>> here:
>>
>> http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK/jisx0212-1990.gif
>>
>> Most of them are additional kanji, but it includes characters for
>> European languages also ("a ring", "e acute", etc.).

Yup. I have a u-umlaut, etc. on that sample page.

>> But I've heard that Internet Explorer does not support JIS X 0212,
>> so it would seem unlikely that anybody would actually create EUC-JP
>> pages that rely on JIS X 0212 support.

Spot on.

>> Yet, given that relatively
>> few Japanese sites seem to use UTF-8, and that JIS X 0212 is not
>> well supported, I'm left wondering how instances of these kinds of
>> "mixed charset" pages are actually encoded in the real world.

They either use UTF-8 or encode the diacritics using HTML entities
such as &ocirc;.

In my WWWJDIC server, all the data files are in EUC. If you set your
dialogue (by cookie) to run in EUC or Shift_JIS (EUC is the default),
the output routines substitute HTML entities for the diacritics, etc.,
and 16x16 images for the JIS X 0212 kanji and kana (yes, JIS X 0212
has a few extra kana). If you set it to UTF-8, the raw codes go out.
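The output routine described above can be sketched in Python. This is a minimal, hypothetical illustration, not the actual WWWJDIC code. It assumes that anything Python's `euc_jp` codec writes with the SS3 lead byte (0x8F, the 3-byte JIS X 0212 form) is unreadable in IE, and it replaces such characters with an HTML entity where HTML 4 defines one, otherwise with a 16x16 image. The names `ie_can_show`, `substitute`, `render` and the URL pattern `GLYPH_URL` are all invented for the example.

```python
# Hypothetical sketch of a WWWJDIC-style output routine; not the real code.
from html.entities import codepoint2name

# Invented URL pattern for the 16x16 glyph images (an assumption).
GLYPH_URL = "/glyphs/{:04X}.gif"

# In EUC-JP, a JIS X 0212 character is 3 bytes led by SS3 (0x8F).
SS3 = 0x8F


def ie_can_show(ch: str) -> bool:
    """True if ch fits in EUC-JP without using the JIS X 0212 plane."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return False  # Python's euc_jp codec cannot encode it at all
    return encoded[0] != SS3


def substitute(ch: str) -> str:
    """HTML stand-in for a character the browser is assumed unable to show."""
    name = codepoint2name.get(ord(ch))
    if name is not None:
        return f"&{name};"  # e.g. u-umlaut becomes &uuml;
    # No HTML 4 entity name (e.g. a JIS X 0212 kanji): fall back to an image.
    src = GLYPH_URL.format(ord(ch))
    return f'<img src="{src}" width="16" height="16" alt="">'


def render(text: str, charset: str) -> bytes:
    """Encode text for output as "euc_jp", "shift_jis" or "utf-8"."""
    if charset == "utf-8":
        return text.encode("utf-8")  # raw codes go out unchanged
    safe = "".join(ch if ie_can_show(ch) else substitute(ch) for ch in text)
    return safe.encode(charset)
```

For example, `render("Müller 日本語", "euc_jp")` keeps the JIS X 0208 kanji as ordinary 2-byte EUC-JP and turns the ü into `&uuml;`, so no 0x8F byte ever reaches the browser. Because every character that passes `ie_can_show` is also in Shift_JIS's repertoire, the same substitution works for Shift_JIS output.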

The substitution really only applies to the few dictionary entries
with JIS212 kanji, the German file and the Buddhism file. That's how
I do it anyway. I started off putting out full EUC-encapsulated
JIS212, but realised most of my users couldn't see it.

Cheers

Jim

-- 
Jim Breen  http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,  Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia     Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               ジム・ブリーン@モナシュ大学