Mailing List Archive
Re: Font Encodings - Re: tlug: Java and Japanese
- To: tlug@example.com
- Subject: Re: Font Encodings - Re: tlug: Java and Japanese
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Thu, 28 Aug 1997 18:11:03 +0900
- In-reply-to: Your message of "Thu, 28 Aug 1997 15:36:28 +0900." <Pine.HPP.3.95.970828153139.3832A-100000@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug
--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------

I already sent a version of this to Craig; decided to clean it up and
pass it on to TLUG, FWIW.

On Thu, 28 Aug 1997, John Little wrote:

gaijin>%
gaijin>% I'm not sure what the "8859_1" means.  Does anyone know?
gaijin>%
gaijin>
gaijin> ISO encoding 8859_1, usually known as "Western English" or
gaijin> "Latin-1", as opposed to 8859_2, the encoding for "European
gaijin> English".  The latter includes codes for umlaut, cedilla, acute
gaijin> and friends.  Check out the X11 fonts directory (encoding).

This is not exactly true.  In fact ISO 8859-1 through 8859-4 (Latin-1
to Latin-4) all have the accents and stuff for the major European
languages; they differ in which languages with a small number of
speakers they are tweaked for.  ISO 8859-5 through 8859-8 are not Latin
sets at all: their upper halves are completely revamped for Cyrillic,
Arabic, Greek and Hebrew, respectively (scripts which share no letters
with ASCII, so there is no space left for the accented glyphs in these
sets).  ISO 8859-9 and 8859-10 (Latin-5 and Latin-6) are Latin sets
adjusted for Turkish and for the Nordic languages, respectively, and
can still handle most of the major languages.  (Source: Nishikimi, et al.
Maruchiringaru Kankyou no Jitsugen.  Prentice-Hall.)

>>>>> "Craig" == Craig Oda <craig@example.com> writes:

    Craig> That's weird.  I wonder why I have to specify
    Craig> 8859_1?  I asked Tsurui-san about this and he said that he
    Craig> read it on the Java mailing list in reference to the JDBC.
    Craig> This is the same thing I was reading.  There really wasn't
    Craig> an explanation of why it was needed.  Tsurui-san thought it
    Craig> was the specification for unicode.

Nah.  The details of the Unicode and ISO Latin-1 specifications CAN'T
matter (mostly), because they are unrelated to the semantics of this
program as long as the conversions are invertible.
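
The invertibility argument can be sketched in a few lines of Java. This is only an illustration under stated assumptions: it uses java.nio.charset.StandardCharsets from a modern JDK (not available in 1997), UTF-8 stands in for whatever encoding the client really sent, and the class and method names (Latin1RoundTrip, misdecode, recover) are hypothetical, not taken from Craig's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    /** Decode bytes as ISO-8859-1, then encode the resulting String back. */
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Every possible byte value, 0x00 through 0xFF. */
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    /** Simulate a receiver that wrongly treats UTF-8 bytes as Latin-1. */
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    /** Undo the wrong decode: recover the wire bytes, decode them properly. */
    static String recover(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // No byte is lost or changed by a Latin-1 decode/encode cycle.
        System.out.println(Arrays.equals(all, roundTrip(all)));

        String japanese = "\u65e5\u672c\u8a9e"; // "nihongo"
        // The garbled String still carries every original byte.
        System.out.println(japanese.equals(recover(misdecode(japanese))));
    }
}
```

Because every byte maps to exactly one character in U+0000..U+00FF and back, a String produced by a Latin-1 decode is effectively a byte container, which is why the specific contents of the two specifications do not matter here.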
I.e., the only things that are relevant are that

(1) the servlet NEVER produces non-Latin-1-equivalent Unicode
    characters;
(2) Latin-1 to Unicode is one-to-one;
(3) none of the bytes in the stream are non-Latin-1.

(1a) HTTP/1.x specifies that unless a charset is stated in the
     Content-Type header, HTTP message bodies (including POSTs) MUST
     (caps in the RFC 2068 :) be presumed to be ISO-8859-1.  Therefore
     if HttpServletRequest is correctly implemented, POSTs from broken
     clients will be interpreted by default as ISO-8859-1.

(1b) A Java program automatically converts strings into Unicode; by
     (1a), the servlet package must tell Java that the input is
     Latin-1.

(2) By specification.

(3) By specification (Latin-1 uses all 256 code points; no byte is out
    of domain).

The hole in (1) generates the bug: when a properly internationalized
client sends, e.g., an ISO-2022-JP Content-Type or a UTF-? Content-Type,
the servlet package should (if HttpServletRequest is properly
implemented) produce Unicode Japanese out of those.  (Under the default
assumption, what gets produced is not really Unicode text but a 16-bit
encoding of 8-bit bytes according to the Latin-1->Unicode tables. :-)
This Unicode Japanese should then bomb (out of range) on
back-conversion to ISO-8859-1 in Craig's code.

Having no knowledge of servlets, I don't know how to handle this.

Ciao
Steve

-- 
Stephen J. Turnbull
Institute of Policy and Planning Sciences          Yaseppochi-Gumi
University of Tsukuba             http://turnbull.sk.tsukuba.ac.jp/
Tel: +81 (298) 53-5091;  Fax: 55-3849         turnbull@example.com

-----------------------------------------------------------------
Next TLUG meeting is Saturday October 11, 1997
-----------------------------------------------------------------
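
The back-conversion failure described in the message above can be sketched as follows. This is a hedged illustration, not Craig's code: it assumes a modern JDK, and the class and method names (Latin1BackConversion, fitsLatin1, encodeStrict, encodeLenient) are hypothetical. Whether the conversion actually "bombs" depends on the API used: in current Java, String.getBytes silently substitutes '?' for characters outside Latin-1, while a CharsetEncoder left at its default REPORT action throws.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Latin1BackConversion {
    /** True if every character of text has an ISO-8859-1 code point. */
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    /** Encode to ISO-8859-1, throwing if any character is out of range. */
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    /** Encode to ISO-8859-1; unmappable characters are replaced by '?'. */
    static byte[] encodeLenient(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String western = "caf\u00e9";            // all characters <= U+00FF
        String japanese = "\u65e5\u672c\u8a9e";  // real Unicode Japanese

        System.out.println(fitsLatin1(western));
        System.out.println(fitsLatin1(japanese));

        try {
            encodeStrict(japanese);
            System.out.println("encoded without error");
        } catch (CharacterCodingException e) {
            // The strict path reports the out-of-range characters.
            System.out.println("strict encoding failed: " + e);
        }

        // The lenient path loses the text instead of failing.
        System.out.println(new String(encodeLenient(japanese),
                StandardCharsets.US_ASCII));
    }
}
```

Either way the original Japanese text is unrecoverable once the input has been correctly decoded to Unicode and is then pushed back through ISO-8859-1, which is the bug outlined above.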
- References:
- Re: Font Encodings - Re: tlug: Java and Japanese
- From: Craig Oda <craig@example.com>