Re: tlug: Two Qs re translation project
- To: tlug@example.com
- Subject: Re: tlug: Two Qs re translation project
- From: Adrian Havill <havill@example.com>
- Date: Fri, 28 Jan 2000 17:33:56 +0900
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=iso-2022-jp
- Organization: TurboLinux Japan
- References: <000c01bf695f$dce889e0$10210685@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
Frank Bennett wrote:
> Looks like it's time for Frank to go back to school.
>
> Duh, can UTF-8 be interpreted correctly by browsers in common
> circulation, and if so (or if it's on a rising wave) what is the best
> reference text on it?

All the 4.x series of popular browsers can do it (even the 3.x version
of Windows Nav could do it with a hack). Early revs of the Japanese
Netscape versions, especially the Windows versions, had their Unicode
font set to English Arial by default, so newbie JP users had to set the
font manually to a Japanese font. (For some reason the Unix versions get
it right, because they use a pseudo-font for Unicode.)

I use UTF-8 for the Japanese on my personal web pages, if you want to
test your browser support.

It's definitely on the rising wave, and it's the future. Most new
protocols and formats, such as XML, make UTF-8 support _mandatory_
(EUC-JP in XML is an "option"). So if you migrate to XML or XHTML (now a
W3C Recommendation) in the future, you can count on every app supporting
UTF-8, even English apps.

The best reference is _The Unicode Standard_, but you can find a lot of
free documentation about UTF-8 on the web, because it's the encoding of
choice for BeOS and many other things these days.

If you go to UTF-8, you get the added benefit of also being able to
search Latin-1 text correctly, which most commercial web pages use (even
all-English pages often use the degree mark (°) and accented vowels, as
in résumé, Pokémon, saké). Not to mention an easy upgrade path for
Chinese and Korean indexing as well.

> Also ... if we move to a new encoding, we'll need a conversion tool.
> Is there a Unix filter that can munge one of the common Jse
> encodings into UTF-8?

glibc 2.1's "iconv" can do it, and so can Plan 9's "tcs" (a Unix port is
available) and Java's "native2ascii" (in a roundabout manner, though).
But if you'll allow me to toot my own horn, "ucconv", a sample app that
comes with the "fugu" library, works great on web data...
better than iconv, and has the following features (for real Japanese
WWW text) that iconv doesn't have:

 * An intelligent error-recovery scheme for broken Japanese (text edited
   with broken or ASCII-only editors, which is very common in the real
   world).
 * Graceful fallback into HTML/SGML encodings. You can both translate
   character references like &copy; and decimal/hex NCRs like &#x4E00;
   into straight UTF-8, and convert back into ASCII-editor-safe NCRs.
 * Handling of the "Windows" JIS X 0208 extensions (the NEC and IBM
   extensions), including the NEC/IBM extension ambiguity in the
   MS-Windows extended JIS set, as well as the extended Mac Japanese set.

It also compiles on Windows (NT with non-free MS tools, or 95/98/NT with
Cygwin) and BeOS, so if you have content guys who aren't fortunate
enough to be running an OSS-based OS, they can still run ucconv natively
on their systems.

It's technically "alpha" software, but "alpha" only means I haven't put
all the features I want into the API yet. The CJK and UTF/Unicode
converters have been VERY well tested with lotsa real-world
Chinese/Japanese (and some Korean) data and are complete, as is the
ucconv sample app. It's under the GPL, so there's no warranty or
guarantee, but I'll throw in e-mail support. :)

<URL:ftp://ftp.turbolinux.co.jp/pub/fugu/>

If you're working with well-formed EUC-JP, don't need the extra HTML/SGML
translation/conversion or other filter features, and your content
generation/handling system is Linux, you should use the iconv that comes
with glibc 2.1. No need for a new tool when the one that's on your OS
does the job.

--------------------------------------------------------------------
Next Nomikai Meeting: February 18 (Fri) 19:00 Tengu TokyoEkiMae
Next Technical Meeting: March 11 (Sat) 13:00 Temple University Japan
* Topic: TBD
--------------------------------------------------------------------
more info: http://www.tlug.gr.jp
Sponsor: Global Online Japan
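For anyone who wants to try the two conversions discussed in this message without installing ucconv, here is a minimal sketch in Python 3. Python and the helper names (`to_utf8`, `refs_to_text`, `text_to_ascii_ncrs`) are my own hypothetical choices, not part of fugu or ucconv; only the standard-library calls are real. `to_utf8` does the same job as `iconv -f EUC-JP -t UTF-8`, `refs_to_text` turns character references and decimal/hex NCRs into straight UTF-8 text, and `text_to_ascii_ncrs` goes back to ASCII-editor-safe hex NCRs. It does not attempt ucconv's error recovery: `errors="replace"` merely substitutes U+FFFD for undecodable bytes.

```python
import html
import re

# Hypothetical helper names; only bytes.decode, str.encode, html.unescape
# and re are real standard-library APIs.


def to_utf8(data: bytes, encoding: str = "euc_jp", errors: str = "strict") -> bytes:
    """Re-encode legacy Japanese bytes as UTF-8 (like iconv -f EUC-JP -t UTF-8).

    Use encoding="cp932" for Windows Shift_JIS with the NEC/IBM extensions.
    errors="replace" only inserts U+FFFD for bad bytes; it is not
    ucconv-style error recovery.
    """
    return data.decode(encoding, errors=errors).encode("utf-8")


# Named references (&copy;) and decimal/hex NCRs (&#19968; / &#x4E00;).
_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")
# Characters that must stay escaped, or the HTML markup would change meaning.
_MARKUP = {"<", ">", "&", '"', "'"}


def refs_to_text(text: str) -> str:
    """Replace character references with the characters they stand for."""

    def repl(match):
        ref = match.group(0)
        decoded = html.unescape(ref)
        # Leave markup-significant and unrecognized references unchanged.
        if decoded in _MARKUP or decoded == ref:
            return ref
        return decoded

    return _REF.sub(repl, text)


def text_to_ascii_ncrs(text: str) -> str:
    """Replace every non-ASCII character with a hex NCR (ASCII-editor safe)."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)
```

To use `to_utf8` as a Unix-style filter, read `sys.stdin.buffer` and write the result to `sys.stdout.buffer`. `refs_to_text` deliberately leaves `&lt;`, `&amp;` and their numeric equivalents alone, since decoding those inside an HTML page would change its markup.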
- References:
- RE: tlug: Two Qs re translation project
- From: "Frank Bennett" <bennett@example.com>