Re: [tlug] Big5 Vs. Unicode Vs. Netscape 4.x Vs. deadline



Jonathan,

I used to do similar applications (web interfaces
to LDAP directories) back in the Netscape 4
days. My experience was that converting on the fly
between UTF-8 and Big5 had minimal performance impact.
I used Perl (mod_perl), with the conversion
done in C. 

I would expect that Java would be able to
do this reasonably efficiently. But you never know...
I often find the Java development environment/process
to be less effective than what we were doing with 
Perl/Apache years ago :-). 
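For what it's worth, here is a rough sketch of what the on-the-fly conversion could look like in Java. The class and method names (Big5Bridge, utf8ToBig5, big5ToUtf8) are made up for illustration, and it assumes the JVM ships the "Big5" charset, which a full JDK normally does. It reports unmappable characters instead of silently turning them into '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: transcodes between UTF-8 (database side)
// and Big5 (browser side). Names are illustrative only.
public class Big5Bridge {
    // Assumes the runtime provides the Big5 charset.
    private static final Charset BIG5 = Charset.forName("Big5");

    // Outbound: text read from the database, sent to the browser.
    public static byte[] utf8ToBig5(byte[] utf8) {
        return transcode(utf8, StandardCharsets.UTF_8, BIG5);
    }

    // Inbound: form data from the browser, stored in the database.
    public static byte[] big5ToUtf8(byte[] big5) {
        return transcode(big5, BIG5, StandardCharsets.UTF_8);
    }

    private static byte[] transcode(byte[] in, Charset from, Charset to) {
        try {
            // REPORT makes bad or unmappable input an error rather than '?'.
            CharBuffer chars = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(in));
            ByteBuffer out = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                "Cannot convert from " + from + " to " + to, e);
        }
    }
}
```

The same helper covers the input side: decode what the browser submits as Big5 and store it as UTF-8. The page would presumably also need to be served with a Big5 charset in its Content-Type so the browser knows what it is getting.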

Putting Big5 in the database is OK, too. It is best
if the db supports Big5 code, though, otherwise there
may be weird query results. Big5 can be a pain
to work with, as the 2nd byte of a character is often an
ASCII "special" character that needs quoting: \ (0x5C)
is the usual culprit and causes problems with SQL string
escaping, and | (0x7C) can turn up as well.
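The second-byte problem can be checked mechanically. In Big5 the trail byte falls in 0x40-0x7E or 0xA1-0xFE, so it can collide with ASCII punctuation such as \ (0x5C) and | (0x7C). A small sketch in Java, where the class and method names (Big5TrailBytes, hasTrailByte) are made up for illustration and the JVM is assumed to provide the "Big5" charset:

```java
import java.nio.charset.Charset;

// Hypothetical checker: reports whether any double-byte character in a
// string has a Big5 trail byte equal to a given ASCII character.
public class Big5TrailBytes {
    // Assumes the runtime provides the Big5 charset.
    private static final Charset BIG5 = Charset.forName("Big5");

    public static boolean hasTrailByte(String s, char ascii) {
        byte[] b = s.getBytes(BIG5);
        for (int i = 0; i < b.length; i++) {
            int lead = b[i] & 0xFF;
            if (lead >= 0x81) {
                // Lead byte of a two-byte character: inspect its trail byte.
                if (i + 1 < b.length && (b[i + 1] & 0xFF) == ascii) {
                    return true;
                }
                i++; // skip the trail byte so it is never read as a lead
            }
            // Bytes below 0x81 are plain single-byte ASCII and are ignored.
        }
        return false;
    }

    public static void main(String[] args) {
        // U+529F is a commonly cited character whose trail byte is 0x5C.
        System.out.println(hasTrailByte("\u529f", '\\'));
    }
}
```

In practice the safer route is probably to keep the data as decoded characters for as long as possible, pass values to SQL through bound parameters (java.sql.PreparedStatement) rather than pasting them into query strings, and do any escaping on characters instead of raw Big5 bytes.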

Jake 

--- Jonathan Q <jq@example.com> wrote:
> Let me present you with a hypothetical situation.
> 
> I disavow any association with it except for having
> recently hypothetically stepped into a sort of hypothetical
> rescue-kibitzer role.
> 
> A company developed a database-backed intranet for a certain
> other, large company's office in a rather prosperous part of
> China.  The initial development was in English and now a Chinese
> translation is being done.  The programmer working on this
> created the Chinese-language entries in the database in 
> Unicode.
> 
> Today, she learned some interesting facts:
> 
> 1) Netscape 4 doesn't support Unicode;
> 
> 2) 90%+ of the customer's staff are using Netscape 4.
>    Telling them to upgrade is out of the question.
> 
> The site is using JBoss and Apache for Windows, along with
> some Other Company's database.
> 
> Her options at this point would seem to be:
> 
> 1) Write or find a servlet that will convert the Unicode
>    in the database to Big5 on the fly;
> 
> 2) Throw all caution to the wind and convert the entire
>    database to Big5 and be done with it.
> 
> Oh, and did I mention that the project due date is Friday, so
> she's expected to have it in the customer's hands on Thursday so
> they can start checking it before the weekend?
> 
> No milestone versions or betas have been done at all.  Like I
> said, I disavow all association with that hypothetical project.
> 
> I also hypothetically advised her that she really needs to
> have a good input filter to make sure that whatever the
> customer's staff input to the database, it is converted to 
> Unicode or whatever else the database ends up finally using,
> since otherwise your database will doubtless quickly fill with
> all sorts of crap.
> 
> She seems a bit too young to know about ugly old browsers and
> a bit thin on knowledge of the pitfalls of multi-byte platform
> issues.
> 
> So, my question to you good people (and BOFHs :-) is, "What would
> you advise her to do?"  I'm sort of leaning toward solution 2, plus
> the input filter (of course), since the customer has thousands
> of employees and all of that outbound conversion could lead to
> significantly elevated server loads that they haven't planned
> on or budgeted for.  On the other hand, keeping the database in
> Unicode is probably a cleaner solution.
> 
> 
> TIA,
> Jonathan

