Re: tlug: Kanji to Hiragana soft



On Sat, 24 Oct 1998, Eric S. Standlee wrote:

> Is there a software package that will change kanji in to hiragana
> (furigana) for linux.  I need to take a large amount of Japanese text and
> filter kanji into hiragana for those who cannot read kanji well.

I've never seen anything like that on any platform, so you may have your
work cut out for you in this search.  If I may, I'd like to suggest that
what you really need here is a program that will supply hiragana readings in
addition to the kanji, rather than replacing them.  A page of pure
hiragana is quite difficult to read.  It's such a nuisance, in fact, that I
probably wouldn't bother doing it if someone gave me such a page.

There are two approaches that could be taken to this.  One is to add
furigana to the kanji, which AFAIK requires a GUI-based solution.  The other
would be to insert hiragana readings in parentheses or square brackets as
inline text after the kanji in question.
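
As a rough illustration of the second approach, here is a minimal Python
sketch.  It assumes the text has already been split into (surface, reading)
pairs by some other step; the function name annotate_inline and the use of
None to mean "no reading needed" are made up for this example.

```python
def annotate_inline(tokens, brackets="()"):
    """Join (surface, reading) pairs into one string.

    A reading of None means the token is left as-is (kana, punctuation);
    otherwise the reading is appended in brackets right after the token.
    """
    open_b, close_b = brackets
    parts = []
    for surface, reading in tokens:
        if reading is None:
            parts.append(surface)
        else:
            parts.append(f"{surface}{open_b}{reading}{close_b}")
    return "".join(parts)


# Hypothetical, hand-segmented input for illustration only.
tokens = [("日本語", "にほんご"), ("を", None), ("勉強", "べんきょう"), ("する", None)]
print(annotate_inline(tokens))        # 日本語(にほんご)を勉強(べんきょう)する
print(annotate_inline(tokens, "[]"))  # 日本語[にほんご]を勉強[べんきょう]する
```

The kanji stay in place, so the output remains plain text that any
Japanese-capable editor or terminal can show, with no GUI needed.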

A difficult point in doing this is that it will need a *big* dictionary, and
also a pretty accurate parser to decide where one word ends and the next
begins in cases where there are several kanji compounds in a row that are
not broken up by punctuation or interspersed kana.  Put another way, the
program you are talking about is more or less an inverted input method: it
takes kanji compounds, compares them against its dictionary, and outputs
its best guess as to what the correct readings are.  It will need not only
to find word boundaries accurately and check its dictionary, but also to
have algorithms for deciding what to do about compounds that aren't in the
dictionary (ignore them, or try to work out a reading and mark it as
unsure).  A pretty necessary feature would also be the ability to add to
the dictionary.
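
To make the dictionary lookup and word-boundary problem concrete, here is a
small Python sketch.  It uses greedy longest-match segmentation, which is a
deliberate simplification and not the accurate parser described above.  The
class name ReadingDictionary, the "?" marker for unknown kanji, and the
three-entry toy dictionary are all assumptions made for the example.

```python
def is_kanji(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


class ReadingDictionary:
    """Toy word -> reading dictionary with greedy longest-match lookup."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self.max_len = max((len(w) for w in self.entries), default=1)

    def add(self, word, reading):
        """Add or replace an entry (the user-extensible part)."""
        self.entries[word] = reading
        self.max_len = max(self.max_len, len(word))

    def segment(self, text):
        """Return (surface, reading) pairs.

        reading is None for non-kanji characters and "?" for a kanji
        that no dictionary entry covers (flagged as unsure).
        """
        result = []
        i = 0
        while i < len(text):
            if not is_kanji(text[i]):
                result.append((text[i], None))
                i += 1
                continue
            # Try the longest candidate first, then shorter ones.
            for length in range(min(self.max_len, len(text) - i), 0, -1):
                word = text[i:i + length]
                if word in self.entries:
                    result.append((word, self.entries[word]))
                    i += length
                    break
            else:
                result.append((text[i], "?"))  # unknown: mark as unsure
                i += 1
        return result


d = ReadingDictionary({"東京": "とうきょう", "大学": "だいがく",
                       "東京大学": "とうきょうだいがく"})
print(d.segment("東京大学生"))
# [('東京大学', 'とうきょうだいがく'), ('生', '?')]
d.add("生", "せい")
print(d.segment("東京大学生"))
# [('東京大学', 'とうきょうだいがく'), ('生', 'せい')]
```

Greedy matching picks 東京大学 here even though another split of the same
string is possible, which is exactly the kind of ambiguity a real parser
would have to resolve.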

The one part that's easier than making an IME is that it doesn't have to
deal with exchanging text with running applications.  It could be written
so that it just accepts a text file as input and produces another one (with
kana readings added) as output.  A really sophisticated one would work
interactively and allow the user to correct readings that were wrong, or
flag in red those that seemed questionable.
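
The non-interactive, file-in/file-out version could look something like the
Python sketch below.  The names filter_file and toy_annotate are
hypothetical, and toy_annotate is only a stand-in (a plain string
replacement over a two-entry table) for a real reading lookup.  The encoding
is a parameter because Japanese text files may be EUC-JP, Shift_JIS, or
something else; the UTF-8 default is just an assumption.

```python
def filter_file(in_path, out_path, annotate, encoding="utf-8"):
    """Copy in_path to out_path, passing each line through annotate().

    annotate is any function str -> str.  Returns the number of lines
    written.
    """
    count = 0
    with open(in_path, encoding=encoding) as src, \
         open(out_path, "w", encoding=encoding) as dst:
        for line in src:
            body = line.rstrip("\n")
            dst.write(annotate(body) + "\n")
            count += 1
    return count


# Stand-in annotator: a tiny hard-coded table, for illustration only.
TOY_READINGS = {"漢字": "かんじ", "読": "よ"}


def toy_annotate(line):
    """Append a bracketed reading after each table entry found in line."""
    for word, reading in TOY_READINGS.items():
        line = line.replace(word, f"{word}({reading})")
    return line
```

It would be called as, for example,
filter_file("in.txt", "out.txt", toy_annotate, encoding="euc_jp"), where the
file names are placeholders.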

This is certainly not a trivial program, and one for which there has
probably been, and will continue to be, little demand on Linux (or maybe on
other platforms, too).  However, an accurate and reasonably fast tool that
could add furigana could potentially be a very useful item for language
teachers and students, etc.

I really wish I had the ability to write something like this myself.  If I
did, and one couldn't be located anywhere, I'd start working on it.

I have a CD with the Monash U. Nihongo archive on it.  I'll search through
it and see if I can find something.  I'll let you know what I come up with.

Cheers,

Jonathan
