Mailing List Archive


Re: [tlug] oneliners, Was: Moving on from xterm



On 23 August 2016 at 08:47, NOKUBI Takatsugu <knok@example.com> wrote:
> kakasi has some problem.
> * the dictionary is too old
> * not good for complex sentence

Quite. It's not for serious work.

> MeCab is also useful for such situation.

Indeed. The pick of the bunch.

> mecab-ipadic-neologd is a good dictionary for MeCab.
> https://github.com/neologd/mecab-ipadic-neologd

Hmmm. IPADIC is long in the tooth too. Most serious users
of MeCab would go for unidic as a morpheme dictionary.

I see that Toshinori Sato, who has compiled the "neologd"
extensions (there's one for unidic too) has added a lot of
expanded terms which are not really morphemes. For example
if I put ラテン文字で表記される into it, the unidic and ipadic
segmentation is ラテン+文字+で+表記+さ+れる, but if I try
it with neologd I get ラテン文字+で+表記+さ+れる. In other
words he's added ラテン文字 as a unitary noun. If that's what
you want, fine, and his work may well help apps which just
want to add furigana to text, but it's getting right away from
being a morphological analyzer.
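
To make the difference concrete, here is a rough sketch that compares the
two segmentations quoted above and reports which units the coarser one has
fused together. It does not call MeCab at all; the token lists are simply
typed in from the example, and merged_units is a made-up helper name, not
part of any library.

```python
def merged_units(fine, coarse):
    """Return (coarse_token, fine_parts) for every coarse token that
    spans more than one token of the finer segmentation.

    Both arguments are lists of strings that concatenate to the same text.
    """
    merged = []
    i = 0  # position in the fine-grained token list
    for token in coarse:
        parts = []
        joined = ""
        # Consume fine tokens until they spell out the coarse token.
        while joined != token:
            if i >= len(fine) or not token.startswith(joined + fine[i]):
                raise ValueError("segmentations do not align")
            joined += fine[i]
            parts.append(fine[i])
            i += 1
        if len(parts) > 1:
            merged.append((token, parts))
    if i != len(fine):
        raise ValueError("segmentations do not align")
    return merged


# Segmentations copied from the example above (not produced by running MeCab here).
unidic = "ラテン+文字+で+表記+さ+れる".split("+")
neologd = "ラテン文字+で+表記+さ+れる".split("+")

print(merged_units(unidic, neologd))
```

For the example sentence this reports ラテン文字 as the only fused unit,
built from ラテン and 文字. Run over a larger sample, the same comparison
would give a feel for how many such compound entries a dictionary adds.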

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
