Mailing List Archive


Re: [tlug] Search MySQL for Japanese Names



>> My next interest would be spread of names in the real population. Who
>> knows how the results of the above would be weighted then...
>
> Hard data to get too. When I was at Tokyo Gaidai they had access to
> a full copy of the NTT directory. It would have been nice to do some
> frequency measures on names, and geographical dispersions on
> family names, but there was an embargo on any publications
> drawing on the data. They said it was because of "privacy".

Well, in the case of rare names, privacy can be broken with a few
practical assumptions. I once read that the US census doesn't publish
stats on baby names that occur fewer than, say, 100 times a year.
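
A minimal sketch of that kind of small-count suppression, assuming a
plain list of names as input and the 100-occurrence cutoff I guessed at
above. The function name `publishable_counts` and the sample data are
made up for illustration; no real directory or census data is involved.

```python
from collections import Counter


def publishable_counts(names, min_count=100):
    """Count names, dropping any name seen fewer than min_count times."""
    counts = Counter(names)
    # Rare names are suppressed because a tiny count can point at real people.
    return {name: n for name, n in counts.items() if n >= min_count}


# Invented sample: two common family names and one rare one.
sample = ["佐藤"] * 120 + ["鈴木"] * 100 + ["四月一日"] * 3
print(publishable_counts(sample))  # {'佐藤': 120, '鈴木': 100}
```

The rare name simply disappears from the output, which is also why any
frequency weighting built on such data would be blind to the long tail.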

> In a year or so I'll be working on a major expansion of the lexicon(s)
> used by MeCab et al. I'll probably be starting with NAIST-JDIC. I'm
> less interested in correct POS tagging and more in correctly
> identifying compounds. I want 米軍 to be recognized; not come up
> as 米 + 軍.

Cool. But I'll have to get a server with more memory and as much cache
as possible to fit these expanding dictionaries in memory. Damn these
intractable Japanese-parsing requirements.
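
On the compound point quoted above, here is a toy illustration of why
the lexicon entry matters. It is a greedy longest-match segmenter, which
is not how MeCab works (MeCab picks a minimum-cost path through a
lattice); the function `segment` and both tiny lexicons are assumptions
made up for this sketch.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


small = {"米", "軍", "基地"}
bigger = small | {"米軍"}  # same lexicon plus the compound entry

print(segment("米軍基地", small))   # ['米', '軍', '基地']
print(segment("米軍基地", bigger))  # ['米軍', '基地']
```

Without the compound in the lexicon there is nothing to match 米軍
against, so it can only come out as 米 + 軍.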

Akira

