Mailing List Archive
Re: [tlug] Search MySQL for Japanese Names]
- Date: Tue, 27 Oct 2009 18:38:08 +1100
- From: Jim Breen <jimbreen@example.com>
- Subject: Re: [tlug] Search MySQL for Japanese Names]
- References: <5634e9210910191749m675cdf8cl3ca73efa0fcbeccb@example.com> <36e8d89d0910191858j2ba89691lb10648d0465fc109@example.com>
Sorry to be answering late. I was in Belgium at a conference and not in a position to poke around in my files.

2009/10/20 黒鉄章 <akira@example.com>:
> Absolutely right. Mecab/Chasen dictionaries (IPADIC, Unidic, whichever
> one you plug into them) don't include anywhere near the number of name
> readings that ENAMDICT does. By design these parsers don't want multiple
> readings for names. They just want the most likely one.

Well, even then the coverage is poor.

> Jim, curious question: how many names in ENAMDICT resolve to just one
> reading? Even an I-would-have-thought-surefire candidate for uniqueness
> such as 田中 (tanaka) resolves to ten different readings in ENAMDICT
> (tanata, tanka, danaka, nunoka, ....). 鈴木 (suzuki) has seven.

Well, turning it round the other way, ~74k kanji-names have 2 or more readings.

I maintain the file in a single-reading format, so the seven 鈴木s are in seven different entries. The version used in WWWJDIC has them merged together with an attempt to get the more common readings first.

Some stats:
- raw data file: 728k entries
- version with merged entries: 597k entries.

So those ~74k merged entries come from ~205k "raw" entries (the 74k themselves plus the 131k entries that merging removed), i.e. approx. 2.8 readings per entry for the 74k.

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, VCA Secondary School, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne
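For anyone wanting to check the arithmetic in the stats above, here is a minimal sketch in Python. It is not how ENAMDICT or WWWJDIC is actually built: the `(kanji, reading)` tuple layout, the function names `merge_entries` and `raw_behind_multi`, and the toy list are all assumptions made for illustration, using only readings quoted in this thread. The idea is that merging removes one raw entry for every extra reading, so the multi-reading names account for 74k plus the entries that disappeared in the merge.

```python
from collections import defaultdict


def merge_entries(raw_entries):
    """Group single-reading (kanji, reading) pairs into one entry per kanji.

    Hypothetical layout: the real ENAMDICT file format is different.
    Readings keep the order in which they first appear.
    """
    merged = defaultdict(list)
    for kanji, reading in raw_entries:
        if reading not in merged[kanji]:
            merged[kanji].append(reading)
    return dict(merged)


def raw_behind_multi(raw_total, merged_total, multi_reading):
    """Raw entries that collapse into the multi-reading merged entries.

    Merging removes (raw_total - merged_total) entries, and each removed
    entry is an extra reading of some multi-reading name.
    """
    return multi_reading + (raw_total - merged_total)


# Toy data only, built from readings mentioned in this thread.
toy = [
    ("田中", "tanaka"),
    ("田中", "tanata"),
    ("田中", "tanka"),
    ("鈴木", "suzuki"),
]
toy_merged = merge_entries(toy)

# Figures from the post, in thousands of entries.
raw_k = raw_behind_multi(raw_total=728, merged_total=597, multi_reading=74)
avg = raw_k / 74

print(toy_merged)
print(raw_k, round(avg, 1))
```

With the figures from the post this gives 205 (thousand) raw entries and an average of about 2.8 readings per multi-reading name, which matches the numbers quoted.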