Mailing List Archive



Re: [Lingo] List of new words in Japanese



Stuart Luppescu writes:
>
> This would be a very interesting programming exercise. You would take a
> corpus of Japanese text (you could probably crawl the web), parse the
> text and compare each word to an online dictionary. Words that don't
> match are either neologisms, errors or obscure words. Hopefully, it
> would be mostly the first.

It's a wee bit more complicated than that, as Japanese parsers use
dictionaries and a fair bit of AI to find "word" boundaries. With
unknown words the results are at best ambiguous, and at worst
useless.
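The problem shows up even with a far cruder segmenter than the real ones. Here is a minimal sketch, for illustration only. It assumes plain greedy longest-match segmentation against a tiny hypothetical lexicon; `LEXICON`, `segment` and `unknown_runs` are illustrative names, and real Japanese morphological analysers use large dictionaries plus statistical costs instead. Characters that no lexicon entry covers become single-character "unknown" tokens. Runs of them are reported as candidate new words, which is roughly Stuart's proposal.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {"最近", "が", "流行", "して", "いる", "草", "食", "男子", "増え", "た"}

def segment(text, lexicon=LEXICON):
    """Greedy longest-match segmentation.

    Returns (token, known) pairs. A character not covered by any lexicon
    entry becomes a single-character token with known=False.
    """
    max_len = max(len(w) for w in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible dictionary match first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                tokens.append((piece, True))
                i += length
                break
        else:
            # No entry starts here: emit one unknown character.
            tokens.append((text[i], False))
            i += 1
    return tokens

def unknown_runs(tokens):
    """Merge consecutive unknown characters into candidate 'new words'."""
    runs, current = [], ""
    for tok, known in tokens:
        if known:
            if current:
                runs.append(current)
            current = ""
        else:
            current += tok
    if current:
        runs.append(current)
    return runs

print(unknown_runs(segment("最近婚活が流行している")))  # ['婚活']
print(unknown_runs(segment("草食男子が増えた")))        # []
```

With this toy lexicon, 最近婚活が流行している yields the candidate 婚活. By contrast, 草食男子が増えた splits entirely into known pieces (草 / 食 / 男子 / が / 増え / た), so the neologism 草食男子 is never flagged. A compound built from dictionary words is invisible to a simple dictionary lookup, which is one way the results become ambiguous or useless.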

Stephen J. Turnbull added:
> Actually, I find Japanese neologisms to be mostly pretty boring.

Actually, I find them rather interesting, and have started a whole
research project on methods for tracking them down. Parsing large
corpora as Stuart suggests is indeed one method. There are others,
such as mimicking morphological processes and checking whether the
resulting "words" are actually in use, etc. etc.
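One reading of "mimicking morphological processes" is to apply a word-formation rule to known words and keep only the outputs that actually turn up in text. The sketch below makes several assumptions, all hypothetical. It uses a single clipping rule (join the first kanji of two two-kanji words, as in 就職 + 活動 → 就活). The function names are invented for illustration. "In use" is approximated by a raw substring count over a small made-up list of strings standing in for a corpus.

```python
from itertools import permutations

def clipped_compounds(words):
    """Hypothetical clipping rule: first kanji of one two-kanji word plus
    the first kanji of another, e.g. 就職 + 活動 -> 就活."""
    for a, b in permutations(words, 2):
        if len(a) == 2 and len(b) == 2:
            yield a[0] + b[0], (a, b)

def attested(candidates, corpus, min_count=2):
    """Keep candidates seen at least min_count times across the corpus.
    Counting raw substrings is a crude stand-in for checking real usage."""
    found = {}
    for cand, source in candidates:
        n = sum(doc.count(cand) for doc in corpus)
        if n >= min_count:
            found[cand] = (n, source)
    return found

# Toy corpus, made up for illustration.
corpus = ["就活が始まった", "今年の就活は早い", "活動と就職"]
print(attested(clipped_compounds(["就職", "活動", "結婚"]), corpus))
# {'就活': (2, ('就職', '活動'))}
```

The rule is deliberately narrow. 婚活, short for 結婚活動, takes the second kanji of its first element, so this rule alone would miss it. A realistic generator would need a set of such patterns. The substring counts would also run into the word-boundary problems mentioned earlier in the thread.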

> There's a set of rules for forming them, and there's a rather high
> rate of formation.  The number of 現代用語辞典, especially the number
> updated annually, is indicative.

Most of the entries in those 現代用語辞典 (dictionaries of contemporary
terms) are not that new; I think only a very small proportion changes
each year. Also, many are compound nouns/
multi-word expressions such as 季節性情動障害 (seasonal affective
disorder). Most of the words in the 最新カタカナ (latest katakana words)
section of my most recent 現代用語辞典 (自由国民社) are far from new.

> I'd be more interested in usage of obscure words.

Yes, I'm interested in them too, although it can be fun sorting out
the obscure but persistent words from the more ephemeral ones.

Cheers

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, VCA Secondary School, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne


