Re: [tlug] [OT/long] Yet another JMdict front-end

Date: Tue, 01 Aug 2006 12:43:33 +1000 (EST)
From: Jim Breen <Jim.Breen@example.com>
Subject: Re: [tlug] [OT/long] Yet another JMdict front-end

Matt Gushee <matt@example.com> wrote:
>> Now on to more substantive issues:
>> 
>> Indexing approach
>> -----------------
>> 
>> There will probably be several indexes in the future, but currently I 
>> provide one way to look up Kanji: a traditional radical/stroke-count 
>> index. Specifically, you select the radical stroke count, then the 
>> radical itself, then the stroke count for the whole character, then the 
>> specific character that you want. Although it is a linear process and 
>> thus easy to understand in principle, it has the disadvantage that 
>> people don't know by heart how many strokes are in a character, and it 
>> can be very hard to figure out for the more complex ones. In a printed 
>> dictionary it's less of a problem because you can easily shift your eyes 
>> to another part of the page; in a browser I think it will be awkward at 
>> best.
>> 
>> What other alternatives might work well (when you don't know the 
>> pronunciation)? I've seen Jim Breen's "multi-radical" method and was 
>> initially resistant to it for a couple of reasons: first, it is 
>> non-linear, and thus is superficially more complex than the 
>> radicals/strokes method.

But MUCH more popular with the great unwashed. Some time ago I 
extracted measurements from WWWJDIC on kanji lookups. The multi-
radical method won. See: 
http://www.csse.monash.edu.au/~jwb/kanjindx.html   for a paper
about kanji indexing.

>> Second, I have been taught (for both Chinese and Japanese) that the 
>> radical is the "meaning" component, and that in general a character has 
>> exactly one radical. At any rate, I believe the radical has etymological 
>> significance, and that understanding which part of the Kanji is the 
>> radical can contribute to an overall mastery of the language. And a 
>> single-radical dictionary index reinforces that understanding.

Only partly true. For "semasio-phonetic" kanji it may provide
at least the semantic domain, but the linkage can be vague at
times.

>> But I'm thinking that a multi--can I say "component" instead of 
>> "radical"?  Then maybe I could set aside the philosophical objection. 
>> Anyway, a well-designed multi-thing index might after all be an easier 
>> way to look up Kanji.

It sure is. For WWWJDIC I hope one day to do a Java-based
version rather than the current vanilla HTML form approach.
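
For readers who have not met the multi-component idea, here is a minimal sketch in Python of how such a lookup might work: each component maps to the set of kanji containing it, and selecting several components intersects those sets. The table contents and the names COMPONENT_INDEX, lookup and remaining_components are made up for illustration; this is not WWWJDIC's actual code or data.

```python
# Hypothetical component -> kanji table; a real one would be loaded from
# a component data file rather than typed in by hand.
COMPONENT_INDEX = {
    "日": {"明", "時", "晴", "間"},
    "月": {"明", "晴", "期"},
    "門": {"間", "聞", "開"},
    "青": {"晴", "清", "静"},
}


def lookup(components):
    """Return the kanji that contain every selected component."""
    sets = [COMPONENT_INDEX.get(c, set()) for c in components]
    if not sets:
        return set()
    return set.intersection(*sets)


def remaining_components(components):
    """Return the unselected components that occur in a current candidate."""
    candidates = lookup(components)
    return {
        c
        for c, kanji in COMPONENT_INDEX.items()
        if c not in components and kanji & candidates
    }


if __name__ == "__main__":
    print(sorted(lookup(["日", "月"])))
    print(sorted(remaining_components(["日", "月"])))
```

The order in which components are picked does not matter, which is what makes the method non-linear. A front-end could use something like remaining_components to grey out components that can no longer match.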

>> Strokes/radicals index navigation
>> ---------------------------------
>> 
>> If I decide to go to a multi-component index, this might not matter any 
>> more. But for the moment, there is an issue with the index menus: in 
>> view of the fact that the user will often not be sure how many strokes 
>> there are in a character, I have created dynamic menus such that ... 
>> actually it's best if you try it out. Basically, if you move your mouse 
>> over an item in one row of the menu, the next row is *temporarily* 
>> displayed. Thus, let's say you have chosen a given radical. There is a 
>> row of numbers representing stroke counts of characters with that 
>> radical; if you run your mouse along that row you can easily see what 
>> characters exist for each stroke count.
>> 
>> So, do you think this is (a) useful, and (b) intuitive? It would be a 
>> lot easier to make the menus so that the next row only changes when you 
>> click something. But if people find the transient display a very helpful 
>> feature, I will make it work.

Seems quite good so far. 

>> Presentation of results
>> -----------------------
>> 
>> Currently when you select a Kanji, a request goes to the server, which 
>> returns a document containing all phrases that start with that Kanji. 
>> This document is dumped into a table with 3 columns: [Kanji] Phrase, 
>> Reading, and Definitions. This is reasonable in some cases, but 
>> sometimes the response document is quite large, so I think some kind of 
>> chunking and/or filtering would be helpful. It gets worse if we want to 
>> look up all phrases *containing* the selected character. My server-side 
>> script can indeed do that, but sometimes it's just way too much data, so 
>> I've disabled that behavior for the moment.

Comments.

- you leave out the part-of-speech, etc. Not a good idea.

- you use a comma between glosses - better to use ";" as commas
  occur within glosses and it can get ambiguous.

>> Another issue with the result sets is that they're not sorted in any 
>> useful way--actually I believe they are ordered according to the JMdict 
>> entry sequence number.

Yes, which is a mixture of headword order (on the day it was
first built), and then historical. Not a good display order.

>> So, how can I improve the processing and presentation of the results?

JMdict has various frequency of use tags, which may be useful for
ordering.

I find the spaced-out table a bit clunky.
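
The ordering and display comments above could be combined roughly as follows. This is a sketch only: it assumes each entry has already been parsed into a dict, with "pri" holding JMdict priority tags (news1, ichi1, nfNN and so on). The dict layout, the sample entries with their tags and sequence numbers, and the relative weights given to the tags are all assumptions made for illustration.

```python
import re

# Lower rank sorts first. The tag names are JMdict priority values; the
# weights are an arbitrary choice for this sketch.
FIXED_RANK = {
    "ichi1": 0, "news1": 0, "spec1": 0, "gai1": 0,
    "ichi2": 1, "news2": 1, "spec2": 1, "gai2": 1,
}
UNTAGGED = 99


def priority_key(entry):
    """Sort key: (best coarse rank, best nfNN band, sequence number)."""
    coarse = UNTAGGED
    band = UNTAGGED
    for tag in entry.get("pri", []):
        m = re.fullmatch(r"nf(\d\d)", tag)
        if m:
            band = min(band, int(m.group(1)))
        else:
            coarse = min(coarse, FIXED_RANK.get(tag, UNTAGGED))
    return (coarse, band, entry["seq"])


def format_entry(entry):
    """One line per entry: part-of-speech shown, ';' between glosses."""
    pos = ",".join(entry["pos"])
    glosses = "; ".join(entry["glosses"])
    return f'{entry["phrase"]} [{entry["reading"]}] ({pos}) {glosses}'


# Made-up sample data; the seq values and pri tags are placeholders.
ENTRIES = [
    {"seq": 3, "phrase": "明晰", "reading": "めいせき",
     "pos": ["adj-na", "n"], "glosses": ["clear", "distinct"], "pri": []},
    {"seq": 1, "phrase": "明日", "reading": "あした",
     "pos": ["n"], "glosses": ["tomorrow"], "pri": ["ichi1", "news1", "nf02"]},
    {"seq": 2, "phrase": "明確", "reading": "めいかく",
     "pos": ["adj-na"], "glosses": ["clear", "precise, definite"],
     "pri": ["news1", "nf10"]},
]

if __name__ == "__main__":
    for e in sorted(ENTRIES, key=priority_key):
        print(format_entry(e))
```

The gloss "precise, definite" shows why ";" is the safer separator: with commas it would read as two separate glosses. Untagged entries fall back to sequence-number order at the end of the list.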

>> Miscellaneous technical stuff
>> -----------------------------
>> 
>> Preparing the index: my list of radicals is derived from Jim Breen's 
>> KANJIDIC, but since his data is prepared for a multi-radical lookup 
>> system, I can't automatically extract a radicals-and-strokes index, so I 
>> am currently creating the index manually. 

Tsk, tsk. WWWJDIC has a page of classical radicals. See:
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwraddisp.cgi The file
that built that table is used by xjdic too and is in the xjdic
tarball.

>> That's why it's so incomplete, 
>> of course. Does anyone know of another database somewhere that list each 
>> kanji by (single) radical and stroke count?

Why do you need another?   8-)}

Seriously, there are a few others around, but they are (almost)
all derived from KANJIDIC.
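
As an illustration of building a single-radical index from KANJIDIC itself, the sketch below reads the B (radical number), C (classical radical number, given when it differs from B) and S (stroke count) fields from each line. It assumes the field codes as described in the KANJIDIC documentation. The sample lines are trimmed, "0000" is a placeholder for the real JIS code, and the function names are hypothetical. The real file is EUC-JP encoded and would need decoding first.

```python
import re
from collections import defaultdict

FIELD = re.compile(r"([BCS])(\d+)")


def parse_line(line):
    """Return (kanji, radical, strokes), or None if the line is unusable."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split()
    kanji = fields[0]
    bushu = classical = strokes = None
    for field in fields[2:]:          # fields[1] is the JIS code
        if field.startswith("{"):
            break                     # English meanings come last
        m = FIELD.fullmatch(field)
        if not m:
            continue
        code, num = m.group(1), int(m.group(2))
        if code == "B":
            bushu = num
        elif code == "C":
            classical = num
        elif code == "S" and strokes is None:
            strokes = num             # keep only the first stroke count
    radical = classical if classical is not None else bushu
    if radical is None or strokes is None:
        return None
    return kanji, radical, strokes


def build_index(lines):
    """Build index[radical][strokes] -> list of kanji."""
    index = defaultdict(lambda: defaultdict(list))
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            kanji, radical, strokes = parsed
            index[radical][strokes].append(kanji)
    return index


# Trimmed, illustrative lines; "0000" is a placeholder JIS code.
SAMPLE = """\
# comment line
亜 0000 B1 C7 S7 ア {Asia}
化 0000 B21 S4 カ ば.ける {change}
休 0000 B9 S6 キュウ やす.む {rest}
明 0000 B72 S8 メイ あ.かり {bright}
時 0000 B72 S10 ジ とき {time}
""".splitlines()

if __name__ == "__main__":
    index = build_index(SAMPLE)
    for radical in sorted(index):
        print(radical, dict(index[radical]))
```

Preferring C over B yields the classical radical where the two differ. Swap the preference if the other radical scheme is wanted.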

>> Glyphs for radicals: if my understanding of the KANJIDIC documentation 
>> is correct, there is a glyph of each radical in Japanese Kanji, but some 
>> of them only exist in JISX-0212. 

Not even in that case. JIS212 added some, but the rest really came
later.

>> If so, you either have to require the 
>> user to have a JISX-0212 font, use images to represent some radicals, or 
>> use substitute glyphs from JISX-0208. The last option is not really 
>> acceptable, I don't think. E.g., 化 for [unreadable glyph]?

As you prolly know, Unicode replicated all the classical radicals
in a block of their own.
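
A small sketch of using that block from Python: the Kangxi Radicals block runs from U+2F00 to U+2FD5, one code point per classical radical in order, so radical number N sits at U+2F00 + N - 1. The function names are hypothetical. Whether the glyphs display still depends on the user having a font that covers the block.

```python
import unicodedata

KANGXI_BASE = 0x2F00   # U+2F00 is radical 1; the block ends at U+2FD5


def radical_glyph(number):
    """Return the Kangxi Radicals block character for radical 1..214."""
    if not 1 <= number <= 214:
        raise ValueError(f"radical number out of range: {number}")
    return chr(KANGXI_BASE + number - 1)


def unified_form(glyph):
    """Map a radical-block character to its ordinary ideograph via NFKC."""
    return unicodedata.normalize("NFKC", glyph)


if __name__ == "__main__":
    for n in (1, 9, 72, 214):
        g = radical_glyph(n)
        print(n, g, unicodedata.name(g), unified_form(g))
```

The NFKC step is a possible fallback: it yields the ordinary ideograph with the same shape, which is more likely to be in the user's fonts, at the cost of losing the "this is a radical" distinction.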

HTH

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
