Mailing List Archive
Re: [tlug] [OT/long] Yet another JMdict front-end
- Date: Tue, 01 Aug 2006 12:43:33 +1000 (EST)
- From: Jim Breen <Jim.Breen@example.com>
- Subject: Re: [tlug] [OT/long] Yet another JMdict front-end
Matt Gushee <matt@example.com> wrote:

>> Now on to more substantive issues:
>>
>> Indexing approach
>> -----------------
>>
>> There will probably be several indexes in the future, but currently I
>> provide one way to look up Kanji: a traditional radical/stroke-count
>> index. Specifically, you select the radical stroke count, then the
>> radical itself, then the stroke count for the whole character, then the
>> specific character that you want. Although it is a linear process and
>> thus easy to understand in principle, it has the disadvantage that
>> people don't know by heart how many strokes are in a character, and it
>> can be very hard to figure out for the more complex ones. In a printed
>> dictionary it's less of a problem because you can easily shift your eyes
>> to another part of the page; in a browser I think it will be awkward at
>> best.
>>
>> What other alternatives might work well (when you don't know the
>> pronunciation)? I've seen Jim Breen's "multi-radical" method and was
>> initially resistant to it for a couple of reasons: first, it is
>> non-linear, and thus is superficially more complex than the
>> radicals/strokes method.

But MUCH more popular with the great unwashed. Some time ago I extracted
measurements from WWWJDIC on kanji lookups. The multi-radical method won.
See: http://www.csse.monash.edu.au/~jwb/kanjindx.html for a paper about
kanji indexing.

>> Second, I have been taught (for both Chinese and Japanese) that the
>> radical is the "meaning" component, and that in general a character has
>> exactly one radical. At any rate, I believe the radical has etymological
>> significance, and that understanding which part of the Kanji is the
>> radical can contribute to an overall mastery of the language. And a
>> single-radical dictionary index reinforces that understanding.

Only partly true. For "semasio-phonetic" kanji it may provide at least
the semantic domain, but the linkage can be vague at times.
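To make the two lookup styles discussed above concrete, here is a minimal sketch in Python. It is not how WWWJDIC or Matt's front-end is implemented; the toy data, the field layout, and all function names are assumptions made up for illustration. The radical/stroke index is a fixed sequence of choices (radical, then stroke count), while the multi-component lookup is a set intersection that accepts any parts the user recognises, in any order.

```python
from collections import defaultdict

# Hypothetical toy data: kanji -> (classical radical, total strokes, components).
# Hand-written for this sketch; not extracted from KANJIDIC or any real file.
KANJI = {
    "休": ("人", 6, {"亻", "木"}),
    "体": ("人", 7, {"亻", "木", "一"}),
    "林": ("木", 8, {"木"}),
    "村": ("木", 7, {"木", "寸"}),
}


def build_radical_stroke_index(kanji):
    """Linear index: radical -> total stroke count -> list of kanji."""
    index = defaultdict(lambda: defaultdict(list))
    for ch, (radical, strokes, _components) in kanji.items():
        index[radical][strokes].append(ch)
    return index


def build_component_index(kanji):
    """Inverted index: component -> set of kanji containing it."""
    index = defaultdict(set)
    for ch, (_radical, _strokes, components) in kanji.items():
        for comp in components:
            index[comp].add(ch)
    return index


def lookup_by_components(index, components):
    """Return the kanji that contain every one of the chosen components."""
    sets = [index.get(comp, set()) for comp in components]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    rs_index = build_radical_stroke_index(KANJI)
    comp_index = build_component_index(KANJI)
    # Radical/stroke route: the user must know both the radical and the count.
    print(rs_index["木"][7])
    # Multi-component route: any recognisable parts narrow the candidates.
    print(lookup_by_components(comp_index, ["亻", "木"]))
```

With the second approach the user never has to count strokes; each extra component they pick only shrinks the candidate set.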
>> But I'm thinking that a multi--can I say "component" instead of
>> "radical"? Then maybe I could set aside the philosophical objection.
>> Anyway, a well-designed multi-thing index might after all be an easier
>> way to look up Kanji.

It sure is. For WWWJDIC I hope one day to do a Java-based version rather
than the current vanilla HTML form approach.

>> Strokes/radicals index navigation
>> ---------------------------------
>>
>> If I decide to go to a multi-component index, this might not matter any
>> more. But for the moment, there is an issue with the index menus: in
>> view of the fact that the user will often not be sure how many strokes
>> there are in a character, I have created dynamic menus such that ...
>> actually it's best if you try it out. Basically, if you move your mouse
>> over an item in one row of the menu, the next row is *temporarily*
>> displayed. Thus, let's say you have chosen a given radical. There is a
>> row of numbers representing stroke counts of characters with that
>> radical; if you run your mouse along that row you can easily see what
>> characters exist for each stroke count.
>>
>> So, do you think this is (a) useful, and (b) intuitive? It would be a
>> lot easier to make the menus so that the next row only changes when you
>> click something. But if people find the transient display a very helpful
>> feature, I will make it work.

Seems quite good so far.

>> Presentation of results
>> -----------------------
>>
>> Currently when you select a Kanji, a request goes to the server, which
>> returns a document containing all phrases that start with that Kanji.
>> This document is dumped into a table with 3 columns: [Kanji] Phrase,
>> Reading, and Definitions. This is reasonable in some cases, but
>> sometimes the response document is quite large, so I think some kind of
>> chunking and/or filtering would be helpful. It gets worse if we want to
>> look up all phrases *containing* the selected character.
>> My server-side script can indeed do that, but sometimes it's just way
>> too much data, so I've disabled that behavior for the moment.

Comments.
- you leave out the part-of-speech, etc. Not a good idea.
- you use a comma between glosses - better to use ";" as commas occur
  within glosses and it can get ambiguous.

>> Another issue with the result sets is that they're not sorted in any
>> useful way--actually I believe they are ordered according to the JMdict
>> entry sequence number.

Yes, which is a mixture of headword order (on the day it was first
built) and then historical. Not a good display order.

>> So, how can I improve the processing and presentation of the results?

JMdict has various frequency of use tags, which may be useful for
ordering. I find the spaced-out table a bit clunky.

>> Miscellaneous technical stuff
>> -----------------------------
>>
>> Preparing the index: my list of radicals is derived from Jim Breen's
>> KANJIDIC, but since his data is prepared for a multi-radical lookup
>> system, I can't automatically extract a radicals-and-strokes index, so I
>> am currently creating the index manually.

Tsk, tsk. WWWJDIC has a page of classical radicals. See:
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwraddisp.cgi
The file that built that table is used by xjdic too and is in the xjdic
tarball.

>> That's why it's so incomplete,
>> of course. Does anyone know of another database somewhere that lists each
>> kanji by (single) radical and stroke count?

Why do you need another? 8-)} Seriously, there are a few others around,
but they are (almost) all derived from KANJIDIC.

>> Glyphs for radicals: if my understanding of the KANJIDIC documentation
>> is correct, there is a glyph of each radical in Japanese Kanji, but some
>> of them only exist in JISX-0212.

Not even in that case. JIS212 added some, but the rest really came later.
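On the presentation comments earlier in this message (keep the part-of-speech, join glosses with ";", order by frequency rather than sequence number), a small Python sketch may help. Everything here is an assumption for illustration: the entry dictionaries, their field names, and the sample values are hand-written, and the ranking rule (lowest "nfNN" number first, untagged entries last, sequence number as tie-breaker) is only one possible way to use JMdict's priority tags.

```python
import re

# Hypothetical entries; field names and tag values are made up for this sketch.
ENTRIES = [
    {"seq": 1, "kanji": "木陰", "reading": "こかげ", "pos": "n",
     "glosses": ["shade of a tree", "bower"], "pri": []},
    {"seq": 2, "kanji": "木曜", "reading": "もくよう", "pos": "n",
     "glosses": ["Thursday"], "pri": ["nf20"]},
    {"seq": 3, "kanji": "木", "reading": "き", "pos": "n",
     "glosses": ["tree, shrub, or bush", "wood"], "pri": ["ichi1", "nf05"]},
]

NF_TAG = re.compile(r"nf(\d\d)$")
UNRANKED = 99  # assumed sentinel: sorts entries without an nfNN tag last


def frequency_rank(entry):
    """Smallest nfNN band found in the entry's tags, or UNRANKED if none."""
    ranks = []
    for tag in entry["pri"]:
        match = NF_TAG.match(tag)
        if match:
            ranks.append(int(match.group(1)))
    return min(ranks) if ranks else UNRANKED


def order_for_display(entries):
    """Most frequent first; sequence number only breaks ties."""
    return sorted(entries, key=lambda e: (frequency_rank(e), e["seq"]))


def format_row(entry):
    """One compact result line, glosses separated by ';' rather than ','."""
    glosses = "; ".join(entry["glosses"])
    return f'{entry["kanji"]} [{entry["reading"]}] ({entry["pos"]}) {glosses}'


if __name__ == "__main__":
    for entry in order_for_display(ENTRIES):
        print(format_row(entry))
```

Because the first gloss of 木 itself contains commas, the ";" separator is what keeps "tree, shrub, or bush" readable as a single gloss.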
>> If so, you either have to require the
>> user to have a JISX-0212 font, use images to represent some radicals, or
>> use substitute glyphs from JISX-0208. The last option is not really
>> acceptable, I don't think. E.g., 化 for 人偏

As you prolly know, Unicode replicated all the classical radicals in a
block of their own.

HTH

Jim

--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学
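As a follow-up to the remark about Unicode's dedicated radical block: the sketch below assumes the block meant is "Kangxi Radicals" (U+2F00 to U+2FD5, 214 code points) and uses Python's standard unicodedata module to list each radical with the ordinary ideograph it maps to under NFKC normalization. The function name is made up for illustration. Whether a given user's font actually covers this block is a separate question the code cannot answer.

```python
import unicodedata

# Assumed range of the Unicode "Kangxi Radicals" block (214 classical radicals).
KANGXI_FIRST = 0x2F00
KANGXI_LAST = 0x2FD5


def kangxi_radicals():
    """Yield (radical number, radical character, equivalent unified ideograph)."""
    code_points = range(KANGXI_FIRST, KANGXI_LAST + 1)
    for number, cp in enumerate(code_points, start=1):
        radical = chr(cp)
        # Each Kangxi radical has a compatibility mapping to a normal kanji,
        # which NFKC normalization applies.
        unified = unicodedata.normalize("NFKC", radical)
        yield number, radical, unified


if __name__ == "__main__":
    for number, radical, unified in kangxi_radicals():
        name = unicodedata.name(radical)
        print(f"{number:3d} U+{ord(radical):04X} {radical} -> {unified}  {name}")
```

The normalized form could serve as a fallback glyph when the radical code point itself is not available in a font, at the cost of losing the distinction between the radical and the full character.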
- Follow-Ups:
- Re: [tlug] [OT/long] Yet another JMdict front-end
- From: Stephen J. Turnbull