Mailing List Archive
tlug.jp Mailing List
Re: [tlug] [OT] Strip Kanji from a document for study purposes
- Date: Wed, 19 Jul 2006 00:25:04 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes
- References: <44BCAFF3.6030604@example.com>
- Organization: The XEmacs Project
- User-agent: Gnus/5.1007 (Gnus v5.10.7) XEmacs/21.5-b27 (linux)
>>>>> "Dave" == Dave M G <Dave> writes:

    Dave> There may be existing software that does what I'm looking
    Dave> for, but I haven't seen it. If you know of a suitable Linux
    Dave> based application, please let me know.

I doubt such a thing exists. It seems like a common thing to do, but
actually there's infinite variation; even if you did find an
application designed for the purpose, you'd still probably need to
script it. Also, doing it well involves sufficiently much grunt work
that I doubt you'll find an open source program (eg, last I heard
rikaichan was not open at all, which is one reason why I don't use
it).

    Dave> What I'd like to do is take a Japanese document and convert
    Dave> it into a list of the kanji included, and a list of
    Dave> words. Ideally repetitions would be removed,

This is easy, at least to the 90% accuracy level. You just assume
that each contiguous batch of kanji starts a word. Getting past that
would require looking up each possibility for a connective (eg, the め
in 閉め切 [...]

[The rest of the message is unreadable in the archive because of a
character-encoding error.]
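The "contiguous batch of kanji" heuristic described above can be sketched in a few lines of Python. This is only an illustration, not the script from the original thread. The function names are made up for the example, and "kanji" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus the iteration mark 々. Rarer extension blocks are ignored.

```python
import re

# Assumption: kanji = CJK Unified Ideographs (U+4E00-U+9FFF) plus the
# iteration mark 々. Extension blocks are deliberately left out.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff々]+")


def unique_in_order(items):
    """Drop repetitions while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_kanji(text):
    """Return (kanji, runs) for a piece of Japanese text.

    kanji: the distinct individual kanji, in order of first appearance.
    runs:  the distinct contiguous kanji runs, used as a rough stand-in
           for "words" under the heuristic that each run starts a word.
    """
    runs = KANJI_RUN.findall(text)
    # The iteration mark is kept inside runs but is not itself a kanji.
    kanji = [ch for run in runs for ch in run if ch != "々"]
    return unique_in_order(kanji), unique_in_order(runs)


if __name__ == "__main__":
    sample = "窓を閉め切った部屋で日本語を勉強する。日本語は楽しい。"
    kanji, runs = extract_kanji(sample)
    print("kanji:", " ".join(kanji))
    print("runs: ", " ".join(runs))
```

The sample shows where the heuristic falls short: 閉め切った comes out as two separate runs, 閉 and 切, because of the め in between. Joining those back into one word would need the dictionary lookup mentioned above, which this sketch does not attempt.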
- Follow-Ups:
- Re: [tlug] [OT] Strip Kanji from a document for study purposes
- From: Josh Glover
- References:
- [tlug] [OT] Strip Kanji from a document for study purposes
- From: Dave M G