Mailing List Archive



Re: [tlug] searching for kanji strings, ignore punctuation and end of lines



David Riggs wrote:
> If I could take a two line unit spat out by grep -A2, then process it
> as a separate set, I could do it rather easily. Strip out stuff after 
> the match for the first kanji: newline, punctuation, and line numbers. 
> Then if there is a match print out the working data area.


How about making a second copy of the text with the punctuation stripped
(preserving the line count) and then searching for the phrase in that copy?
It's a bit of a kludge, but if disk space isn't a problem, it's an easy
approach. Since you have to do this a lot, the processed copy might even give
you the speed boost you need. (I'm assuming your haystack won't change much.
Will it always be the CBETA canon?)
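
Here is a rough Python sketch of that idea, in case it helps. It strips
punctuation, spaces, and control characters from each line (so line N of the
stripped copy still corresponds to line N of the original), then joins the
stripped lines in memory so that a phrase running across a line end still
matches. The function names strip_line and find_phrase are made up for
illustration, and the sample lines are not taken from the CBETA files. I'm also
assuming that everything in Unicode categories P, Z, and C can be thrown away.
Line numbers embedded in the text would need their own rule.

```python
import bisect
import unicodedata


def strip_line(line):
    """Drop punctuation (P*), separators (Z*), and control chars (C*)."""
    return "".join(
        ch for ch in line if unicodedata.category(ch)[0] not in "PZC"
    )


def find_phrase(lines, phrase):
    """Return the 1-based line numbers where a match for phrase begins."""
    stripped = [strip_line(line) for line in lines]

    # Offset of each stripped line inside the joined haystack.
    starts = []
    offset = 0
    for s in stripped:
        starts.append(offset)
        offset += len(s)

    haystack = "".join(stripped)
    needle = strip_line(phrase)
    hits = []
    if not needle:
        return hits

    pos = haystack.find(needle)
    while pos != -1:
        # bisect_right gives the count of lines starting at or before pos,
        # which is the 1-based number of the line containing pos.
        hits.append(bisect.bisect_right(starts, pos))
        pos = haystack.find(needle, pos + 1)
    return hits


if __name__ == "__main__":
    sample = ["如是我聞。一時\n", "佛在、舍衛國\n", "祇樹給孤獨園\n"]
    print(find_phrase(sample, "一時佛在"))
```

To keep a processed copy on disk as suggested, the output of strip_line could
be written out one line per input line, which keeps the line count the same.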

-moogs

