- Date: Thu, 19 Jan 2006 23:12:18 -0500
- From: Jim <jep200404@example.com>
- Subject: [tlug] Optimizing Search for kanji strings
- References: <43D043AC.2030908@example.com>
David Riggs wrote:
[> Jim wrote:]
> >David, of the hundreds of megabytes of text, how big is each file?
> >What is the longest line in any of those files?
> >What is the largest file?
> To answer the data questions: each line is 20 to 80 characters

That is nice and very reasonable in any character coding.

> The 326MB is in 2460 files in 56 folders,

Learn how to master the find command to deal with the 56 folders, although if you can get away with filename globbing (as you seem to), then just stick with globbing.

> 2.5MB max file size,

Great! That means that each file can be sucked into memory for easier searching.

> My current perl script does what I had originally hoped for.

Good. You have the "First make it work _right_" part done.

> It is really pretty fast, for what it does,

Good! How fast is that? How much faster do you want it to run?
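The advice above (walk all 56 folders the way find would, read each small file whole, then measure how fast the search actually is) can be sketched as follows. This is a minimal Python sketch, not David's Perl script. The function names `iter_text_files`, `search_file` and `search_tree`, the `*.txt` filename pattern, and the UTF-8 default encoding are all hypothetical assumptions; if the files are EUC-JP, pass `encoding="euc_jp"` instead.

```python
import time
from pathlib import Path


def iter_text_files(root, pattern="*.txt"):
    """Yield files under root (recursively) matching pattern.

    Roughly what `find root -name '*.txt'` does: rglob descends into every
    subfolder, so no per-folder globs are needed. The pattern is an assumption.
    """
    yield from sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def search_file(path, needle, encoding="utf-8"):
    """Read the whole file into memory and return (line_number, line) pairs
    for lines containing needle. Files of a few MB fit in memory easily."""
    text = Path(path).read_text(encoding=encoding)  # encoding is an assumption
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if needle in line]


def search_tree(root, needle, pattern="*.txt", encoding="utf-8"):
    """Search every matching file under root.

    Returns (hits, elapsed_seconds), where each hit is (path, line_no, line),
    so you can answer "how fast is that?" with a number.
    """
    start = time.perf_counter()
    hits = []
    for path in iter_text_files(root, pattern):
        for n, line in search_file(path, needle, encoding):
            hits.append((str(path), n, line))
    return hits, time.perf_counter() - start
```

For example, `hits, secs = search_tree("corpus", "般若")` (with `corpus` standing in for a hypothetical top-level directory) returns every matching line plus the wall-clock time of the whole run. Reading each file in one call rather than line by line is reasonable here only because the largest file is about 2.5MB; the timing number gives a baseline before trying any further optimization.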
- References:
- [tlug] re: Searching for kanji strings
- From: David Riggs