Mailing List Archive
Re: [tlug] Mail archiving question
- Date: Sat, 4 Aug 2007 11:04:31 +1000
- From: "Jim Breen" <jimbreen@example.com>
- Subject: Re: [tlug] Mail archiving question
- References: <5634e9210708030501q228e9e31ya2f3dfdb29168cf6@mail.gmail.com> <87odhobsfy.fsf@uwakimon.sk.tsukuba.ac.jp>
On 04/08/07, Stephen J. Turnbull <stephen@example.com> wrote:
> Jim Breen writes:
>
> > I foolishly volunteered to help set up a searchable
> > email archive for the Honyaku mailing list (A few
> > TLUGers are also on that list.) My current task is to
> > extract the essential headers (From, Subject, Date, ...)
> > and the body of the email, convert them to UTF-8 and
> > store them as one file per email. I am working on a collection
> > of about 40,000 accumulated emails from the last 18 months.
>
> I would use Python's email module. Proof of concept would be about 20
> lines of code, I guess. (Hint: the email module treats mail as a
> quasi-dictionary of headers, with Unicode key-value pairs. All that's
> left is using the right codec in the flatten method after deleting the
> headers you don't want.)

Given my minimal level of Python skill, those 20 lines of POC
may take weeks.

> Simon Cozens might recommend Mail::Audit (and then again he might not;
> while he hasn't found the One True Language yet, at least he's been
> abandoning false ones at a great rate). If I recall the author
> correctly, you can trust it to be more bullet-proof than the Swiss
> internet backbone. Proof of concept would be only one line
> noise. (White noise, that is, at 120dB.)

I looked at the package Simon mentioned, but took fright.

> MHonArc may have an appropriate option.

Thanks. I'll look into that. We actually have the archive system
already, with regex searching, etc. It's the one-off importing
of a batch of emails that's needed.

> metamail?

Now that's a blast from the past. Since it does almost all I want,
I'm inclined to use it for this once-off. I now have the "base64"
utility, so I can detect and deflect stuff containing html, etc.

Thanks

Jim

--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
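For readers wondering what the Python route hinted at above might look like, here is a minimal sketch. It is not the code anyone in the thread wrote: it assumes a modern Python 3 and the standard library's policy-based email API (rather than the flatten method mentioned in the quote), and the names KEEP_HEADERS, to_utf8_text and archive_one are made up for illustration. The header list is an assumption too, since the post only says "From, Subject, Date, ...".

```python
from email import message_from_bytes, policy
from pathlib import Path

# Assumed set of "essential" headers; the post lists "From, Subject, Date, ...".
KEEP_HEADERS = ("From", "Subject", "Date")


def to_utf8_text(raw: bytes) -> str:
    """Return the kept headers and the plain-text body as one Unicode string."""
    # policy.default decodes RFC 2047 encoded-word headers into str for us.
    msg = message_from_bytes(raw, policy=policy.default)
    header_lines = [
        f"{name}: {msg[name]}" for name in KEEP_HEADERS if msg[name] is not None
    ]
    # Prefer the text/plain part; HTML-only mail yields an empty body here.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return "\n".join(header_lines) + "\n\n" + body


def archive_one(raw: bytes, out_path: Path) -> None:
    """Write one message to its own file, encoded as UTF-8."""
    out_path.write_text(to_utf8_text(raw), encoding="utf-8")
```

Calling archive_one once per message, with one output path per message, would give the one-file-per-email layout described above. How the 40,000 messages are stored (mbox, maildir, separate files) is not stated in the thread, so reading them in is left out of the sketch.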
- Follow-Ups:
  - Re: [tlug] Mail archiving question
    - From: Josh Glover
- References:
  - [tlug] Mail archiving question
    - From: Jim Breen
  - [tlug] Mail archiving question
    - From: Stephen J. Turnbull