Mailing List Archive
Re: [tlug] Blocking unknown and unclear bots
- Date: Thu, 25 Feb 2010 17:00:33 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Blocking unknown and unclear bots
- References: <4B834398.8030609@example.com> <4B834BCA.80401@example.com> <20100223043730.GC30350@example.com> <4B836B34.1000403@example.com> <20100223065744.GF30350@example.com> <4B85D68D.5030600@example.com>
Dave M G writes:

 > > So why were people saying these bots were "bad"?
 >
 > Short answer:
 >
 > Crawling for emails or information to use for spam... maybe?

No, apparently they're just bad because they're on some list. (See
below.)

 > Here, just by way of example, is a list of bad bots:
 >
 > http://www.invision-graphics.com/robotstxt_badbots.html

"Mr. Foot, this is Mr. Bullet." Do you really want to commit DoS on
your clients' users? From that list:

    User-agent: Wget
    User-agent: asterias
    User-agent: httplib
    User-agent: Wget/1.6
    User-agent: Wget/1.5.3

wget is either the first or second (after curl) most popular
command-line based web retrieval tool, while httplib is Python's
generic retrieval tool *library* and is probably incorporated in a
number of innocuous applications, and asterias is probably based on
http://asterias.bioinfo.cnio.es/, which is a distributed tool for
analyzing DNA IIUC.

These may very well have been observed to behave as "bad bots" (but
since all bots do the same thing, namely, follow every link, I don't
see how you determine that!), but either their names are being spoofed
or (in the case of wget) it's multiple use (can be a spider or can be
an ordinary retrieval tool).

If you really want to do this kind of thing, you should decide which
bots you want to let in (I'm sure Google is high on your list, for
example), and then restrict to those user agents and also by domain
and/or IP block (or address if it's consistent).
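A rough sketch of the allowlist idea in the last paragraph, in Python. Everything specific here is an assumption for illustration: the bot names ("examplebot", "othercrawler"), the function name is_allowed_bot, and the address blocks, which are the RFC 5737 documentation ranges rather than any real crawler's networks. It only answers "is this self-identified crawler one we chose to let in, and is it coming from where we expect?"; what to do with everything else is a separate policy decision.

```python
import ipaddress

# Hypothetical allowlist: crawler name (as it appears in the User-Agent
# header) -> networks that crawler is expected to connect from.
# Names and ranges are placeholders (RFC 5737 documentation blocks),
# not the real ranges of any search engine.
ALLOWED_BOTS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
    "othercrawler": [ipaddress.ip_network("198.51.100.0/25")],
}


def is_allowed_bot(user_agent: str, remote_addr: str) -> bool:
    """Return True only if the user agent names an allowed crawler AND
    the request comes from one of that crawler's expected networks."""
    ua = user_agent.lower()
    addr = ipaddress.ip_address(remote_addr)
    for name, networks in ALLOWED_BOTS.items():
        # Checking the address as well as the name means a spoofed
        # User-Agent alone is not enough to get through.
        if name in ua and any(addr in net for net in networks):
            return True
    return False
```

With this table, a request claiming to be "ExampleBot/2.1" from 192.0.2.17 passes, while the same user agent from 203.0.113.5 does not, since a user-agent string by itself is trivially spoofed. A plain "Wget/1.12" is simply not on the list, which is different from putting it on a blocklist that also catches ordinary users.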