The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS lookup and records the event in the blackhole data file. Once added to that data file, bad bots are immediately denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives. Failure to comply results in immediate banishment. The best part is that the Blackhole only affects bad bots: normal users never see the hidden link, and good bots obey the robots rules in the first place.
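To make the “one-strike” logic concrete, here is a minimal sketch in Python. It is not the Blackhole's actual code (the tutorial is tagged php and htaccess); it only illustrates the flow described above. The trap directory name `/blackhole/`, the in-memory `blacklist` set, and the function names are assumptions made for illustration, and the WHOIS lookup and on-disk data file are left out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap directory; the real path is whatever you forbid in robots.txt.
TRAP_PREFIX = "/blackhole/"

# Assumed robots.txt content that forbids the trap directory to all bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blackhole/
"""


def allowed_by_robots(path, user_agent="*"):
    """Return True if a well-behaved bot may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def handle_request(ip, path, blacklist):
    """Return an HTTP status code for a request, applying the one-strike rule.

    `blacklist` is a set of banned IPs (a stand-in for the blackhole data file).
    """
    if ip in blacklist:
        return 403  # already banned: deny everything
    if path.startswith(TRAP_PREFIX):
        blacklist.add(ip)  # one strike: record the offender
        return 403
    return 200  # normal visitor or compliant bot
```

A compliant crawler checks `allowed_by_robots("/blackhole/")`, gets `False`, and never requests the trap, so it is never banned. A bot that follows the hidden link anyway is added to `blacklist` on that first request and refused on every later one.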
Tags: php htaccess Bots scraping
Submitted by: JDStraughan Submitted on: Jul 20, 2010
Source: http://perishablepress.com/press/2010/07/14/blackhole-bad-bots/