If you are not yet suffering from these kinds of bots, scrapers, harvesters and the like, you will be sooner or later, trust me.
OK, so what are these bad bots? Basically, bad bots are spiders that do you more harm than good.
They harvest your content, scrape emails and contact info, you name it, and they also eat your bandwidth and server resources.
There are a few ways you can block them: robots.txt (yeah, right), custom-made scripts, and the .htaccess file.
The problem with robots.txt is that bad bots don't obey what is in it;
only good bots obey robots.txt, and we don't have problems with good bots like Googlebot, Bingbot, Slurp and so on in the first place.
So robots.txt is just a reference that spiders "might" follow.
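To see why robots.txt is purely advisory, here is a small Python sketch using the standard library's robots.txt parser: a well-behaved client *chooses* to consult the file before fetching a URL, while a bad bot simply skips this step entirely.

```python
from urllib import robotparser

# A compliant bot parses robots.txt and checks each URL before fetching it.
# Nothing on the server enforces this -- a scraper just never calls can_fetch().
rp = robotparser.RobotFileParser()
rp.parse("User-agent: *\nDisallow: /private/".split("\n"))

print(rp.can_fetch("GoodBot/1.0", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("GoodBot/1.0", "http://example.com/public/page.html"))   # True
```

The "GoodBot/1.0" agent string and the example paths are placeholders; the point is that the check happens client-side, by the bot's own choice.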
So the next logical solution is the .htaccess file, because it is very easy to implement and configure.
If you don't have one in the root directory of your server, just create a file named ".htaccess"; it is that easy.
Some nasty bots were "attacking" my sites,
so I logged all agents/spiders/scrapers and their IPs into a database and ended up with a nice list of bad bots and bad IPs.
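If you want to build your own list the same way, a quick sketch like this tallies user agents from an Apache access log. It assumes the common "combined" log format, where the user agent is the last quoted field; the sample lines and bot names below are just illustrations.

```python
import re
from collections import Counter

# The user agent is the final quoted field in Apache's "combined" log format.
UA_PATTERN = re.compile(r'"([^"]*)"$')

def tally_user_agents(lines):
    """Count how many requests each user agent made."""
    counts = Counter()
    for line in lines:
        m = UA_PATTERN.search(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts

# In practice you would read these lines from your access log file.
sample = [
    '1.2.3.4 - - [10/Oct/2012:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Ezooms/1.0"',
    '5.6.7.8 - - [10/Oct/2012:13:55:37 +0000] "GET /feed HTTP/1.1" 200 512 "-" "magpie-crawler/1.1"',
    '1.2.3.4 - - [10/Oct/2012:13:55:38 +0000] "GET /a HTTP/1.1" 200 100 "-" "Ezooms/1.0"',
]
print(tally_user_agents(sample).most_common(1))  # [('Ezooms/1.0', 2)]
```

Sort the counts, eyeball the top offenders, and the suspicious names become your blocklist.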
Then I compiled the code below, which you can copy/paste into your own .htaccess file.
On some sites, over the past 2 days, my bandwidth usage dropped tenfold and resource usage was also drastically lower.
So here is the code to copy/paste; please note that I have blocked the Baidu and Yandex spiders as well.
Code:
Options -Indexes

<IfModule mod_rewrite.c>
RewriteEngine on
# Block bad bots - Nemanja additions:
# Yandex & Baidu - you can remove these two lines if you wish
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Jakarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} magpie [NC,OR]
RewriteCond %{HTTP_USER_AGENT} August [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pipes [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FairShare [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Konqueror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twingly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Seznam [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Feed [NC,OR]
RewriteCond %{HTTP_USER_AGENT} alexa [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bender [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Aghaven [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Soso [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gooblog [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RSS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} butterfly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twee [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gnip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} js-kit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spinn [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ICS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Backlink [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TheFreeDictionary [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Deepnet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SPUTNIK [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zend_Http_Client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WordPress [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DigExt [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OffByOne [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sogo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} trendiction [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} wasa [NC]
# Note: the last condition must NOT carry the OR flag, and a RewriteRule
# is required to actually act on the conditions - it returns 403 Forbidden.
RewriteRule .* - [F,L]
</IfModule>

deny from 178.168.31.223
deny from 209.217.244.58
deny from 121.74.248.22
deny from 199.15.234.23
deny from 188.163.107.228
deny from 41.233.16
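Before deploying, it helps to understand what the conditions above actually do: each RewriteCond performs a case-insensitive ([NC]) regex match against the User-Agent header, so "Ezooms" matches anywhere in the string. This little sketch simulates that matching so you can sanity-check a user agent against the list; the pattern subset below is just a sample from the full block.

```python
import re

# Subset of the blocklist above, for illustration. Each entry is matched
# case-insensitively as a substring/regex, mirroring RewriteCond ... [NC].
BAD_BOT_PATTERNS = ["Yandex", "Baiduspider", "Ezooms", "magpie", "spbot"]

def is_blocked(user_agent):
    """Return True if the user agent would trip any of the conditions."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in BAD_BOT_PATTERNS)

print(is_blocked("Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))                     # False
```

One caveat worth knowing: broad patterns like "RSS" or "Feed" will also block legitimate feed readers, so test against your own logs before going live.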
Take care and block those nasty bots
Edit: I found that code from other people was not working and had some issues, so I ended up using only my own code.