Welcome to the IMTalk - Internet Marketing & SEO Forum.
  1. #1
    Nemanja (IMTalk.org)
    Join Date
    Jun 2010
    Location
    www
    Posts
    4,870
    Thanks Given
    703
    Thanked 3,044 Times in 1,247 Posts

    Bad bots, scrapers and others eating your server resources, bandwidth, etc - Solution is here!

    If you are not suffering from these kinds of bots, scrapers, harvesters and others yet, you will sooner or later, trust me.

    OK, what are these bad bots? Basically, bad bots are spiders that do you more harm than good.
    They harvest your content, scrape emails and contact info, you name it, and they also eat your bandwidth and server resources.

    There are a few ways you can block them: robots.txt (yeah, right), custom-made scripts, and the .htaccess file.
    The problem with robots.txt is that bad bots don't obey what is in it.
    Only good bots obey robots.txt, and we don't have problems with good bots like googlebot, bingbot, slurp, ... in the first place.
    So robots.txt is just a reference that spiders "might" follow.

    So the next logical solution is the .htaccess file, because it is very easy to implement and configure.
    If you don't have one in the root dir of your server, just create one and name it ".htaccess", it is that easy.

    Some nasty bots were "attacking" my sites,
    so I logged all agents/spiders/scrapers and IPs into a database and ended up with a nice list of bad bots and bad IPs.
    Then I compiled the htaccess code you need to copy/paste into your .htaccess file.

    On some sites my bandwidth usage dropped by a factor of 10 in the past 2 days, and resource usage also dropped drastically.

    So here is the code you need to copy/paste. Please note that I have blocked the Baidu and Yandex spiders as well.
    Code:
    Options -Indexes
    
    <IfModule mod_rewrite.c>
    RewriteEngine on
    
    #Block bad bots - Nemanja additions:
    #Yandex and/or Baidu - you can remove these two lines if you wish
    RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Jakarta [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} magpie [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} August [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Pipes [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} FairShare [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Konqueror [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Twingly [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Seznam [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Feed [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} alexa [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} bender [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Aghaven [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Soso [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} gooblog [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} RSS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} butterfly [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Twee [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} gnip [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Fetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} js-kit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Twit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} MetaURI [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Spinn [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} spbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ICS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Backlink [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} TheFreeDictionary [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Deepnet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} SPUTNIK [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Zend_Http_Client [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Yeti [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WordPress [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} DigExt [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} OffByOne [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} sogo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} trendiction [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Indy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} wasa [NC]
    # Send 403 Forbidden to every user agent matched above
    RewriteRule .* - [F,L]
    </IfModule>
    
    deny from 178.168.31.223
    deny from 209.217.244.58
    deny from 121.74.248.22
    deny from 199.15.234.23
    deny from 188.163.107.228
    deny from 41.233.16
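
    The "deny from" lines above use the Apache 2.2 access syntax. As a rough sketch only, assuming your host runs Apache 2.4 with mod_authz_core and mod_authz_host loaded and allows AuthConfig overrides in .htaccess (check with your host, these are assumptions), the same IP list could be written like this:

    ```apache
    # Sketch for Apache 2.4 (assumption: mod_authz_core and mod_authz_host are loaded).
    # Allow everyone except the addresses listed below.
    <RequireAll>
        Require all granted
        Require not ip 178.168.31.223
        Require not ip 209.217.244.58
        Require not ip 121.74.248.22
        Require not ip 199.15.234.23
        Require not ip 188.163.107.228
        # A partial address covers the whole 41.233.16.x range
        Require not ip 41.233.16
    </RequireAll>
    ```

    Use one style or the other for the IP list, not both in the same file.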

    Take care and block those nasty bots

    Edit: I found that code from other people was not working and had some issues, so I ended up with only my own code.

  3. #2
    Nemanja
    Please note that the file is updated now. I removed some of the code that was causing problems.

  5. #3
    Johnr (Banned)
    Join Date
    Nov 2011
    Posts
    20
    Thanks Given
    0
    Thanked 2 Times in 1 Post
    Thanks for sharing such kind of information

  7. #4
    samples (IM & SEO Quiet One)
    Join Date
    Nov 2011
    Posts
    3
    Thanks Given
    0
    Thanked 2 Times in 1 Post
    thank you great info

  9. #5
    Nemanja
    Just an update on this:
    on some sites bandwidth dropped by a factor of 20.

  11. #6
    Nemanja
    For the last day and a half my server was facing high load.
    I was searching for the cause without any luck, but I finally found out what it was.
    It was the same kind of nasty bot/spider:
    41.233.16.139 : host-41.233.16.139.tedata.net
    so I have blocked the whole class C range of that IP in the htaccess file:
    "deny from 41.233.16"

    I think there are more bots like this out there, so when I find more I will post them here.
    In the meantime be sure to block them, because they are eating your resources.

    The code above has been edited as well.
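
    The range block above can also be written in CIDR notation. A small sketch, assuming the same Apache 2.2 style "deny from" syntax used in the first post. Both lines cover 41.233.16.0 through 41.233.16.255, so only one of them is needed:

    ```apache
    # Partial address: matches every IP that starts with 41.233.16.
    deny from 41.233.16
    # Same range in CIDR notation (a /24 is 256 addresses).
    # Left commented out because one line is enough.
    # deny from 41.233.16.0/24
    ```

    Keep in mind that blocking a whole range also blocks any normal visitors who share it.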

  13. #7
    gamer (IM & SEO Whisperer)
    Join Date
    Nov 2011
    Posts
    41
    Thanks Given
    7
    Thanked 3 Times in 2 Posts
    You are great, thanks! I'm not under bot attack yet, but I inserted the list in my htaccess for security.

  15. #8
    bluearrow (IM & SEO Chatty)
    Join Date
    Feb 2011
    Location
    Mount Olympus
    Posts
    1,916
    Thanks Given
    118
    Thanked 356 Times in 262 Posts
    This little code is very useful! Bots can be real pests if you run your own VPS or dedis. Thanks Nemanja.

  17. #9
    stronywwwlublin (IM & SEO Small Talker)
    Join Date
    Jun 2011
    Posts
    601
    Thanks Given
    77
    Thanked 192 Times in 106 Posts
    Very nice snippet! Please also add 87.250.224.0, a Yandex IP.

    The title of the thread reminds me of a Bob Marley song like "bad bots, bad bots, what you gonna do..." hahah

  19. #10
    Nemanja
    Hahahahah

  21. #11
    Phoenyx (Super Moderator)
    Join Date
    Jan 2011
    Location
    South Africa
    Posts
    3,158
    Thanks Given
    608
    Thanked 542 Times in 412 Posts
    Originally posted by stronywwwlublin:
    Very nice snippet! Please also add 87.250.224.0, a Yandex IP.

    The title of the thread reminds me of a Bob Marley song like "bad bots, bad bots, what you gonna do..." hahah
    Lmao, Inner Circle also made a version


  23. #12
    Nemanja
    One more thing I'd like to add: if you use code in your htaccess that looks like this (as above):
    "RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]"
    then every user agent with "Yandex" anywhere in its name will be blocked,
    so broad rules like:
    "RewriteCond %{HTTP_USER_AGENT} Feed [NC,OR]"
    may also block some other bot or visitor that merely has "Feed" somewhere in its user agent string.
    To avoid this you can either use a more specific word for each bot,
    or you can use "^Feed", which only blocks user agents whose name starts with "Feed".
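
    To illustrate the difference, here is a minimal sketch, assuming mod_rewrite is available. "ExampleFeedReader" is a made-up user agent name used only for illustration:

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On

    # Unanchored "Feed" would match anywhere in the string, so a
    # hypothetical "ExampleFeedReader/1.0" would be blocked as well:
    # RewriteCond %{HTTP_USER_AGENT} Feed [NC,OR]

    # Anchored: matches only when the user agent string begins with "Feed"
    RewriteCond %{HTTP_USER_AGENT} ^Feed [NC,OR]
    # More specific word: matches only this one crawler name
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

    # The last condition has no OR flag; matching requests get 403 Forbidden
    RewriteRule .* - [F]
    </IfModule>
    ```

    Note that [NC] makes the match case-insensitive, so "^Feed" also catches names starting with "feed" or "FEED".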

  25. #13
    seobunny Guest
    Hello Nemanja... is this list up to date?

    Or can you repost your newest htaccess file, please?

  26. #14
    glitterasia (IM & SEO Quiet One)
    Join Date
    Sep 2011
    Posts
    4
    Thanks Given
    0
    Thanked 0 Times in 0 Posts
    This code is not working.

  27. #15
    anthonymcauley (IM & SEO Weak Jaw)
    Join Date
    Feb 2011
    Posts
    244
    Thanks Given
    29
    Thanked 54 Times in 42 Posts
    Very good Nemanja. I'll load it up and block the bad babies

  28. #16
    chris (IM & SEO Quiet One)
    Join Date
    Jan 2011
    Posts
    7
    Thanks Given
    5
    Thanked 0 Times in 0 Posts
    @ seobunny...is the list up to date...?


 
