Welcome to the IMTalk - Internet Marketing & SEO Forum.
  1. #1
    apexglm (IM & SEO Mumbler)
    Join Date
    Apr 2011
    Posts
    296
    Thanks Given
    10
    Thanked 37 Times in 30 Posts

    Which bots do you allow to crawl your site?

    It goes without saying that you allow Google, Bing, and Yahoo,
    but which others do you allow and disallow?
    Here are some examples that have visited one of my blogs in the last week (not saying I would allow them all):
    larbin2.6.3@u
    baiduspider
    mj12bot
    mlbot
    tweetedtimes
    paperlibot
    catchbot
    0.83
    exabot
    gigabot
    ichiro
    yandexbot
    aghaven
    twitmunin
    pycurl
    6.0b
    jakarta
    geourl
    butterfly
    sindice-fetcher
    y!j-bsc
    blogpulselive
    birubot
    wikiwix-bot-3.0
    voyager
    ia_archiver

  2. #2
    seobunny Guest


    Code:
    User-agent: Googlebot-Image
    Allow: /
    
    User-agent: Mediapartners-Google
    Allow: /
    
    User-agent: duggmirror
    Disallow: /
    
    User-agent: grub-client
    Disallow: /
    
    User-agent: grub
    Disallow: /
    
    User-agent: looksmart
    Disallow: /
    
    User-agent: WebZip
    Disallow: /
    
    User-agent: larbin
    Disallow: /
    
    User-agent: b2w/0.1
    Disallow: /
    
    User-agent: psbot
    Disallow: /
    
    User-agent: Python-urllib
    Disallow: /
    
    User-agent: NetMechanic
    Disallow: /
    
    User-agent: URL_Spider_Pro
    Disallow: /
    
    User-agent: CherryPicker
    Disallow: /
    
    User-agent: EmailCollector
    Disallow: /
    
    User-agent: EmailSiphon
    Disallow: /
    
    User-agent: WebBandit
    Disallow: /
    
    User-agent: EmailWolf
    Disallow: /
    
    User-agent: ExtractorPro
    Disallow: /
    
    User-agent: CopyRightCheck
    Disallow: /
    
    User-agent: Crescent
    Disallow: /
    
    User-agent: SiteSnagger
    Disallow: /
    
    User-agent: ProWebWalker
    Disallow: /
    
    User-agent: CheeseBot
    Disallow: /
    
    User-agent: LNSpiderguy
    Disallow: /
    
    User-agent: ia_archiver
    Disallow: /
    
    User-agent: ia_archiver/1.6
    Disallow: /
    
    User-agent: Teleport
    Disallow: /
    
    User-agent: TeleportPro
    Disallow: /
    
    User-agent: MIIxpc
    Disallow: /
    
    User-agent: Telesoft
    Disallow: /
    
    User-agent: Website Quester
    Disallow: /
    
    User-agent: moget/2.1
    Disallow: /
    
    User-agent: WebZip/4.0
    Disallow: /
    
    User-agent: WebStripper
    Disallow: /
    
    User-agent: WebSauger
    Disallow: /
    
    User-agent: WebCopier
    Disallow: /
    
    User-agent: NetAnts
    Disallow: /
    
    User-agent: Mister PiX
    Disallow: /
    
    User-agent: WebAuto
    Disallow: /
    
    User-agent: TheNomad
    Disallow: /
    
    User-agent: WWW-Collector-E
    Disallow: /
    
    User-agent: RMA
    Disallow: /
    
    User-agent: libWeb/clsHTTP
    Disallow: /
    
    User-agent: asterias
    Disallow: /
    
    User-agent: httplib
    Disallow: /
    
    User-agent: turingos
    Disallow: /
    
    User-agent: spanner
    Disallow: /
    
    User-agent: InfoNaviRobot
    Disallow: /
    
    User-agent: Harvest/1.5
    Disallow: /
    
    User-agent: Bullseye/1.0
    Disallow: /
    
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /
    
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /
    
    User-agent: CherryPickerSE/1.0
    Disallow: /
    
    User-agent: CherryPickerElite/1.0
    Disallow: /
    
    User-agent: WebBandit/3.50
    Disallow: /
    
    User-agent: NICErsPRO
    Disallow: /
    
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /
    
    User-agent: DittoSpyder
    Disallow: /
    
    User-agent: Foobot
    Disallow: /
    
    User-agent: WebmasterWorldForumBot
    Disallow: /
    
    User-agent: SpankBot
    Disallow: /
    
    User-agent: BotALot
    Disallow: /
    
    User-agent: lwp-trivial/1.34
    Disallow: /
    
    User-agent: lwp-trivial
    Disallow: /
    
    User-agent: BunnySlippers
    Disallow: /
    
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /
    
    User-agent: URLy Warning
    Disallow: /
    
    User-agent: Wget/1.6
    Disallow: /
    
    User-agent: Wget/1.5.3
    Disallow: /
    
    User-agent: Wget
    Disallow: /
    
    User-agent: LinkWalker
    Disallow: /
    
    User-agent: cosmos
    Disallow: /
    
    User-agent: moget
    Disallow: /
    
    User-agent: hloader
    Disallow: /
    
    User-agent: humanlinks
    Disallow: /
    
    User-agent: LinkextractorPro
    Disallow: /
    
    User-agent: Offline Explorer
    Disallow: /
    
    User-agent: Mata Hari
    Disallow: /
    
    User-agent: BuiltBotTough
    Disallow: /
    
    User-agent: LexiBot
    Disallow: /
    
    User-agent: Web Image Collector
    Disallow: /
    
    User-agent: The Intraformant
    Disallow: /
    
    User-agent: True_Robot/1.0
    Disallow: /
    
    User-agent: True_Robot
    Disallow: /
    
    User-agent: BlowFish/1.0
    Disallow: /
    
    User-agent: JennyBot
    Disallow: /
    
    User-agent: MIIxpc/4.2
    Disallow: /
    
    User-agent: ProPowerBot/2.14
    Disallow: /
    
    User-agent: BackDoorBot/1.0
    Disallow: /
    
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    
    User-agent: WebEnhancer
    Disallow: /
    
    User-agent: suzuran
    Disallow: /
    
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    
    User-agent: VCI
    Disallow: /
    
    User-agent: Szukacz/1.4
    Disallow: /
    
    User-agent: QueryN Metasearch
    Disallow: /
    
    User-agent: Openfind data gathere
    Disallow: /
    
    User-agent: Openfind
    Disallow: /
    
    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /
    
    User-agent: Xenu's
    Disallow: /
    Disallow: /
    
    User-agent: Zeus
    Disallow: /
    
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /
    
    User-agent: RepoMonkey
    Disallow: /
    
    User-agent: Microsoft URL Control
    Disallow: /
    
    User-agent: Openbot
    Disallow: /
    
    User-agent: URL Control
    Disallow: /
    
    User-agent: Zeus Link Scout
    Disallow: /
    
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /
    
    User-agent: EroCrawler
    Disallow: /
    
    User-agent: LinkScan/8.1a Unix
    Disallow: /
    
    User-agent: Keyword Density/0.9
    Disallow: /
    
    User-agent: Kenjin Spider
    Disallow: /
    
    User-agent: Iron33/1.0.2
    Disallow: /
    
    User-agent: Bookmark search tool
    Disallow: /
    
    User-agent: GetRight/4.2
    Disallow: /
    
    User-agent: FairAd Client
    Disallow: /
    
    User-agent: Gaisbot
    Disallow: /
    
    User-agent: Aqua_Products
    Disallow: /
    
    User-agent: Radiation Retriever 1.1
    Disallow: /
    
    User-agent: Webster Pro
    Disallow: /
    
    User-agent: Flaming AttackBot
    Disallow: /
    
    User-agent: Oracle Ultra Search
    Disallow: /
    
    User-agent: MSIECrawler
    Disallow: /
    
    User-agent: PerMan
    Disallow: /
    
    User-agent: searchpreview
    Disallow: /
    


  4. #3
    apexglm
    Nice list, seobunny.
    I find that a lot of people overlook this far too easily; not every bot is your friend.
    I scrutinize every bot that lands on my sites.

  5. #4
    wink0r (Moderator)
    Join Date
    Oct 2010
    Location
    East Coast, USA
    Posts
    2,183
    Thanks Given
    941
    Thanked 651 Times in 491 Posts
    The worst of them probably pay no attention to the robots.txt file. It is completely optional for the spiders, not an iron gate. You will cut some bandwidth use by using the robots.txt file, though.
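    To illustrate why robots.txt is advisory: the check below is what a well-behaved crawler runs against itself before fetching a page. This is a minimal sketch using Python's standard urllib.robotparser; the two-entry excerpt and the example.com URLs are placeholders, not the full list from this thread.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt in the style of the list above (placeholder, not the full list).
ROBOTS_TXT = """\
User-agent: EmailCollector
Disallow: /

User-agent: Wget
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-abiding crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # hypothetical URL
    for agent in ("EmailCollector", "Wget/1.6", "Googlebot"):
        print(agent, "allowed:", is_allowed(agent, page))
```

    A bot that never performs this check is not stopped by the file at all, which is why the worst offenders need blocking at the server instead.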

  6. #5
    apexglm
    Personally, I ban them by IP range at the server level; that's easier for me because I have around 30 or so sites running there.
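    A rough sketch of what an IP-range ban boils down to, using Python's standard ipaddress module. The ranges are documentation-only placeholder networks, not ranges any real bot uses, and in practice the check would live in the firewall or web server config rather than in a script.

```python
import ipaddress

# Hypothetical ranges for illustration only (reserved documentation networks).
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    for visitor in ("192.0.2.55", "203.0.113.9"):
        print(visitor, "blocked:", is_blocked(visitor))
```

    One list of ranges applied at the server covers every site hosted there, which is the convenience over maintaining a robots.txt per site.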

  7. #6
    rain22 (IM & SEO Chatty)
    Join Date
    Feb 2011
    Location
    Russia
    Posts
    1,601
    Thanks Given
    38
    Thanked 148 Times in 112 Posts
    Oops, I didn't know about these bots before. I only knew of some, like the Google, Yahoo, and Baidu bots, and I didn't block any of them.

  8. #7
    idreesfarooq (IM & SEO Chatty)
    Join Date
    Mar 2011
    Location
    Pakistan
    Posts
    1,315
    Thanks Given
    214
    Thanked 306 Times in 213 Posts
    After reading this useful post, I checked the bots visiting my blog through the StatPress plugin, and I am surprised that 106 spiders and 16 feed agents have visited my blog today. Previously I always ignored them, but now I need advice: why should I disallow these bots? Seobunny has provided a good list, and I think I would prefer to use that list rather than compile my own.

  9. #8
    rain22
    Quote Originally Posted by idreesfarooq
    After reading this useful post, I checked the bots visiting my blog through the StatPress plugin, and I am surprised that 106 spiders and 16 feed agents have visited my blog today. Previously I always ignored them, but now I need advice: why should I disallow these bots? Seobunny has provided a good list, and I think I would prefer to use that list rather than compile my own.
    Yeah, I also want to know the same: why should we block these bots from coming to our sites? What are the advantages if we do so?

  10. #9
    wink0r
    Some of those bots are content or email address scrapers. Those you definitely don't need. Others are providing you with copious supplies of comment spam; unless you are following parada's techniques, you probably don't really need them either. Yet others are registration bots for various CMSs, or content delivery bots where applicable. They all add some load to your server and take some bandwidth.

  11. #10
    apexglm
    The best thing I can advise is to research each bot you see crawling your site and decide for yourself whether you want to block it.
    I could post a list here that would take you half an hour to read.
    If you have a WordPress site, install the "Visitor Maps and Who's Online" plugin.
    It makes it easy to get a list of bots from the WordPress admin side; just change the setting at the top to include bots.
    Visitor Maps and Who's Online - Wordpress Plugin PHP Script - Long Beach Weather
    rain22:
    Blocking can stop content scrapers from taking your content or feeding from your RSS. It can also save bandwidth, stop your emails from being harvested, stop bots that ignore your requests about which pages get indexed, stop spammers, and so on.
    Maybe a dedicated thread on bad bots would be a good idea.
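    For anyone without WordPress, a similar list of visiting bots can be pulled from the server's access log. A minimal sketch, assuming the log is in the common "combined" format where the user agent is the last quoted field; the sample lines are made up for illustration.

```python
import re
from collections import Counter

# In the combined log format each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Tally how many requests each user-agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines (hypothetical addresses and timestamps).
SAMPLE_LOG = [
    '192.0.2.1 - - [18/Jun/2011:15:03:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Wget/1.6"',
    '192.0.2.1 - - [18/Jun/2011:15:03:02 +0000] "GET /feed HTTP/1.1" 200 900 "-" "Wget/1.6"',
    '198.51.100.7 - - [18/Jun/2011:15:04:10 +0000] "GET / HTTP/1.1" 200 512 "-" "EmailCollector"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

    Sorting by hit count puts the heaviest visitors first, which makes it easier to decide which bots are worth researching.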

  12. #11
    rain22
    Quote Originally Posted by apexglm
    The best thing I can advise is to research each bot you see crawling your site and decide for yourself whether you want to block it.
    I could post a list here that would take you half an hour to read.
    If you have a WordPress site, install the "Visitor Maps and Who's Online" plugin.
    It makes it easy to get a list of bots from the WordPress admin side; just change the setting at the top to include bots.
    Visitor Maps and Who's Online - Wordpress Plugin PHP Script - Long Beach Weather
    rain22:
    Blocking can stop content scrapers from taking your content or feeding from your RSS. It can also save bandwidth, stop your emails from being harvested, stop bots that ignore your requests about which pages get indexed, stop spammers, and so on.
    Maybe a dedicated thread on bad bots would be a good idea.
    Yeah, it would be nice if someone who knows about this subject could start a new thread to tell us what kinds of bots are bad for blogs and sites.


 
