Welcome to the IMTalk - Internet Marketing & SEO Forum.
  1. #1
    Hajoless (joined Apr 2011)

    Robots.txt and extensions

    I want to disallow all HTML pages on my site in robots.txt. Actually, I want to disallow all pages with the .html extension.

    Is there any way to do that?

  2. #2
    gazraz (joined Aug 2011)
    The details

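    The /robots.txt is a de-facto standard and is not owned by any standards body. There are two historical descriptions: the original 1994 document "A Standard for Robot Exclusion" and a 1997 Internet Draft, "A Method for Web Robots Control".

    Where to put it
    The short answer: in the top-level directory of your web server.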

    The longer answer:

    When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash) and puts "/robots.txt" in its place.

    For example, for "example.com/shop/index.html", it will remove the "/shop/index.html", replace it with "/robots.txt", and end up with "example.com/robots.txt".
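
    The rewriting step above can be sketched in Python with the standard library. This is only an illustration: the function name robots_url is made up here, and the example URL adds an "http://" scheme that the text above leaves out, because urlsplit needs a scheme to tell the host apart from the path.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com/shop/index.html"))
```

    Whatever page the robot starts from, the result is always the same single file at the root of the host.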

    So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

    Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

    What to put in it
    The "/robots.txt" file is a text file with one or more records. It usually contains a single record that looks like this:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /~joe/

    In this example, three directories are excluded.
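
    As a quick check of what that record means, the sketch below feeds it to Python's standard urllib.robotparser module. The bot name "AnyBot" and the example.com URLs are placeholders chosen for this post, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# The single record shown above, as it would appear in /robots.txt.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# A URL under an excluded directory versus one outside all of them.
blocked = parser.can_fetch("AnyBot", "http://example.com/tmp/cache.html")
allowed = parser.can_fetch("AnyBot", "http://example.com/index.html")
print(blocked, allowed)
```

    The first check comes back False because the path starts with "/tmp/"; the second comes back True because nothing in the record covers it.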

    Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

    Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines under this original standard. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
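
    To see why such lines do nothing useful under the original rules, here is a minimal sketch of plain prefix matching. The function is_disallowed is hypothetical, written for this post; it ignores User-agent selection and URL encoding, and real crawlers may behave differently (some support their own wildcard extensions).

```python
from typing import Iterable


def is_disallowed(path: str, disallow_prefixes: Iterable[str]) -> bool:
    """Return True if path starts with any non-empty Disallow value."""
    # An empty Disallow value excludes nothing, so it is skipped.
    return any(bool(prefix) and path.startswith(prefix) for prefix in disallow_prefixes)


# A plain prefix works as expected.
print(is_disallowed("/tmp/cache.html", ["/tmp/"]))
# The '*' is compared literally, so these "patterns" never match.
print(is_disallowed("/tmp/cache.html", ["/tmp/*"]))
print(is_disallowed("/page.html", ["*.html"]))
```

    This is also why a line like "Disallow: *.html" cannot be relied on to block every page with an html extension for a robot that follows only the original standard.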

    What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

    To exclude all robots from the entire server

    User-agent: *
    Disallow: /


    To allow all robots complete access

    User-agent: *
    Disallow:

    (or just create an empty "/robots.txt" file, or don't use one at all)

    To exclude all robots from part of the server

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /junk/

    To exclude a single robot

    User-agent: BadBot
    Disallow: /

    To allow a single robot

    User-agent: Google
    Disallow:

    User-agent: *
    Disallow: /
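
    The two-record file above can be checked the same way with urllib.robotparser. This is a sketch only: "OtherBot" and the example.com URL are placeholders, and Python's parser is just one implementation, so other robots may resolve the records differently.

```python
from urllib.robotparser import RobotFileParser

# Two records separated by a blank line: one for Google, one for everyone else.
RULES = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://example.com/page.html"
google_ok = parser.can_fetch("Google", url)
other_ok = parser.can_fetch("OtherBot", url)
print(google_ok, other_ok)
```

    The named record wins for the robot it names, and the "*" record only applies to robots that no other record matches.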

    To exclude all files except one
    This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:

    User-agent: *
    Disallow: /~joe/stuff/

    Alternatively you can explicitly disallow all disallowed pages:

    User-agent: *
    Disallow: /~joe/junk.html
    Disallow: /~joe/foo.html
    Disallow: /~joe/bar.html
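
    For the original question (every page with an html extension), listing pages by hand gets tedious, so the record could be generated instead. A minimal sketch, assuming you already have a list of the URL paths on your site: the function disallow_record is hypothetical, and "/~joe/logo.gif" is added to the sample list only to show a path being skipped.

```python
def disallow_record(paths, extension=".html", user_agent="*"):
    """Build one robots.txt record with a Disallow line per matching path."""
    lines = [f"User-agent: {user_agent}"]
    for path in sorted(paths):
        # One Disallow line per URL; no blank lines inside the record.
        if path.endswith(extension):
            lines.append(f"Disallow: {path}")
    return "\n".join(lines)


site_paths = ["/~joe/junk.html", "/~joe/foo.html", "/~joe/logo.gif", "/~joe/bar.html"]
record = disallow_record(site_paths)
print(record)
```

    The file has to be regenerated whenever pages are added, which is the main drawback of this approach.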

    The other way may be to use an .htaccess file.


 
