Understanding Robots.txt

    25 April 2012

    Unless you have an IT professional on hand at all times, as a small business owner you will more often than not have to awkwardly don the hat of a web expert to keep your website up and running. Even the biggest publicly traded companies, with full-time staffs of top information technology professionals, still run into the common problem of a misconfigured robots.txt file. Becoming invisible to search engines is just about as bad as it gets on the great World Wide Web.

    The Basics:

    What are Search Engine Spiders?

    These sneaky devils are automated programs that crawl your website and retrieve its content so that search engines can index it and rank it appropriately for searchers. Well-behaved spiders, or web crawlers, fetch any content that your robots.txt file does not block.
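    To make the idea concrete, here is a minimal sketch of the check a well-behaved spider performs before fetching a page, using Python's standard urllib.robotparser module. The site example.com, the bot name ExampleBot, the /private/ rule, and the allowed_urls helper are illustrative assumptions, not part of any real crawler.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that a polite crawler would fetch under the given robots.txt rules."""
    parser = RobotFileParser()
    # A real crawler would first download https://<site>/robots.txt;
    # here the file's text is passed in directly.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs for illustration.
rules = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/",
    "https://example.com/products.html",
    "https://example.com/private/report.html",
]

for url in allowed_urls(rules, "ExampleBot", candidates):
    print("would crawl:", url)
```

    With these rules the home page and products.html are crawled and the /private/ report is skipped. Nothing forces a crawler to run this check; a spider that ignores robots.txt simply fetches everything.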

    How Does a Robots.txt Blockage Come About?

    A blanket robots.txt block is most commonly used on staging servers. If you find yourself at the mercy of a robots.txt problem, it likely stems from when your staging server was rolled over to the live server. Web developers use robots.txt on a staging server to keep search engines from indexing the unfinished copy of your site, so that the same content doesn't show up twice during the building process and once your site eventually goes live. If that file is carried over with the rest of the site, the live site inherits the block.
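    One way to keep a staging block from following a site onto the live server is to generate robots.txt per environment and check it before release. This is a minimal Python sketch under assumed names: the environment labels, RULES_BY_ENVIRONMENT, robots_txt_for, and homepage_crawlable are all hypothetical and not part of any particular deployment tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets; the environment names are assumptions for illustration.
RULES_BY_ENVIRONMENT = {
    "staging": "User-agent: *\nDisallow: /\n",    # hide the entire staging copy
    "production": "User-agent: *\nDisallow:\n",   # an empty Disallow blocks nothing
}


def robots_txt_for(environment: str) -> str:
    """Return the robots.txt body to deploy; unknown environments are treated like staging."""
    return RULES_BY_ENVIRONMENT.get(environment, RULES_BY_ENVIRONMENT["staging"])


def homepage_crawlable(robots_txt: str) -> bool:
    """Check whether Googlebot may fetch the site's home page under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", "/")


# A release step could refuse to ship if the live site would be invisible to crawlers.
live_rules = robots_txt_for("production")
if not homepage_crawlable(live_rules):
    raise SystemExit("refusing to deploy: production robots.txt blocks the whole site")
print(live_rules, end="")
```

    Defaulting unknown environments to the staging rules errs on the side of hiding unfinished sites, while the pre-release check catches the opposite mistake described above.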

    How To Check Your Site for Robots.txt

    You can check your website manually to rule out the possibility that it is suffering from a misplaced robots.txt setting. There's no need to panic about being blacklisted by Google; keep calm and follow these simple steps:

    • Enter your domain name followed by a forward slash and robots.txt in your browser's address bar. For example: http://thedomainname.com/robots.txt
    • If you get a 404 (page not found) error, your site most likely has no robots.txt file, so crawlers are not being blocked by one.
    • Alternatively, log into Google Webmaster Tools, which can tell you which of your URLs are being blocked by robots.txt.
    • If your robots.txt file shows:

    User-agent: *
    Disallow: /

    you'll need to change it: those two lines tell every crawler to stay away from your entire site. You should never see them on a live website.
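    The manual checklist can also be scripted. This is a minimal sketch, not a complete audit: diagnose, fetch_robots, and the CRAWLERS tuple are hypothetical names, it assumes the site is served over HTTPS, and it only tests whether the home page is crawlable, so it catches a blanket "Disallow: /" but not narrower rules.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

CRAWLERS = ("Googlebot", "Bingbot")  # user agents to test; extend as needed


def diagnose(status: int, body: str) -> str:
    """Classify a robots.txt response: missing, blocking the home page, or fine."""
    if status == 404:
        return "no robots.txt: crawlers treat the whole site as allowed"
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    blocked = [agent for agent in CRAWLERS if not parser.can_fetch(agent, "/")]
    if blocked:
        return "WARNING: home page blocked for " + ", ".join(blocked)
    return "ok: home page is crawlable"


def fetch_robots(domain: str) -> tuple[int, str]:
    """Download https://<domain>/robots.txt and return (HTTP status, body text)."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; keep the status code.
        return err.code, ""
```

    For example, print(diagnose(*fetch_robots("thedomainname.com"))) performs the same check as typing the address into your browser and reading the result.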

    How To Prevent Parts of Your Site From Being Indexed

    Robots.txt can actually help your website just as it can hurt it. To hide certain sections of your website from spiders or web crawlers, you can list them in robots.txt. For example, to keep crawlers out of your ads and log files, you would add:

    User-agent: *
    Disallow: /ads
    Disallow: /logs
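    Disallow values are matched as prefixes of the URL path, which is easy to overlook. This sketch runs the rules above through Python's standard urllib.robotparser; the sample paths, the ExampleBot name, and the blocked_paths helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /ads
Disallow: /logs
"""


def blocked_paths(robots_txt: str, paths: list[str]) -> list[str]:
    """Return the paths a polite crawler would skip under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch("ExampleBot", path)]


# Hypothetical site paths.
sample = ["/ads/banner.html", "/logs/2012-04.txt", "/adsense-tips.html", "/about.html"]
print(blocked_paths(RULES, sample))
```

    Only /about.html survives: because /ads is a prefix of /adsense-tips.html, that page is blocked too. Writing Disallow: /ads/ with a trailing slash limits the rule to the directory.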

    • Unfortunately, robots.txt isn't a cure-all for keeping items out of search results, and its rules apply in a fairly blanket way. The basic protocol doesn't allow wildcards in Disallow lines, nor does it include "Allow:" lines. Google has since extended the format to support both, but these extensions are not universally supported, so it is recommended that you use them ONLY in a section whose "User-agent:" line names Google's crawler.
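    A sketch of the matching logic shows how Google's extensions go beyond the basic prefix rule. It follows Google's published behavior, which was later written into RFC 9309: * matches any run of characters, a trailing $ anchors the end of the path, the longest matching rule wins, and Allow wins a tie. The helper names and sample rules are hypothetical; this is an illustration, not Google's own parser.

```python
import re

# Hypothetical sample rules for one user agent: (directive, pattern) pairs.
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit.html"),
    ("disallow", "/*.pdf$"),
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern: '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; on a tie Allow wins; no match means allowed."""
    best_length, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # skip empty rules; they restrict nothing
        # re.match anchors at the start, so patterns still match as path prefixes.
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length, allowed = length, directive == "allow"
    return allowed


for path in ["/private/notes.html", "/private/press-kit.html", "/files/report.pdf", "/about.html"]:
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

    Here the longer Allow rule reopens a single file inside /private/, and the $ anchor means /files/report.pdf is blocked while /files/report.pdf?download=1 is not. A crawler that only understands the basic protocol would read these same lines very differently, which is why they belong under a Google-specific user agent.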

    Does Robots.txt Prevent Users From Viewing Certain Content?

    Absolutely not. A robots.txt file only asks web-crawling spiders not to fetch content from those portions of your site. Every visitor can still view that content, and nothing on the page tells them it has been excluded from search engines. In all honesty, robots.txt only keeps out "polite" spiders that choose to obey it; in reality, less well-mannered crawlers are likely weaving through that data anyway.

    If you really want to protect certain data, content, or sections of your website, your best bet is to password protect those areas. Also remember that if you want content removed from search engine indexes, you must add a robots noindex meta tag, such as <meta name="robots" content="noindex">, to each and every page you want removed. A spider can only see that tag if it is allowed to crawl the page, so don't also block those pages in robots.txt.
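    Because the noindex tag has to appear on each page, a quick way to audit pages is to parse their HTML and look for it. This minimal sketch uses Python's built-in html.parser; the RobotsMetaFinder and has_noindex names are hypothetical, and it only inspects the robots meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page carries a robots meta tag containing 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


# Hypothetical page markup.
page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Hi</body></html>'
print(has_noindex(page))
```

    Running it over the HTML of each page you meant to hide quickly shows which ones are missing the tag.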

    Understanding these simpler aspects of running and maintaining your website will likely save you money, both up front and over the life of your business. If your website has disappeared from Google search or is otherwise extremely hard to find, your first step should be to double-check your robots.txt. There's no need to spend extra money on a tech professional when you are well equipped to rule out the easy fixes and get back to the world of the living, as far as the web is concerned!

    Matthew Toren is an Award Winning Author, Serial Entrepreneur, and Investor. He Co-Founded YoungEntrepreneur.com along with his brother Adam. Matthew is co-author of the newly released book: Small Business, Big Vision: “Lessons on How to Dominate Your Market from Self-Made Entrepreneurs Who Did it Right” and also co-author of Kidpreneurs.

    17 Responses to Understanding Robots.txt

    1. Jess Day April 25, 2012 at 7:41 pm

      This is one great article, especially for people who have heard about it but do not understand what it is all about. When you want to stay afloat in this highly competitive online world, you need to be aware of certain business strategies and use the right tools. I am sure that many small business owners are not completely aware of how indexing works and how it affects their business. Thanks and cheers!

      • Matthew T April 26, 2012 at 12:27 pm

        @Jess Thank you for your comment. You are right about small biz owners not being aware of how indexing works. It sure is important!

    2. TheFireGuy April 25, 2012 at 9:50 pm

      I had no idea that these “spiders” even existed. Good info to know!

      • Matthew T April 26, 2012 at 12:30 pm

        @TheFireGuy Thank you for the post. Glad we can help out. Cheers!

    3. Charina Fernandez April 26, 2012 at 1:25 am

      Thank you for posting such a great article. I am very sure that a lot of people will benefit from it because it gives a lot of insightful information, especially to those who work on the internet. Not everyone is aware of these robots and what they are able to do to our websites. With your post, most business owners are probably educated by now and will take action on how to deal with these robots.

      • Matthew T April 26, 2012 at 12:58 pm

        @Charina Thank you for your wonderful comments. Our mission has always been to keep our visitors up to date on the important items that will benefit their small business now and in the future.

    4. Mohammad Afnan April 26, 2012 at 10:30 am

      Thanks for sharing this informative post. I didn't know much about robots.txt.

      • Matthew T April 26, 2012 at 1:00 pm

        @Mohammed Thank you for taking the time to post! Please let us know what other articles you would like to see on our blog.

    5. Bill April 26, 2012 at 2:35 pm

      This is a great post! I will be sharing this with my students. They’ll benefit and appreciate this post.

    6. Nicole April 26, 2012 at 6:33 pm

      I just read your article on understanding robots.txt and just wanted to say what a nice job you did explaining it. This seems to be an area that oftentimes intimidates web owners, and it really should not. Thank you for sharing; I have enjoyed reading your content.

    7. Oscar April 29, 2012 at 11:43 pm

      Great post and information about robots.txt files. Great for new website owners just starting out. Thanks for sharing a really useful post! Many newbies will find this very helpful.

      • Matthew T April 30, 2012 at 11:22 am

        Thank you for taking the time to post. Cheers!

    8. John May 1, 2012 at 8:28 am

      Good write-up about robots.txt files. This post has helped me learn something extra. Thanks

    9. Georgi Vasilev May 2, 2012 at 11:42 am

      Very useful article. I just wanted to share that there is a plugin for WordPress that makes things easier. The plugin is called SEO by Yoast. In its settings you can choose the following options for each post/page:
      • Index/No index
      • Follow/No follow

    10. Maya May 7, 2012 at 10:00 pm

      Nice work, Matthew! For people engaged in online marketing, robots.txt is not new, but only a few know what it is for. Your post gives an overview of how robots.txt can help and how essential it is in the process of online marketing.

    11. Kelsey May 14, 2012 at 4:37 pm

      I’m surprised by how many people don’t know about the benefits/purpose of the robots.txt file.

    12. Paul March 15, 2013 at 7:52 pm

      good stuff
