SEO Help from Mr. Roboto
April 26th, 2009 - by BAWalker
Do I need a robots.txt file for my site?
When search engines crawl your website, one of the first things they look for is a robots.txt file. You can think of it as a welcome file for the search engines visiting your site. It tells them if there are certain files or folders that you don’t want indexed. It can also direct them to your sitemap file, which gives them a roadmap to the files on your site that you do want indexed.
Your robots.txt is a simple text file that you create with your favorite text or HTML editor and place in the root of your site. If your domain is yourbusinesssite.com, then the search engines will look for yourbusinesssite.com/robots.txt.
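To illustrate where the crawlers look, here is a minimal Python sketch that works out the robots.txt address for any page on a site. The function name robots_url and the sample page address are made up for this illustration; the only assumption is that the file always sits at the root of the domain, as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, and swap the path for /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on the example domain used in this post.
location = robots_url("http://yourbusinesssite.com/products/index.html")
print(location)
```

No matter how deep the page is in your site, the result points at the same file in the root.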
A sample robots.txt file would look like this:
Sitemap: http://www.yourbusinesssite.com/sitemap.xml
User-agent: *
Disallow:
Here’s a quick overview of each of these:
1. Sitemap – The sitemap entry helps the search engines find your sitemap file, which in turn helps them index more of your website pages than they might find on their own. You can learn more about sitemaps by visiting here.
Although this Sitemap entry is independent from the other statements in your robots.txt file, I typically place this at the top of my file for organization and consistency.
2. User-agent – The User-agent line names the search engine spider that the rules following it apply to. By using the * wildcard, we are saying that this entry applies to all search engine spiders and robots.
3. Disallow – The Disallow statement tells the search engines which portions of your site, if any, they should not index. In our case, having no file or directory listed on this statement tells the search engines it’s okay to index our entire site.
Even though we do not have any restrictions to report to the search engine crawlers, just having the robots file present will prevent unnecessary errors from showing up in the bad referral report of your web statistics, since a crawler's request for a missing robots.txt is recorded as a "file not found" error.
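If you would like to check how a crawler is likely to read the sample file, Python's built-in urllib.robotparser module can parse it. This is only a rough sketch: the crawler name "ExampleBot" and the page address are invented for the test, and real search engines may interpret the file in their own way. The site_maps() call needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from this post, using the example domain.
SAMPLE = """\
Sitemap: http://www.yourbusinesssite.com/sitemap.xml
User-agent: *
Disallow:
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE)

# "ExampleBot" is a hypothetical crawler name; the * entry covers it.
allowed = rules.can_fetch("ExampleBot", "http://www.yourbusinesssite.com/any/page.html")

# Sitemap lines are collected separately from the User-agent rules.
sitemaps = rules.site_maps()

print(allowed, sitemaps)
```

The empty Disallow line is read as "nothing is blocked," so the page check comes back as allowed, and the sitemap address is reported on its own.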
The robots.txt file can be used to pass on specific instructions to each search engine. For example, if you wanted to block Google from crawling your site entirely, you can use the following syntax:
User-agent: Googlebot
Disallow: /
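As a quick sanity check, the same urllib.robotparser module can show that this entry blocks Googlebot while leaving other crawlers alone. Treat this as a sketch: "OtherBot" is a made-up crawler name and the page address is just an example.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only block from this post, one statement per line.
BLOCK_GOOGLE = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_GOOGLE.splitlines())

page = "http://yourbusinesssite.com/index.html"

# Googlebot matches the entry, and "Disallow: /" covers every path.
google_ok = parser.can_fetch("Googlebot", page)

# "OtherBot" (hypothetical) matches no entry, so nothing restricts it.
other_ok = parser.can_fetch("OtherBot", page)

print(google_ok, other_ok)
```

Note that each statement has to sit on its own line in the file for the rule to be read correctly.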
One word of caution: Although it is common practice to use your robots.txt file to disallow certain folders or directories from being indexed, you should keep in mind that anyone can access your robots file, and spammers sometimes make it a point to check robots.txt for excluded directories and spider exactly those. For that reason, don’t rely on it to hide sensitive content.
So, in summary, you can use your robots.txt file to let the search engines know what you would like excluded and to point them to your sitemap file. Both can be important steps in helping improve your search engine placement.
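To see how little effort that takes, here is a small Python sketch that pulls every excluded path out of a robots.txt file. The function name disallowed_paths and the folder names /admin/ and /private-reports/ are hypothetical, chosen only to illustrate the point.

```python
def disallowed_paths(robots_text):
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw in robots_text.splitlines():
        # Drop comments, then split "Field: value" on the first colon.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that tries to keep two folders out of the index.
EXAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private-reports/
"""

exposed = disallowed_paths(EXAMPLE)
print(exposed)
```

Anything listed this way is effectively announced to whoever reads the file, so folders that truly need protecting are better secured with a password or similar access control.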
That’s it for now, talk soon!
Betty Walker, Founder & CEO
CyberCompany Solutions, Inc.
http://www.ccstechpros.com
bawalker@ccs-email.com







