Robots.txt files are bad for SEO. They hurt your Google rankings. Especially this super robots.txt for WordPress going around the internet.
I learned the hard way and want to save you from doing the same.
I was once fed the idea that if I used the super robots.txt file pasted in the code box below, Google would listen to me, index all of my site and just ignore these sections, the sections I did not want Google to index in the first place. I don’t want them indexed because they would only cause duplicate content and hurt my natural SEO.
It looks great, makes sense and should work, right? I mean, if you read countless blog posts, they all recommend you do this, and stacks of noobs confirm it in the comments. So I did, and boy were they wrong.
Disallow Is Bad For WordPress SEO
If Google has already found your content (which it surely has) and you use the above robots.txt to tell Googlebot that it cannot see what’s in these pages, then Google will keep your pages in the search index with a very generic excerpt, as seen below.
A description for this result is not available because of this site’s robots.txt – learn more.
Horrible, right? Go search Google now for site:yourdomain.com and take a look. When you get to the end of the SERPs, you have to hit view omitted results to see them all. Categories, tags, query strings and more, all saying the description is not available.
I was told you can use the Google Link Removal Tool for this if the link is already blocked in your robots.txt file. But it is also possible that, if you do this and someone else out there is pointing to your link, Google will reindex it without crawling it because of your robots.txt, causing the same issue again after you removed it. Who knows how long this massive crap could take to clean up, because it did happen to me. Somehow all my removed material ended up back in that state after I thought I had fixed it and let my guard down.
My Google rankings started dropping slowly over time, and I couldn’t seem to stop it from happening. It turns out I just had to clear all of that disallow junk out of my WordPress robots.txt and let Google crawl these pages naturally. I already had them set up with a “noindex”, meaning Google would see that and know not to index them. But once the pages are blocked from the crawler, it can’t get in to see that I do not want the category, archive, author and tag pages indexed, so I get the generic “no description available” search result shown above. And it does hurt your SEO.
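If you want to see which of your URLs a set of Disallow rules actually hides from the crawler, here is a rough Python sketch using the standard library's robots.txt parser. The rules and the example.com URLs are made up for illustration and are not the robots.txt this post is about. Python's parser also only approximates how Googlebot reads the file, so treat the result as a hint.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the file discussed in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /tag/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A tag archive is hidden from the crawler, so a noindex tag on it is never read.
tag_page_crawlable = is_crawlable(ROBOTS_TXT, "https://example.com/tag/gadget/")

# A normal post is not covered by any Disallow rule and can be crawled.
post_crawlable = is_crawlable(ROBOTS_TXT, "https://example.com/my-post/")
```

Any URL that comes back as not crawlable is one where a noindex in the page head cannot be seen by the bot.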
Best Way To Stop Google From Indexing Your Pages
The proper way, as far as search engine optimization goes, is to unblock the page in robots.txt and use the noindex meta tag.
<meta name="robots" content="noindex" />
With it unblocked in robots.txt and noindexed in the page head, Google can now crawl the page, see your request not to index it and remove it from Google altogether.
So there you have it. Do NOT use robots.txt to control Googlebot indexing your site. Use noindex for better organic SERPs and less confusion between you and the bots.
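To double check that a page really carries the noindex request, something like the following Python sketch could work. It only looks at the robots meta tag in HTML you pass in as a string. The class and function names are my own for this example, and it does not look at HTTP headers or bot-specific meta tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex" /></head></html>'
page_is_noindexed = has_noindex(page)
```

Run it against the source of a category or tag page. If it returns False, the page is not asking to be left out of the index.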
If you’re wondering why you would want to do this at all, then you’re far behind in the SEO world.
Your categories and tags are for your users, to navigate your site. However, a tag such as /tag/gadget/ or /category/gadget/ is full of duplicate content, the stuff that sinks sites. In noob terms, you want your users to be able to use them, but you must remove them from Google’s index or feel the wrath of the duplicate content penalty.
Also, you wouldn’t want to index your subscribe, about, contact, etc. pages, as they are worthless content and only meant for onsite user experience.
Fix Blocked Resources In Your Robots.txt File
Another thing you might be doing is blocking resources from Google with the robots.txt file. You are telling the bot: no, you cannot look here; yes, I understand it is a key component of my page loading and the users can see it, but you cannot. This will affect your search engine rankings for sure, and Google even tells you so in Webmaster Tools.
As you can see from my image below, as I unblocked the paths in the file that lead to WordPress resources, my pages started opening up to being crawled.
Yes, this takes quite a while, so if you’re messed up, fix the blocked resources and wait a few months.
You’ll notice I had 100 blocked pages beforehand and after I fixed it, there are only 5 remaining but they will drop out next week. It takes quite a while for them to all get resubmitted to the search engines.
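If you would rather not wait for the report to tell you what is blocked, here is a rough Python sketch that lists the scripts, stylesheets and images on a page that a robots.txt would hide from the bot. The rules, the example.com URL and the sample HTML are all invented for the example. Python's parser is also only an approximation of how Googlebot treats the file.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of scripts, stylesheets and images on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "img") and attr.get("src"):
            self.resources.append(urljoin(self.base_url, attr["src"]))
        elif tag == "link" and attr.get("href"):
            if "stylesheet" in (attr.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attr["href"]))


def blocked_resources(robots_txt, page_url, html, agent="Googlebot"):
    """Return the page resources that robots.txt does not let `agent` fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = ResourceCollector(page_url)
    collector.feed(html)
    return [url for url in collector.resources if not rules.can_fetch(agent, url)]


# Hypothetical rules and markup, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

PAGE_HTML = """\
<link rel="stylesheet" href="/wp-content/themes/mytheme/style.css">
<script src="/wp-includes/js/jquery/jquery.js"></script>
<img src="/wp-content/uploads/photo.jpg">
"""

blocked = blocked_resources(ROBOTS_TXT, "https://example.com/my-post/", PAGE_HTML)
```

With these made-up rules, only the script under /wp-includes/ ends up in the blocked list, which is exactly the kind of path you would want to open back up.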
WordPress Robots.txt Users
That works great for all of your static pages that you yourself coded, but what if you’re using WordPress and you have hundreds to thousands of pages to fix? How on earth would you tell it not to index your categories and tags? You could use a WordPress Plugin, which is what I do.
WordPress SEO by Yoast has options to noindex categories, tags, media attachments, archives and author pages. All things you should be doing.
Another benefit of this is that your pages will use the rel canonical link to tell Google where the real page is. That matters because caching programs, search strings and more often land a user on a page like domain.com/blog/post-title?w3tc
<link rel="canonical" href="http://stickystatic.com/post-title" />
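To make the idea concrete, here is a small Python sketch of what a canonical link boils down to for a URL like the one above: drop the query string and point at the clean address. This is not how any particular plugin works. It assumes pretty permalinks, so it would be wrong for sites whose posts live at query-string URLs, and the function names are my own.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return the URL without its query string or fragment.

    Assumption: the site uses pretty permalinks, so the path alone
    identifies the page.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the cleaned-up URL."""
    return '<link rel="canonical" href="{}" />'.format(escape(canonical_url(url)))


clean = canonical_url("http://domain.com/blog/post-title?w3tc")
tag = canonical_link_tag("http://domain.com/blog/post-title?w3tc")
```

However a visitor or a bot lands on the page, the tag in the head always names the same clean URL.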
So in conclusion, stay away from robots.txt altogether or you will hurt your SEO. If it’s “bad bots” you’re worried about, well, they don’t follow the robots.txt rules anyways.