Back when Google pulled supplemental results out of its back pocket for the world to see, many webmasters were interested in what they were.
According to Google’s webmaster FAQ, supplemental results are part of Google’s auxiliary index: a page or site that Google might exclude from its main index can still be crawled and added to its supplemental index.
Translating Google’s mumbo-jumbo, we get this: if Google finds a page via a link or its toolbar and decides that it is a duplicate, similar, or orphan page, the page gets thrown out of the main index and put into the auxiliary index, which by the way is not a good thing for you as a webmaster trying to get indexed for as many keywords as you can!
Since then, webmasters have been looking for ways to list these supplemental results via the site command.
The most recent one being talked about is the site command with **** following the domain, i.e. site:YourDomain.com ****
At the time of this post, the above site command will produce all the supplemental results for your domain. Note that the command could change, and has changed, over time. Other options you can type in are ** or *.*.*.
Older commands like /I, /d, and Supplement or Supplemental no longer work!
I have found that for sites with a large number of pages, like Wikipedia, you can start with ** and keep adding asterisks until the number of results stops growing.
Unfortunately, no one has done a deep study of why adding more asterisks returns more results, but from what I have seen, it looks like each added asterisk makes Google show pages that have a lower probability of being supplemental results.
I.e. ** returns only the highest offenders for inclusion in the supplemental index, while ****** returns the highest offenders AND the other pages that are about 5 times less likely to be in the supplemental index. If you don’t get that, give the site command a run for wikipedia.org to get an idea of what I am talking about!
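If you want to step through the asterisk variations without typing each one by hand, here is a minimal Python sketch that just builds the query strings and the matching search URLs. The function names are made up for this post, the URL pattern is the ordinary google.com/search?q= form, and nothing here fetches or counts results; it only saves you some typing, and it assumes the **** trick still works when you read this.

```python
from urllib.parse import quote_plus


def site_queries(domain, min_stars=2, max_stars=6):
    """Build 'site:domain **' style queries, one per asterisk count."""
    return [f"site:{domain} {'*' * n}" for n in range(min_stars, max_stars + 1)]


def search_url(query):
    """Turn a query string into a plain Google search URL (assumed format)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # Print each query with its URL so you can paste them into a browser
    # and watch whether the result count keeps growing.
    for q in site_queries("wikipedia.org"):
        print(q, "->", search_url(q))
```

Open each URL in turn and stop once the number of results stops growing.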
As we know, filtered pages can hurt other pages or a site’s ranking, and we also know orphan pages suck the life out of PR. Because of this, we will want to look into removing these pages from Google’s auxiliary index.
If you recently changed your site, your supplemental results will be high because Google has not yet removed the old pages from its index. The same goes for sites with pages that come and go, e.g. a product website like eBay, where a product is only listed for a couple of days or weeks and then removed.
Cleaning Up Supplemental Results
When it comes to cleaning up supplemental results, robots.txt and the robots meta tag will be your friends!
Running the site command for boogybonbon.com, I found that my little blog had 24 pages listed in the supplemental results.
Some of the pages that were returned were from the old blog script, dBlogger, while others were from the new blog script, WordPress.
Once I managed to get past all the old pages that had already been removed, I found the new blog script was duping up in a couple of places as well as making some orphan pages via the RSS feeds.
As I really don’t care much for my RSS feeds to rank in the top results, and they are already supplemental results with the PR sucked out of them, I felt they should be removed from all search results, along with other files like the archive pages, as they look too much like spam site results to search engines and already have a null PR.
To try and combat this, I added the following to my robots.txt file to block all spiders from the join, signup, and login pages under wp-admin, and to block them from the category, page, feed, and comments URLs.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /category/
Disallow: /page/
Disallow: /feed/
Disallow: /comments/
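Before trusting rules like these, it is worth checking what they actually match. The sketch below is only an illustration: it feeds the same rules to Python's standard urllib.robotparser and asks whether a given path would be blocked. The helper name is_blocked and the sample paths (/category/seo/, /my-post/feed/) are hypothetical, and Python's parser is not Googlebot, so treat the answers as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The same rules as listed above, kept as a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /category/
Disallow: /page/
Disallow: /feed/
Disallow: /comments/
"""


def is_blocked(path, rules=ROBOTS_TXT, agent="*"):
    """Return True if the rules disallow this path for the given agent."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths: the last one is a per-post feed nested under a post.
    for path in ["/feed/", "/category/seo/", "/my-post/feed/"]:
        print(path, "blocked" if is_blocked(path) else "NOT blocked")
```

The rules are plain prefix matches, so a feed that lives under a post URL instead of at /feed/ slips straight through.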
Unfortunately the WP setup is pure crap because it puts some of the feeds in other locations, which makes them a pain to block. For pages that are not XML, though, we can use the robots meta tag to stop search engines from indexing a page while still following its links.
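The tag in question is the standard one, <meta name="robots" content="noindex,follow">, placed in the page's head. To confirm a template is really sending it, a small checker along these lines can help. This is a rough sketch using Python's built-in html.parser; the class and function names are my own, and the sample page is invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex,follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives present in the HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    # Invented sample page carrying the noindex,follow tag.
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(robots_directives(page))
```

A page that should drop out of the index but still pass its links along should report both noindex and follow.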
Another thing I noticed was that Google had indexed $_GET form URLs generated by users of my polling system. Why on earth Google needs to index crap like that I don’t know, but I will now have to go hack the polling system that I downloaded so it uses the $_POST method, to keep Google from using its toolbar to index pages it should not be indexing in the first place.
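To track down which forms need the change, something like the following sketch can list every form on a page that submits via GET. Remember that a form with no method attribute defaults to GET, so those count too. The names and the sample markup (poll.php, vote.php, search.php) are hypothetical, and this only finds the forms; switching the script itself over to $_POST still has to be done by hand.

```python
from html.parser import HTMLParser


class GetFormFinder(HTMLParser):
    """Record the action of every form that submits with GET."""

    def __init__(self):
        super().__init__()
        self.get_form_actions = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":
            return
        attr = dict(attrs)
        # A missing method attribute means the browser uses GET.
        method = (attr.get("method") or "get").strip().lower()
        if method == "get":
            self.get_form_actions.append(attr.get("action") or "")


def find_get_forms(html):
    """Return the action URLs of all GET forms found in the HTML."""
    finder = GetFormFinder()
    finder.feed(html)
    return finder.get_form_actions


if __name__ == "__main__":
    # Invented sample markup with one GET form and one POST form.
    sample = (
        '<form action="/poll.php" method="get"></form>'
        '<form action="/vote.php" method="post"></form>'
    )
    print(find_get_forms(sample))
```

Any action that shows up in the list is a form whose submissions end up as crawlable URLs with query strings.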
By doing the above, I am removing as many duplicate and orphan pages as I can from my site. If all plays out as planned, my site should gain stronger ranking by removing pages that are questionable in Google’s eyes!