Revenue Source

Posted by SEO Blogs (Revenue Source Veteran, joined Jul 2005, 817 posts)
I agree to disagree on duplicate content! - 11-15-2006

I have known G-Man for a couple of years, and for the most part we agree on 99.9999% of everything we talk about in SEO, SEM, etc.
Over the last week G-Man and I got onto the topic of duplicate content, and for the first time we were not agreeing on how the search engines find and filter duplicate content.
In a nutshell, G-Man’s idea of how to get around the dupe content filter is to add more text in or around the dupe content itself, but not to shuffle the content.
Please take a minute to read G-Man’s full article on his idea of duplicate content, and subscribe to his feed, as he always has great bits of information on BlackHat and SEO in general.
Before I can get started on my view of duplicate content filters we must first talk about shingles and types of keywords.
First, the general perception is that search engines use something called shingles. A shingle is a block of text: a set of words in a contiguous sequence in a document. Shingles also have a close relation to how websites rank for given keyword(s).
Click here to learn more about Google’s patent on duplicate content and shingles.
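To make the idea of a shingle concrete, here is a minimal sketch. The function name `shingles` and the size of 3 are only for illustration; nothing here is taken from Google's actual implementation.

```python
def shingles(text, size):
    """Return every run of `size` consecutive words in `text`, in order."""
    words = text.lower().split()
    # Each shingle starts one word after the previous one, so they overlap.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


for s in shingles("duplicate content can hurt your rankings", 3):
    print(s)
# duplicate content can
# content can hurt
# can hurt your
# hurt your rankings
```

A text shorter than the shingle size simply produces no shingles.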
Additionally, when looking at words one must break them down into groups. There are three types of words:
  • Stop words are general words like get, I, me, the and you.
  • Filler words carry more weight than stop words, but they only imply the meaning of the action word and have less value, as they do not define what the document is about.
  • Action words, on the other hand, complete the document and define how or what the document is about. Words like rankings, slipped, page, and penalty could be action words.
The full extent of which words are filler words and which are action words is unknown to us at this time; however, there is strong evidence that Google has selected words that it counts as action words and words that it does not.
If you have ever heard of a website called CopyScape.com then you already know that it’s a free tool that helps you check a website’s page for duplicate content on other sites.
Over the years I have played with their tool, and being me, I like to figure out how things work. With that in mind I have spent much time working out how to manually get Google to show me duplicate content on other sites, and in some cases I have posted the results in other posts like this one.
In the past I found that Google’s normal search results would return 15 action words, which would make up the shingle, out of the 150 characters shown in the description for each site.
In recent months the total description for each site has shrunk to 142 characters, and it includes around 12-13 action words to complete a shingle.
In either case, whether the current shingle is 12 or 15 action words will not matter for explaining how Google can hunt down duplicate content and almost stop it from ranking.
In theory, let’s say Google gives each page 100% good faith on being unique before it starts to index and filter each page on a site.
As Google starts the indexing and filtering process it would apply its duplicate content filter by applying the shingles step by step.
For my duplicate content filter example I will use the following text:
“On some website that some webmaster owns he could have the following content, that he is worried about getting dupe content penalty for. But if he did not copy large quantities of content from other sites then he would not have to worry about getting such a penalty to such an extent that their webpage or website would not rank in Google.”
As Google starts the filtering process for duplicate content it would break the above text down by first removing all stop words as well as filler words from the document, which would leave only the action words and create a simple yet unique fingerprint.
Since I do not know exactly which words Google counts as stop words or filler words, I will use all the text in the above quote for my example.
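If the word lists were known, the filtering step might look roughly like the sketch below. Everything in it is an assumption: the stop words are just the examples listed earlier in this post, the filler list is invented for illustration, and hashing the remaining words with SHA-1 is only one possible way to build a fingerprint.

```python
import hashlib
import string

# Hypothetical word lists. The stop words are the examples given in this post;
# the filler words are made up purely for illustration.
STOP_WORDS = {"get", "i", "me", "the", "you"}
FILLER_WORDS = {"a", "about", "for", "have", "he", "is", "of", "that", "to", "would"}


def action_words(text):
    """Lowercase the text, strip punctuation, and drop stop and filler words."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS and w not in FILLER_WORDS]


def fingerprint(text):
    """Hash the remaining action words (SHA-1 is an assumed choice)."""
    joined = " ".join(action_words(text))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


print(action_words("He is worried about the penalty"))  # ['worried', 'penalty']
```

Two texts that differ only in their stop and filler words end up with the same fingerprint under this sketch.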
In my above example text I have a total of 62 words. If I assume that Google uses 15 words per shingle, and that each shingle starts one word after the previous one, then we would be able to produce 62 - 15 + 1 = 48 shingles.
Now that we know the above document has 48 shingles, we also know that each shingle is worth around 2.08% (100 / 48) of the document’s total 100%.
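The numbers above can be checked with a few lines of arithmetic. This sketch assumes words are split on whitespace and that shingles overlap, each one starting one word after the last; the variable names are just for illustration.

```python
EXAMPLE = (
    "On some website that some webmaster owns he could have the following "
    "content, that he is worried about getting dupe content penalty for. "
    "But if he did not copy large quantities of content from other sites "
    "then he would not have to worry about getting such a penalty to such "
    "an extent that their webpage or website would not rank in Google."
)
SHINGLE_SIZE = 15  # assumed shingle length, as in the example

word_count = len(EXAMPLE.split())
shingle_count = word_count - SHINGLE_SIZE + 1  # overlapping windows
share = 100 / shingle_count                    # percent of the page per shingle

print(word_count)       # 62
print(shingle_count)    # 48
print(round(share, 2))  # 2.08
```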
As Google moves along it starts comparing each shingle to other websites, or to pages on other sites or the same site. Each time it finds a matching shingle elsewhere, it will subtract 2.08% from the webpage’s total quality score.
As more and more shingles are found to have a match, the quality score is reduced more and more, until at some point the page’s quality score drops below a threshold that Google has defined.
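A rough sketch of this scoring idea follows. It is a toy model of the theory in this post, not Google's filter: the function names are hypothetical, and the 80% default threshold is only the "let's say" figure floated further down, since the real threshold is unknown.

```python
import string


def shingles(text, size):
    """Overlapping runs of `size` words, lowercased with punctuation stripped."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    words = [w for w in words if w]
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


def quality_score(page, others, size=15):
    """Start at 100% and subtract one equal share per shingle found elsewhere."""
    page_shingles = shingles(page, size)
    if not page_shingles:
        return 100.0
    seen = set()
    for other in others:
        seen.update(shingles(other, size))
    matched = sum(1 for s in page_shingles if s in seen)
    return 100.0 * (len(page_shingles) - matched) / len(page_shingles)


def is_filtered(score, threshold=80.0):
    """True when the score has dropped below the (assumed) threshold."""
    return score < threshold


page = "the quick brown fox jumps over the lazy dog"
other = "a quick brown fox jumps high"
score = quality_score(page, [other], size=3)
print(round(score, 2))     # 71.43
print(is_filtered(score))  # True
```

Here 2 of the page's 7 three-word shingles also appear in the other text, so the page keeps 5/7 of its score.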
This same quality score could be applied to the entire domain, as Google could take every page’s quality score and produce a score for the entire domain that reflects how much duplicate content the domain holds.
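How the page scores would be combined is not known. As a minimal sketch, assume the domain score is a plain average of the page scores; the function name and the sample URLs and scores are made up for illustration.

```python
def domain_score(page_scores):
    """Average the per-page quality scores (averaging is an assumption)."""
    scores = list(page_scores)
    if not scores:
        return 100.0  # no pages indexed yet, so nothing counts against the domain
    return sum(scores) / len(scores)


# Hypothetical per-page scores for one domain.
pages = {"/": 100.0, "/about": 95.0, "/articles/copied": 45.0}
print(domain_score(pages.values()))  # 80.0
```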
This is where G-Man and I part ways on our view of the subject.
G-Man feels the search engines cannot use the above method unless they keep a record of where in the document each shingle is located. In my view, however, it is not necessary to keep locations, as no document should ever hold a quality score less than, let’s say, 80%.
Also, G-Man feels that by adding content in and around the duplicate content one would throw the search engines off giving a penalty.
IMO, by adding more content that is unique, one only reduces the percent that each shingle contributes to the total quality score, and one is unable to reduce the percent for all shingles to the point that it would give a quality score higher than 80% without making more duplicate content.
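To put a rough number on how much padding that would take, here is a sketch of the arithmetic under the scoring model described above. It assumes the 80% figure as the threshold and ignores shingles that straddle the boundary between copied and added text; the function name is hypothetical.

```python
import math


def unique_shingles_needed(duplicate_shingles, threshold=80):
    """Fewest unmatched shingles that keep the score at or above `threshold`.

    score = 100 * unique / (unique + duplicate) >= threshold
    which rearranges to unique >= duplicate * threshold / (100 - threshold)
    """
    return math.ceil(duplicate_shingles * threshold / (100 - threshold))


# A fully copied page of 48 shingles, like the example text:
print(unique_shingles_needed(48))  # 192
```

In other words, under these assumptions the unique text has to outweigh the copied text about four to one before the page gets back to 80%.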
Also, by scrambling the words you may avoid getting hit with a duplicate penalty, but then you really don’t have anything to rank with, as your document would not produce a quality search string from the action words.


© 2004-6 RevenueSource.com.  All rights reserved.  Do not duplicate or redistribute in any form.
This website and its logos/design are property of RevenueSource.com.  All rights reserved.