I’ve been following a case in Denmark where a cloaking company has been making a couple of interesting claims. First, they claim that if a “brand-name” company cloaks, Google won’t remove the brand-name domain. That’s simply not true; if we believe that a company is abusing Google’s index by cloaking, we certainly do reserve the right to remove that company’s domains from our index. Next, the cloaking company claims that their method of cloaking is undetectable. I’ve written about “undetectable webspam” before. In that case, the “undetectable spam” could be found with a single Google query.
So let’s go back to this Danish company’s assertion that its cloaking is “undetectable.” Here’s an example claim on the English version of their page:

The claim is that “search engines cannot find out who is behind cloaking.” The Danish version of this page is slightly different:

One colleague at Google translated the final sentence from Danish as “However, as you can read below, they don’t stand a chance at figuring out who’s behind the solution, and thus cannot punish anyone for it.”
My colleague Brian White checked this claim out and very quickly found this hilarious page:

Here’s another error page:

That’s right, someone hasn’t configured their “undetectable” cloaking script correctly. The errors that the script is spewing out give absolute file paths and much more info. Digging into the details mentioned in the error messages quickly leads you to more domains. So much for that cloaking being undetectable. By the way, this cloaking script has been producing highly noticeable errors like this for almost two months.
So here are a few takeaways:
- If you’re going to claim that your webspam is “undetectable” then try to avoid spewing error messages that give lots of information about your domains.
- Also, you might want to avoid internal names like “CLOAKING_LINK_BUILDING” or “CLOAKING_RSS_Reader.php”. It tends to be a bit of a giveaway, and you never know when those names will get accidentally exposed.
More generally, if someone is trying to manipulate Google by deceptive cloaking, it means that a webserver is returning different content to Googlebot than to users. That’s a condition that can be checked for by algorithms or manually, and such cloaking is certainly not “undetectable.” For cloaking to be completely “undetectable,” it would have to be like that Steven Wright joke: “Last night somebody broke into my apartment and replaced everything with exact duplicates.” And a cloaking script that gave users and Googlebot exact duplicate pages would be a bit pointless.
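To make the "different content to Googlebot than to users" condition concrete, here is a minimal sketch in Python. It is purely illustrative and is not a description of how Google detects cloaking. It assumes you already have two copies of the same URL's page text, one fetched the way a crawler would see it and one the way a regular visitor would. The function names, the whitespace normalization, and the 0.8 threshold are all hypothetical choices made for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so trivial formatting
    differences do not count as a content difference."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity(page_a: str, page_b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = SequenceMatcher(None, normalize(page_a), normalize(page_b), autojunk=False)
    return matcher.ratio()


def looks_cloaked(page_for_crawler: str, page_for_user: str, threshold: float = 0.8) -> bool:
    """Flag the pair when the two versions differ more than the threshold allows.

    The threshold is an arbitrary illustrative value, not a known cutoff.
    """
    return similarity(page_for_crawler, page_for_user) < threshold


if __name__ == "__main__":
    user_view = "Welcome to our bakery. Fresh bread daily."
    crawler_view = "cheap pills casino loans " * 5
    print(looks_cloaked(user_view, user_view))     # same page for both audiences
    print(looks_cloaked(crawler_view, user_view))  # very different pages
```

A real check would have to tolerate legitimate differences such as rotating ads, timestamps, or personalization, which is why a flag from something like this is better treated as a prompt for a manual look than as a verdict. The Steven Wright point still holds: a script that returned identical pages to both audiences would pass, and would also not be doing any cloaking.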