Osama Bin Laden and Other Bloggers Conjured up by Google AI

Osama bin Laden — a popular blogger?

My friend — who?!

Nothing that complies with the laws of nature -- known or unknown -- is impossible. So I cannot say that the probability of Tony Judge turning into Osama bin Laden is exactly zero. Particularly if Google indicates that's the case!

But it is highly improbable. Believe me. Tony may be interested, as many of us are, in the enigmatic and elusive billionaire who enjoys such high visibility on the FBI's list of most wanted terrorists, but Tony is not Osama. I know him. Tony is one of my best friends. It's not his style, anyway. Honest to Hawking. Okay?!

Really, it's even less probable than the theory that this is a devious stunt of Osama bin Laden's, aimed at depriving Tony, and hordes of other unsuspecting bloggers across the Internet, of their authorship attribution!

Phantom impersonator

So then, why is it that some of Tony's articles, which he publishes on his website laetusinpraesens.org, are signed "by O bin Laden"?

Search results tend to shift over time but, at the time of reviewing this article, querying Google for

site:laetusinpraesens.org osama

brings up a few links, apparently authored "by O bin Laden", right on the first page.
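For readers who want to repeat the check, here is a minimal Python sketch that builds the search URL for the query above. It assumes the familiar public address https://www.google.com/search with its "q" parameter; the helper name google_site_query_url is made up for this illustration, and whatever Google returns today will probably differ from the results described here.

```python
from urllib.parse import urlencode


def google_site_query_url(site: str, keyword: str) -> str:
    """Build a Google search URL restricted to one site (hypothetical helper)."""
    # The "site:" operator limits results to pages from the given domain.
    query = f"site:{site} {keyword}"
    # urlencode escapes the colon and turns the space into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_query_url("laetusinpraesens.org", "osama"))
```

Pasting the printed URL into a browser is equivalent to typing the query into the search box by hand.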

Here's a screenshot saved on Saturday, January 29, 2011:
 

Google search results for the keyword "osama" on the site laetusinpraesens.org

See the second result from the top. Note the author's name, in grey, underneath the link.

Marketable tales...

HTML standards ignored

This situation would be funny if it weren't so sad. HTML provides a perfectly good <meta> tag for declaring the author of any web page. However, Google has either decided to ignore it or uses it in a non-transparent manner. Not nice, but hey: in the real world, a mammoth enterprise like that can afford to make its own rules.

The problem is that Google does not provide any coherent alternative guidelines for defining authorship. Mentioning the page's author in other meta tags, as suggested by some (e.g. here), may help -- but it may also confuse Google's algorithms even more. What if there is more than one author? What if the person's name is such that it changes the meaning of a sentence? Will the bot understand and extract the right bits from the various longer texts? I seriously doubt it.

To authors who know neither the rules nor the logic of Google's algorithms, the results seem arbitrary and unpredictable. It may even be that the tamper-proof logic for extracting authorship information is so secret that even Google cannot google it back. The results we see suggest that the search engine is failing spectacularly at recognizing the authors of the web pages it indexes.

Mentally challenged AI

It seems rather clear that the AI bot responsible for sorting out the semantics has pretty much lost it. Search results show author names that seem to be a product of a feverish artificial mind. Instead of infantile dreaming of electric sheep, the Google robot appears to amuse itself by playing cruel jokes on web authors.

For example, the source code of the above article correctly indicates that it has been authored by "Anthony Judge": <META NAME="Author" CONTENT="Anthony Judge">. Yet Google somehow detects the authorship as "by O bin Laden":

[Screenshot: Google search result showing the byline "by O bin Laden"]
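Anyone can verify what a page itself declares. The following is a minimal Python sketch, using only the standard library, that reads the "Author" meta tag from a page's HTML. The class and function names are invented for this illustration, and the sample page is a stripped-down stand-in for the real article, not a copy of it.

```python
from html.parser import HTMLParser


class AuthorMetaParser(HTMLParser):
    """Collect the content of every <meta name="author"> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if name == "author" and content:
            self.authors.append(content)


def declared_authors(html: str) -> list:
    """Return all author names declared in the page's meta tags, in document order."""
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.authors


# Simplified stand-in for the article's source code.
page = '<html><head><META NAME="Author" CONTENT="Anthony Judge"></head><body></body></html>'
print(declared_authors(page))
```

The function returns a list rather than a single name because nothing stops a page from carrying several such tags, which is exactly the "more than one author" case a search engine would have to deal with.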

Prolific CF Reactor

O bin Laden is not the only apparent plagiarizer of Tony's articles. Far more successful is a certain CF Reactor.

Here's an example:

[Screenshot: Google search result showing the byline "by CF Reactor"]

Again, the page's source code documents that the correct meta tag has been used:

[Screenshot: the "Author" meta tag in the page's source code]

Encountering that particular name all over Google, Tony grew curious and tried to find out who "the guy" apparently taking the credit for his articles was. He could not find anybody of that name, so he decided that it had to be some sort of disruptive information aggregator ("reactor") somewhere on the web. But he could not find one.

Then one day he asked me for an opinion, and together we had a closer look at this example and its source code. As luck would have it, we noticed that somewhere deep in the menu of the page there was a link to another article of Tony's, entitled "Cognitive Fusion Reactor".

[Screenshot: the menu link to the article "Cognitive Fusion Reactor"]

So, apparently, the bot somehow came to the conclusion that this was the author of the page and tidied the name into the neat form "CF Reactor"!

That clue led us to the discovery that the "Author" meta tag in the original page, on another domain (un-iter8.org), was set to "ITER-8: Cognitive Fusion Reactor", like this: <META NAME="Author" CONTENT="ITER-8: Cognitive Fusion Reactor">.

[Screenshot: the "Author" meta tag in the source code of the page on un-iter8.org]

All right, so that particular little mystery was cleared up. It was simply a mistake of Tony's. People, like bots, are not infallible!
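How might "Cognitive Fusion Reactor" end up as "CF Reactor", or "Osama bin Laden" as "O bin Laden"? Nobody outside Google knows, so the following Python sketch is pure guesswork: a naive rule that treats the last word of any phrase as a surname, shrinks capitalized words before it to initials, and keeps lowercase words as they are. The function name abbreviate_as_author is made up, and this is emphatically not Google's algorithm, merely one rule that happens to reproduce the bylines seen above.

```python
def abbreviate_as_author(phrase: str) -> str:
    """Format a phrase as if it were a personal name (speculative illustration only)."""
    words = phrase.split()
    if len(words) < 2:
        return phrase
    *given, surname = words  # assumption: the last word plays the role of a surname
    parts = []
    initials = ""
    for word in given:
        if word[0].isupper():
            # Capitalized words are treated as given names and reduced to initials.
            initials += word[0]
        else:
            # Lowercase words (such as "bin") are kept verbatim.
            if initials:
                parts.append(initials)
                initials = ""
            parts.append(word)
    if initials:
        parts.append(initials)
    parts.append(surname)
    return " ".join(parts)


for title in ("Cognitive Fusion Reactor", "Osama bin Laden", "Anthony Judge"):
    print(title, "->", abbreviate_as_author(title))
```

A rule like this is harmless when fed a real name, but it will happily turn any title or menu entry into a plausible-looking "author", which is the kind of behaviour the search results suggest.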

Robot on a rampage

But the above does not provide an explanation for how "O bin Laden" became the author of e.g. the first article above because, as we have seen there, its "Author" meta tag is correct.

Also, it is strange that Google, which normally does not pay attention to the "Author" tag (indeed, it appears to skip it), did pick up on it in the case of un-iter8.org.

And the really maddening thing is that instead of keeping this finding isolated to the one page, the Google bot went ballistic and tagged at least five hundred other articles as authored "by CF Reactor". All of them on Tony's domain laetusinpraesens.org (even though the tag was from un-iter8.org). And completely ignoring the fact that all those articles have had their "Author" meta tags correctly set to "Anthony Judge".

Infested Google Scholar

Google Scholar, the search engine specialized in "indexing scholarly literature", also appears to believe Osama bin Laden is the true author of some of Tony's articles:

[Screenshot: Google Scholar result crediting one of Tony's articles to O bin Laden]

Rather amusingly, in Google Scholar there is even a PDF document, called Open Letter from The Project for the New American Century to US President George W. Bush, that is labelled by Google as authored by O bin Laden:

[Screenshot: Google Scholar result for the open letter, labelled as authored by O bin Laden]

More plagiarizers

Apart from O bin Laden and CF Reactor, there are many other curious "authors" of Tony's articles, although less frequent, including WN Renaissance, G Ass, T Steps, A Disagreement, A Type, ghosts of real people such as A Einstein and WB Rayward, F Reactor (surely family of CF?), M Lanthanides -- maybe a chemist, a geeky someone called IP Value, scary-sounding UR Altersschwäche and IG a Terrorist, culminating in somebody inconceivably called A an institutional Apocalypse!

[Screenshots: Google search results showing the author names listed above]