There's a new HTML attribute in town called "nofollow" that's supposed to prevent weblog spam. In this article you’ll learn about weblog spam, what nofollow is, and how it might or might not help the situation.
If you have email, you get unsolicited email messages called spam. What you might not have realized, however, is that Web sites get spammed too, with bogus articles, comments or discussion board entries that are intended to add links to another Web site rather than further the discussion.
In the world of Weblogs (or “blogs” for short), this kind of spam is reaching epidemic proportions, and fighting blog spammers has become a serious effort for the online community. For some bloggers, it’s a matter of survival, as they drown in the dozens or even hundreds of bogus comments added by software applications on a nightly basis.
The problem is that you don’t want to block everyone from posting their comments, because it’s the dialog that ensues from an interesting weblog posting that makes blogs such compelling reading. On my weblogs, I have comments added to articles that are months or even years old and other articles have garnered 50 or more comments.
It’s the same dilemma as with email, isn’t it? You want to filter out all the spam without accidentally blocking a legitimate message.
Rather than try to create blacklists, blockers, filters or other mechanisms in weblog software, however, in January a group of bloggers proposed that the value of links from blogs to third-party sites be removed instead. Critically, this required the participation of the major search engines, and as of this point Google, Yahoo and MSN have all signed on and now support nofollow.
Why Do Spammers Want Links from Blogs?
If you’re not a search engine maven, you might not be aware that one of the key criteria a search engine like Google uses to decide which match to list as #1 versus #434 for a given search is how many other sites point to each one. If a site shows up higher in the search results because it has more inbound links from other sites, it should be no surprise that a major goal for a Web site owner is to create the maximum number of inbound links possible.
In the world of Google, the importance of your site is referred to as its PageRank, though it’s more complex than that: PageRank refers to the popularity of your site, while search results show the pages on various sites that match a given search query.
Legitimate sites achieve high PageRank by having great content, compelling writing, a witty or unusual perspective, or lots of friends. That’s the promise of the vox populi foundation of the World Wide Web and modern search engine results. But if you’re building a porn site, a gambling site, or something else that isn’t likely to inspire people to link to you, a tool that automatically adds links to your site from other sites by injecting bogus comments is going to be very interesting.
Like distributors of other unpopular materials, these spammers shrug when asked how they could do something so antisocial, and genuinely couldn’t care less about the weblogs that they’re polluting. It’s all about them and their inbound links, and everything else is secondary to that magical #1 spot on Google.
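To make the incentive concrete, here's a toy sketch in Python of the simplified, textbook version of PageRank. It is not Google's actual ranking code, and the page names, the damping value of 0.85 and the iteration count are all assumptions picked for illustration. It compares a tiny web of three blogs before and after a spammer gets a comment link onto each of them.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated iteration.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Three blogs that link to each other; nobody links to the spam site.
clean_web = {
    "blog_a": ["blog_b"],
    "blog_b": ["blog_c"],
    "blog_c": ["blog_a"],
    "spam_site": [],
}

# The same blogs after a bogus comment on each one links to the spam site.
spammed_web = {
    "blog_a": ["blog_b", "spam_site"],
    "blog_b": ["blog_c", "spam_site"],
    "blog_c": ["blog_a", "spam_site"],
    "spam_site": [],
}

print("before spam:", pagerank(clean_web)["spam_site"])
print("after spam: ", pagerank(spammed_web)["spam_site"])
```

The second number comes out larger than the first, and the blogs' own scores shrink to pay for it. That, in miniature, is why the bogus comments keep coming.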
The Solution Strategy
In a nutshell, the problem is that if you enable the links in comments added by others to your blog, or even just link their name to their site, you're effectively inviting unscrupulous spammers to add their own garbage posts purely to gain a link from your site to their own.
To see how simple the nofollow solution is, let's peek at some HTML. A hypertext link in HTML looks like this:
<a href=" some URL ">text that's linked</a>
The change that the three big search engines have implemented is support for a new value, nofollow, for the link's rel attribute. Without it, the previous link would be blindly followed by a search engine crawler (for example, Googlebot), and the linked site would gain some PageRank from the source site. With nofollow, however, links are not followed by the spiders and PageRank is not transmitted. Here's how that link would look in this brave new world:
<a href=" some URL " rel="nofollow" >text that's linked</a>
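If your weblog software doesn't do this for you, the rewrite itself is easy to sketch. The Python below is a minimal illustration, not the code any of the tools mentioned in this article actually use: the function name add_nofollow is hypothetical, and it relies on regular expressions rather than a real HTML parser, so treat it as a starting point for comment text only.

```python
import re

# Matches an opening <a ...> tag and captures its attribute text.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches a quoted rel="..." attribute (but not, say, data-rel="...").
_REL_ATTR = re.compile(r"""(?<![\w-])rel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(html):
    """Return `html` with rel="nofollow" present on every <a> tag.

    Hypothetical helper for illustration; assumes reasonably tidy markup.
    """
    def fix_tag(match):
        attrs = match.group(1)
        rel = _REL_ATTR.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs.rstrip() + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in (value.lower() for value in values):
            # Already tagged: leave the link exactly as it was.
            return match.group(0)
        # Keep existing rel values and add nofollow alongside them.
        values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _A_TAG.sub(fix_tag, html)


comment = 'Nice post! <a href="http://example.com/">my site</a>'
print(add_nofollow(comment))
```

Links that already carry a rel attribute keep their existing values, so a link marked rel="external" ends up as rel="external nofollow" rather than losing the original value.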
Six Apart, the company that makes Movable Type, has announced that they've already updated TypePad, and that LiveJournal is going to implement it for comments from people who aren't friends. Other Weblog and related systems that have announced support include Blogger, WordPress, Flickr, Buzznet, Scripting News, blojsom and Blosxom.
For Movable Type users, there's a nofollow plugin that you'll need to download and install. Fortunately, it's only 4K total (a tiny Perl script, actually) and it's worth doing right now, while you're thinking about it.
The Geeky Guts of It All
For the really geeky, the new plugin changes the behavior of links in the tags <MTPings>, <$MTCommentAuthorLink$>, and <$MTCommentBody$> and, far cooler, also lets you automatically add "nofollow" to any links encountered in any other MT tag by adding the attribute nofollowfy="1". Are you tired of the links in your articles themselves giving other sites PageRank? Then you could use <$MTEntryBody nofollowfy="1"$> and it'll never happen again (though I don't know why you'd do this, honestly!).
I've installed the plugin on my Weblogs, so if you're so inclined, pop over to my Ask Dave Taylor site, find one of my articles where there are comments and use "View Source" in your browser to confirm that I now do indeed have the snazzy new rel="nofollow" attribute in comment links off my site.
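If you'd rather not squint at View Source, the same check can be scripted. Here's a rough sketch using Python's standard html.parser module; the LinkAuditor class and audit_links function are names I made up for this example, and the sample markup is invented rather than copied from any real page. Feed it the HTML of a page and it sorts the links into those that carry nofollow and those that don't.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collects link targets, split by whether they carry rel="nofollow".

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            # Named anchors without an href are not links; skip them.
            return
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollowed) lists of href values found in `html`."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.followed, auditor.nofollowed


sample = (
    '<p>Comment by <a href="http://example.com/a" rel="nofollow">a visitor</a></p>'
    '<p>See <a href="http://example.com/b">this article</a> for more.</p>'
)
followed, nofollowed = audit_links(sample)
print("still passing PageRank:", followed)
print("tagged nofollow:", nofollowed)
```

Any comment link that shows up in the first list is one the search engine spiders will still follow.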
Will It Work?
And, finally, is it going to work? I dunno.
There are some definite problems with this strategy, not the least of which is that it means that if my friends and colleagues pop by and post an erudite comment - or write their own article that trackbacks to mine - I would like to give them some of my PageRank goodness, but now I can't. You're all thrown into the 'spammer scum' box, like it or not.
Also, having it as an add-on is like Microsoft solving security problems with a system patch: it only works if every single person installs it. Three months after nofollow was announced, there is still a significant percentage of MT, WordPress and other self-hosted Weblogs that do not support "nofollow". Unfortunately, it really is an all-or-nothing situation, too, because, as we've learned in the last few years, spammers are happy to send out a million messages for a handful of positive responses or, in this case, add bogus comments to thousands of weblogs for one or two non-nofollow links that the search engine spiders will find.
Nonetheless it's clear that something has to be done about blogspam, and I applaud the search engines and weblog teams for working together to at least make some progress in this direction, however suboptimal it may be.
This article originally appeared on the InformIT site, where I write a series of monthly columns on blogs and blogging topics.