Over the past five days, Deckchairs.net has been hit with a massive amount of comment spam, which is automated spam placed in the comments sections of a blog. It’s awful and there’s only one good tool to get rid of most of it, Jay Allen’s excellent MT-Blacklist plug-in. I generally use this little program to get rid of spam that has already been identified by others, which means that the spammers are already on the blacklist.
Somehow, though, this time I am part of the avant-garde and was hit by approximately 1500 spam comments. In order to destroy the unwarranted spam, here’s the regex [sp. correct] or series of words I had to use: rape|sex|incest|videos|collection|rpe|illic it|porn|pics|eager. Alas, if someone now actually wants to post a comment about rape, sex, incest, videos, porn, or they are perhaps “eager,” Deckchairs will knock that comment out. Censorship? Yes. Preservation of sitehood? Yes. Another sign of spam pushing us all to the wall? Yes.
For what it’s worth, the following makes reasonably sure that it’s a domain name you’re blocking:
(LIST_OF_WORDS_ABOVE)[\w\-_.]*\.[a-z]{2,}
The only problem I see is that many of those will generate false positives because they can be contained inside of a longer word or words. For instance (hyphenated to avoid MTB): gr-apeape.com. And you’ve just banned the word ea-ger?
You can put word boundaries (\b) around some of those to limit the collateral damage, but big general regexps don’t discriminate.
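To make the difference concrete, here is a rough Python sketch, not MT-Blacklist's own code. It uses an illustrative subset of the word list, and the names WORDS, loose, bounded and is_blocked are made up for the example. It compares the domain-style pattern above with and without \b around the words.

```python
import re

# Illustrative subset of the word list from the post (an assumption, not the full blacklist entry).
WORDS = ["rape", "videos", "pics", "eager"]
ALTERNATION = "|".join(re.escape(word) for word in WORDS)

# Domain tail copied from the pattern above: more host characters, a dot, then a TLD.
DOMAIN_TAIL = r"[\w\-_.]*\.[a-z]{2,}"

# Without word boundaries: a listed word anywhere inside a hostname triggers a match.
loose = re.compile(rf"(?:{ALTERNATION}){DOMAIN_TAIL}", re.IGNORECASE)

# With word boundaries: the listed word must stand alone (next to '-', '.', '/', etc.).
bounded = re.compile(rf"\b(?:{ALTERNATION})\b{DOMAIN_TAIL}", re.IGNORECASE)


def is_blocked(pattern, text):
    """Return True if the pattern matches anywhere in the comment text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["grapeape.com", "eager-pics.example", "I am eager to reply."]:
        print(sample, is_blocked(loose, sample), is_blocked(bounded, sample))
```

The trade-off cuts both ways: the bounded version stops flagging grapeape.com, but it would also let a run-together spam domain such as eagerbeaver.example slip through, so it is worth putting \b only around the words that cause the most collateral damage.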
That’s so EVIL. These spam jerkweeds really have too much time on their hands. *AS IF* I’m going to follow a link to some r@pe/inc4st/murd3r site from a topically unrelated blog somewhere.
But then again, I’m told that there are supposedly a lot of pR0n blogs out there, somewhere – apparently hosted on Blogspot. But those guys wouldn’t use Movable Type, no?