Dealing with Spam
When you allow user-contributed content of any form (stories, comments, forum posts, ...) on your Geeklog site, you will sooner or later have to deal with spam. That is, unfortunately, the reality of today's web, and it is not going to change any time soon: spamming is cheap, it can be done safely through open proxies or from countries, hosting services, and ISPs that don't have an anti-spam policy, and, most of all, people keep buying from spammers.
The Spam-X Plugin
Spam protection in Geeklog is mostly based on the Spam-X Plugin, originally developed by Tom Willet. Its modular architecture allows it to be extended with new modules to fight spammers' latest tricks, should the need arise.
Geeklog and the Spam-X plugin will check the following for spam:
- Story submissions
- Trackbacks and Pingbacks
- Event submissions
- Link submissions
- The text sent with the "Email story to a friend" option
- A user's profile
Other plugins can also use the Spam-X plugin to filter their content for spam. The Forum plugin does that, for example.
Geeklog ships with the following Spam-X modules:
MT-Blacklist
MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.
Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Therefore, Jay Allen eventually discontinued MT-Blacklist on the assumption that new and better methods to detect spam are now available.
The MT-Blacklist modules were removed from Geeklog as of version 1.4.1. Instead, Geeklog now ships modules for SLV (Spam Link Verification).
Personal Blacklist
The Personal Blacklist module lets you add keywords and URLs that typically exist in spam posts. When you're being hit by spam, make sure to add the URLs of those spam posts to your Personal Blacklist so that they can be filtered out automatically, should the spammer try to post them again.
This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove larger numbers of spam posts from your database.
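As a rough illustration of the idea behind a personal blacklist, the following Python sketch checks a post against a list of keywords and URLs. It is not the Spam-X implementation: the entries, the function name matches_blacklist, and the choice of literal, case-insensitive matching are all assumptions made for the example.

```python
import re

# Hypothetical blacklist entries; a real list is maintained by the site admin.
PERSONAL_BLACKLIST = ["cheap-pills.example", "casino-bonus"]


def matches_blacklist(text, blacklist=PERSONAL_BLACKLIST):
    """Return the first blacklist entry found in text, or None if none match."""
    for entry in blacklist:
        # re.escape makes the entry match literally (assumption: entries are
        # plain strings, not patterns); IGNORECASE catches case variations.
        if re.search(re.escape(entry), text, re.IGNORECASE):
            return entry
    return None
```

Calling matches_blacklist() on every stored comment would also give a list of candidates for mass deletion after a new entry has been added.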
IP Filter
Sometimes you will encounter spam that is coming from one or only a few IP addresses. Once you add those IP addresses to the IP Filter module, any posts from them are blocked automatically.
Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address that belongs to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked address is reassigned and may end up being used by some innocent user.
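The check such a filter performs can be sketched in a few lines of Python using the standard ipaddress module. The blocked entries below are documentation-only example addresses, the function name is_blocked is hypothetical, and support for CIDR ranges is an assumption of this sketch rather than a statement about the IP Filter module.

```python
import ipaddress

# Hypothetical filter entries: one single address and one CIDR range.
BLOCKED = ["192.0.2.15", "198.51.100.0/24"]


def is_blocked(ip, blocked=BLOCKED):
    """Return True if ip equals a blocked address or lies in a blocked range."""
    addr = ipaddress.ip_address(ip)
    # ip_network() turns a bare address into a one-address network, so single
    # addresses and ranges can be tested the same way.
    return any(addr in ipaddress.ip_network(entry) for entry in blocked)
```

Blocking a whole range makes the caveat above more serious: the wider the range, the more likely it is that an innocent user is caught by it.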
IP of URL Filter
This module is only useful in a few special cases. Here you enter the IP addresses of web servers that host domains which show up in spam. Some spammers have a lot of their sites on only a few webservers, so instead of adding lots of domains to your blacklist, you only add the IP addresses of those webservers. The module will then check all the URLs in a post to see if any of them is hosted on one of those blacklisted webservers.
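The lookup described here could be sketched as follows: extract the URLs from a post, resolve each hostname, and compare the result with the blacklisted server addresses. This is a simplified illustration, not the module's actual code. The names hosts_in and links_to_blacklisted_server, the URL pattern, and the example address are all assumptions, and only one IPv4 address per hostname is considered.

```python
import re
import socket
from urllib.parse import urlparse

# Hypothetical list of web server addresses known to host spam domains.
BLACKLISTED_SERVER_IPS = {"203.0.113.10"}

# Deliberately simple pattern: http(s) URLs up to whitespace, quotes or brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hosts_in(text):
    """Return the set of (lowercased) hostnames of all URLs found in text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def links_to_blacklisted_server(text, blacklisted_ips=BLACKLISTED_SERVER_IPS,
                                resolve=socket.gethostbyname):
    """Return True if any URL in text resolves to a blacklisted server IP."""
    for host in hosts_in(text):
        try:
            ip = resolve(host)
        except OSError:
            # Hostname could not be resolved; skip it rather than fail.
            continue
        if ip in blacklisted_ips:
            return True
    return False
```

The resolve parameter exists so the function can be tried out with a fake resolver instead of real DNS lookups. Note that every check costs one DNS lookup per distinct hostname, which is one reason to reserve this approach for the special cases mentioned above.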
As mentioned above, Spam-X can easily be extended by dropping new modules into the /path/to/geeklog/plugins/spamx/ directory. The following modules are currently under development:
These modules make use of external services to rate posts as spam (or not spam):
The Spam-X plugin also lets you define actions to be performed once a post has been recognized as spam. The default actions are to delete the post and, optionally, send an email to the site admin.
The following modules also pass the information from a spam post on to other plugins (see below) to block any further spam posts from the same source automatically:
Bad Behavior is a collection of scripts checking for broken HTTP requests as well as signatures of known spambots and aims at stopping those before they even get the chance to post spam.
Bad Behavior was written by Michael Hampton as a plugin for WordPress but has since been ported to Geeklog.
When you can identify a spammer by their HTTP request, i.e. by the IP address or by certain characteristics in their HTTP request, e.g. an unusual user agent string, the most efficient method of blocking them is directly on the webserver.