Dealing with Spam

From GeeklogWiki
Revision as of 19:11, 26 March 2006 by Dirk (talk | contribs) (forgot a link ...)

Background

When you allow user-contributed content of any form (stories, comments, forum posts, ...) on your Geeklog site, you will sooner or later have to deal with spam. That is, unfortunately, the reality of today's web, and it is not going to change any time soon: spamming is cheap, it can be done safely through open proxies or from countries, hosting services, and ISPs that don't have an anti-spam policy, and, most of all, people keep buying from spammers.


The Spam-X Plugin

Spam protection in Geeklog is mostly based on the Spam-X Plugin, originally developed by Tom Willet. Its modular architecture allows it to be extended with new modules to fight the spammers' latest tricks, should the need arise.

Spam detection

Geeklog and the Spam-X plugin will check the following for spam:

  • Story submissions
  • Comments
  • Trackbacks and Pingbacks
  • Event submissions
  • Link submissions
  • The text sent with the "Email story to a friend" option
  • A user's profile

Other plugins can also use the Spam-X plugin to filter their content for spam. The Forum plugin does that, for example.

Modules

Geeklog ships with the following Spam-X modules:

MT-Blacklist

MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.

Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Therefore, Jay Allen eventually discontinued MT-Blacklist on the assumption that new and better methods to detect spam are now available.

Geeklog currently still ships with an MT-Blacklist module for Spam-X. It pulls the last version of Jay's MT-Blacklist from a copy held at geeklog.net. However, that list is out of date now and, apart from a few generic rules, doesn't really offer a lot of protection any more. It will most likely be removed from the Geeklog distribution in one of the next releases.

Personal Blacklist

The Personal Blacklist module lets you add keywords and URLs that typically exist in spam posts. When you're being hit by spam, make sure to add the URLs of those spam posts to your Personal Blacklist so that they can be filtered out automatically, should the spammer try to post them again.

This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove larger numbers of spam posts from your database.
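
As a rough illustration of how a personal blacklist can be applied to a post, here is a minimal sketch in Python. It is not the Spam-X implementation: the entries, the function names, and the choice to treat each entry as a case-insensitive regular expression are all assumptions made for the example.

```python
import re

# Hypothetical blacklist entries, for illustration only.
PERSONAL_BLACKLIST = [
    r"cheap-pills\.example\.com",  # a URL seen in spam posts
    r"\bonline casino\b",          # a keyword phrase
]

def find_blacklisted(text, patterns=PERSONAL_BLACKLIST):
    """Return the blacklist patterns that occur in the text (case-insensitive)."""
    return [p for p in patterns if re.search(p, text, re.IGNORECASE)]

def is_spam(text, patterns=PERSONAL_BLACKLIST):
    """A post counts as spam here if at least one blacklist pattern matches."""
    return bool(find_blacklisted(text, patterns))
```

The same matching function could in principle be run over posts that are already stored, which is the idea behind cleaning up spam that made it through before the entry was added.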

IP Filter

Sometimes you will encounter spam that is coming from one or only a few IP addresses. Simply add those IP addresses to the IP Filter module, and any posts from them will be blocked automatically.

Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address belonging to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked address is reassigned and may end up with some innocent user.
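
The check itself is simple, as the following sketch shows. It uses Python's standard ipaddress module; the list contents and the function name are hypothetical, and accepting CIDR ranges in addition to single addresses is an assumption of this sketch rather than a statement about what the IP Filter module accepts.

```python
import ipaddress

# Hypothetical entries: single addresses or CIDR ranges
# (taken from the address blocks reserved for documentation).
BLOCKED = ["192.0.2.15", "198.51.100.0/24"]

def is_blocked(remote_addr, blocked=BLOCKED):
    """Return True if remote_addr falls inside any blocked address or range."""
    addr = ipaddress.ip_address(remote_addr)
    # strict=False lets an entry such as "198.51.100.7/24" be read as its network.
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in blocked
    )
```

Blocking a whole range makes the problem described above worse, since more innocent users may share it, so ranges are best kept as narrow as possible.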

IP of URL Filter

This module is only useful in a few special cases. Here you enter the IP address of a webserver that hosts domains which show up in spam posts. Some spammers keep a lot of their sites on only a few webservers, so instead of adding lots of domains to your blacklist, you only add the IP addresses of those webservers. The module will then check all the URLs in a post to see if any of them is hosted on one of the blacklisted webservers.
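
The idea can be sketched in a few steps: extract the URLs from the post, resolve each hostname, and compare the resulting addresses with the blacklist. The Python below is only an illustration under assumptions; the regular expression, the function names, and the pluggable resolver parameter are all made up for this example and do not describe the module's actual code.

```python
import re
import socket
from urllib.parse import urlparse

# Deliberately simple URL matcher; real posts may need something more robust.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def hosts_in(text):
    """Extract the distinct hostnames of all http(s) URLs found in the text."""
    hosts = []
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # skip URLs that cannot be parsed
        if host and host not in hosts:
            hosts.append(host)
    return hosts

def resolve(host):
    """Resolve a hostname to its IPv4 addresses; failures yield an empty set."""
    try:
        return set(socket.gethostbyname_ex(host)[2])
    except (OSError, UnicodeError):
        return set()

def links_to_blacklisted_server(text, blacklisted_ips, resolver=resolve):
    """Return True if any URL in the text points at a blacklisted webserver."""
    blacklisted = set(blacklisted_ips)
    return any(resolver(host) & blacklisted for host in hosts_in(text))
```

Passing the resolver as a parameter is a design choice of the sketch: it allows the check to be tried out with a fixed lookup table instead of live DNS queries, which can be slow when a post contains many links.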

Experimental Modules

As mentioned above, Spam-X can easily be extended by dropping new modules into the /path/to/geeklog/plugins/spamx/ directory. The following modules are currently under development:

Filters

These modules make use of external services to rate posts as spam (or not spam):

Actions

The Spam-X plugin also lets you define actions to be performed once a post has been recognized as spam. The default actions are to delete the post and, optionally, send an email to the site admin.
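
The split between detection and actions can be pictured roughly as follows. This Python sketch is not Spam-X's actual PHP interface; every name in it is hypothetical, and the two actions only record what they would do instead of touching a database or sending mail.

```python
def handle_post(post, filters, actions):
    """Run every action on the post if any filter flags it; return True if flagged."""
    if any(check(post) for check in filters):
        for act in actions:
            act(post)
        return True
    return False

# Hypothetical filter: flags posts that link to a blacklisted domain.
def contains_blacklisted_url(post):
    return "spam.example" in post["text"]

# Stand-ins for the database and the admin's mailbox.
deleted, mailbox = [], []

def delete_post(post):
    """Default action: remove the post."""
    deleted.append(post["id"])

def notify_admin(post):
    """Optional action: tell the site admin about the spam post."""
    mailbox.append("Spam detected in post %s" % post["id"])
```

Calling handle_post(post, [contains_blacklisted_url], [delete_post, notify_admin]) would run both actions for a flagged post and neither for a clean one.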

The following modules also pass the information from a spam post on to other plugins (see below) to block any further spam posts from the same source automatically:


Other plugins

Bad Behavior

Bad Behavior is a collection of scripts that check for broken HTTP requests as well as signatures of known spambots, with the aim of stopping them before they even get the chance to post spam.

Bad Behavior was written by Michael Hampton as a plugin for WordPress but has since been ported to Geeklog.

Ban Plugin

"The Ban Plugin is designed to do one thing. Provide an easy way to ban people from your web site." (quoted from Tom Willet's site).


Other methods

When you can identify a spammer by their HTTP request, i.e. by the IP address or by certain characteristics of the request such as an unusual user agent string, the most efficient way to block them is directly on the webserver.

On an Apache webserver, that is usually done in the .htaccess file. This method is also useful against script kiddies and worms.
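
A minimal .htaccess sketch might look like the following. The IP address and the user agent string are placeholders, block_request is an arbitrary variable name chosen for this example, and the fragment assumes that mod_setenvif is loaded and that the server configuration allows these directives to be overridden in .htaccess files.

```apache
# Flag requests whose User-Agent contains a (hypothetical) spambot signature
SetEnvIfNoCase User-Agent "ExampleSpamBot" block_request

# Apache 2.2 and earlier: allow everyone except the listed address
# and any request flagged above
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from env=block_request

# Apache 2.4 uses a different syntax for the same rules:
# <RequireAll>
#     Require all granted
#     Require not ip 192.0.2.15
#     Require not env block_request
# </RequireAll>
```

Blocked requests are answered with a 403 error by the webserver itself, so Geeklog never has to process them.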


Resources

  • The geeklog-spam mailing list is the best place to discuss new anti-spam techniques for Geeklog as well as report spam sightings.
  • Ann Elisabeth Nordbo aka Spam Huntress provides lots of useful information regarding web spam on her site and collects information about known spammers in her wiki.