Dealing with Spam

From GeeklogWiki
Revision as of 17:59, 26 March 2006 by Dirk (talk | contribs) (incomplete; and some of the content should probably be moved to the Spam-X Plugin's page ...)


Background

When you allow user-contributed content of any form (stories, comments, forum posts, ...) on your Geeklog site, you will sooner or later have to deal with spam. That is, unfortunately, the reality of today's web, and it is not going to change any time soon: not as long as spamming is cheap, can be done safely through open proxies or from countries, hosting services, and ISPs that have no anti-spam policy, and, most of all, not as long as people keep buying from spammers.


The Spam-X Plugin

Spam protection in Geeklog is mostly based on the Spam-X Plugin, originally developed by Tom Willett. It has a modular architecture, so new modules can be added to counter spammers' latest tricks, should the need arise.
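To illustrate what a modular design of this kind can look like, here is a minimal sketch in Python. It is not Spam-X's actual code (Geeklog is written in PHP). The `SpamFilter` class, its `register` and `first_match` methods, and the `SpamCheck` type are hypothetical names made up for this example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical module interface (an assumption for this sketch): a function
# that takes the post text and returns True when it considers the text spam.
SpamCheck = Callable[[str], bool]


class SpamFilter:
    """Runs a post through a list of independent check modules."""

    def __init__(self) -> None:
        self._modules: List[Tuple[str, SpamCheck]] = []

    def register(self, name: str, check: SpamCheck) -> None:
        # New checks can be added without touching the existing ones.
        self._modules.append((name, check))

    def first_match(self, text: str) -> Optional[str]:
        # Return the name of the first module that flags the text, or None.
        for name, check in self._modules:
            if check(text):
                return name
        return None
```

A new check is then just another call to `register`, for example `spam_filter.register("keyword", lambda text: "casino" in text.lower())`.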

Spam detection

Geeklog and the Spam-X plugin will check the following for spam:

  • Story submissions
  • Comments
  • Trackbacks and Pingbacks
  • Event submissions
  • Link submissions
  • The text sent with the "Email story to a friend" option
  • A user's profile

Other plugins can also use the Spam-X plugin to filter their content for spam. The Forum plugin does that, for example.

Modules

Geeklog ships with the following Spam-X modules:

MT-Blacklist

MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.

Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Therefore, Jay Allen eventually discontinued MT-Blacklist on the assumption that new and better methods to detect spam are now available.

Geeklog currently still ships with an MT-Blacklist module for Spam-X. It pulls the last version of Jay's MT-Blacklist from a copy held at geeklog.net. However, that list is out of date now and, apart from a few generic rules, doesn't really offer a lot of protection any more. It will most likely be removed from the Geeklog distribution in one of the next releases.

Personal Blacklist

The Personal Blacklist module lets you add keywords and URLs that typically appear in spam posts. When you're being hit by spam, make sure to add the URLs from those spam posts to your Personal Blacklist so that they are filtered out automatically, should the spammer try to post them again.

This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove larger numbers of spam posts from your database.
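The idea behind a personal blacklist can be sketched roughly as follows. This is an illustration only, not the module's real implementation: the function name `find_blacklisted` is hypothetical, and the sketch assumes entries are matched as literal, case-insensitive substrings, which may differ from how Spam-X interprets them.

```python
import re
from typing import Iterable, Optional


def find_blacklisted(text: str, entries: Iterable[str]) -> Optional[str]:
    """Return the first blacklist entry found in text, or None."""
    for entry in entries:
        # Assumption: each entry is a literal string, so it is escaped
        # before being used in a case-insensitive search.
        if re.search(re.escape(entry), text, re.IGNORECASE):
            return entry
    return None
```

The same check can be run over posts that are already stored, which is the idea behind mass deletion: with `blacklist = ["cheap-pills.example"]`, the expression `[c for c in comments if find_blacklisted(c, blacklist)]` collects the comments that would be candidates for removal.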

IP Filter

Sometimes you will encounter spam that comes from one or only a few IP addresses. Once you add those IP addresses to the IP Filter module, any posts from them are blocked automatically.

Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address belonging to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked address is reassigned and may end up with an innocent user.
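As a rough sketch of how such an IP check can work, the following Python function compares a poster's address against a list of blocked entries. The function name `is_blocked` is hypothetical, and accepting CIDR ranges next to single addresses is an assumption of this example, not a statement about what the IP Filter module supports.

```python
import ipaddress
from typing import Iterable


def is_blocked(remote_ip: str, blocked: Iterable[str]) -> bool:
    """Return True if remote_ip matches any blocked address or range."""
    addr = ipaddress.ip_address(remote_ip)
    for entry in blocked:
        # strict=False lets a single address such as "203.0.113.7" and a
        # range such as "198.51.100.0/24" both parse as networks.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```

For example, with `blocked = ["203.0.113.7", "198.51.100.0/24"]`, a post from `198.51.100.42` would be rejected. The range form also shows the risk described above: everyone else in that range is locked out too.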

IP of URL Filter

This module is only useful in a few special cases. Here you enter the IP address of a web server that hosts domains appearing in spam. Some spammers keep many of their sites on only a few web servers, so instead of adding lots of domains to your blacklist, you add just the IP addresses of those servers. The module will then check every URL in a post to see whether it is hosted on one of the blacklisted web servers.
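The check described above can be sketched as: extract the URLs from a post, resolve each host name, and compare the resulting addresses with the blocked server IPs. The code below is an assumption-laden illustration in Python, not the module's source. `URL_PATTERN`, `resolve_host`, and `urls_on_blocked_servers` are hypothetical names, the URL pattern is deliberately simple, and only IPv4 lookups are attempted.

```python
import re
import socket
from typing import Callable, Iterable, List, Set
from urllib.parse import urlparse

# Simplified pattern (an assumption): http/https URLs up to whitespace or quotes.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def resolve_host(host: str) -> Set[str]:
    """Resolve a host name to its IPv4 addresses; empty set on failure."""
    try:
        return set(socket.gethostbyname_ex(host)[2])
    except (OSError, UnicodeError):
        return set()


def urls_on_blocked_servers(
    text: str,
    blocked_ips: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_host,
) -> List[str]:
    """Return the URLs in text whose host resolves to a blocked IP."""
    blocked = set(blocked_ips)
    hits = []
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host and resolver(host) & blocked:
            hits.append(url)
    return hits
```

The `resolver` parameter exists so the DNS lookup can be replaced, for instance with a dictionary-backed function in tests. Note that a check like this performs one lookup per URL, so it adds network latency to every post that contains links.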