Dealing with Spam

From GeeklogWiki
Latest revision as of 11:45, 20 March 2013

Background

When you allow user-contributed content of any form (stories, comments, forum posts, ...) on your Geeklog site, you will sooner or later have to deal with spam. That is, unfortunately, the reality of today's web, and it is not going to change any time soon: not as long as spamming is cheap, not as long as it can be done safely through open proxies or from countries, hosting services, and ISPs that don't have an anti-spam policy, and, most of all, not as long as people keep on buying from spammers.


The Spam-X Plugin

Spam protection in Geeklog is mostly based on the Spam-X Plugin, originally developed by Tom Willet. It has a modular architecture, so new modules can be added to counter spammers' latest tricks, should the need arise.

Spam detection

Geeklog and the Spam-X plugin will check the following for spam:

  • Story submissions
  • Comments
  • Trackbacks and Pingbacks
  • Event submissions
  • Link submissions
  • The text sent with the "Email story to a friend" option
  • A user's profile

Other plugins can also use the Spam-X plugin to filter their content for spam. The Forum plugin does that, for example.

Modules

Geeklog ships with the following Spam-X modules:

MT-Blacklist

MT-Blacklist was a blacklist, i.e. a listing of URLs that were used in spam posts, originally developed for Movable Type (hence the name) and maintained by Jay Allen.

Maintaining a blacklist is a lot of work, and you're continually playing catch-up with the spammers. Jay Allen therefore eventually discontinued MT-Blacklist, on the assumption that newer and better methods of detecting spam had become available.

The MT-Blacklist modules were removed from Geeklog as of version 1.4.1. Instead, Geeklog now ships modules for SLV.

Personal Blacklist

The Personal Blacklist module lets you add keywords and URLs that typically appear in spam posts. When you're being hit by spam, make sure to add the URLs from those spam posts to your Personal Blacklist so that they can be filtered out automatically, should the spammer try to post them again.

This will also help you get rid of spam that made it through, as you can then use the Mass Delete Comments and Mass Delete Trackbacks modules to easily remove larger numbers of spam posts from your database.
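
The matching idea behind a personal blacklist can be sketched in a few lines of Python. This is an illustration only, not the Spam-X code itself: the function name matches_blacklist, the plain-substring matching, and the sample entries are all assumptions made for this sketch.

```python
def matches_blacklist(post_text, blacklist):
    """Return the first blacklist entry found in post_text, or None.

    Assumption of this sketch: entries are plain substrings that are
    compared case-insensitively. A real filter may match differently.
    """
    lowered = post_text.lower()
    for entry in blacklist:
        if entry.lower() in lowered:
            return entry
    return None


# Hypothetical entries: one spam domain and one keyword.
blacklist = ["cheap-pills.example", "online casino"]
```

A post that contains any listed keyword or URL would be flagged, while a post without them passes through unchanged.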

IP Filter

Sometimes you will encounter spam that is coming from one or only a few IP addresses. Simply add those IP addresses to the IP Filter module, and any posts from them will be blocked automatically.

Please note that IP addresses aren't really a good filter criterion. While some ISPs and hosting services are known to host spammers, it won't help much to block an IP address that belongs to one of the well-known ISPs. Often, the spammer will get a new IP address the next time they connect to the internet, while the blocked IP address will be reassigned, possibly to some innocent user.
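
As a rough illustration of IP-based filtering, the following Python sketch checks a poster's address against a list of blocked addresses. It is not taken from Spam-X; the function name is_blocked and the sample addresses are hypothetical, and accepting CIDR ranges in addition to single addresses is an assumption of this sketch.

```python
import ipaddress


def is_blocked(ip, blocked):
    """Return True if ip matches any blocked address or network."""
    addr = ipaddress.ip_address(ip)
    for entry in blocked:
        # ip_network() accepts a single address ("192.0.2.15")
        # as well as a CIDR range ("198.51.100.0/24").
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False


# Hypothetical block list using documentation-only address ranges.
blocked = ["192.0.2.15", "198.51.100.0/24"]
```

Blocking a whole range makes the caveat above more serious, since every user who is later assigned an address in that range is blocked as well.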

IP of URL Filter

This module is only useful in a few special cases. Here you enter the IP address of a webserver that hosts domains which appear in spam. Some spammers keep many of their sites on only a few webservers, so instead of adding lots of domains to your blacklist, you only add the IP addresses of those webservers. The module will then check all the URLs in a post to see if any of them is hosted on one of those blacklisted webservers.
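
The check described above (extract the URLs, resolve each host name, compare the result with the blacklisted server addresses) could look roughly like this in Python. This is a sketch under stated assumptions, not the module's actual code: the names hosts_in and links_to_blocked_server, the simple URL pattern, and the injectable resolve parameter are all made up for this example.

```python
import re
import socket
from urllib.parse import urlparse

# Deliberately simple pattern; real posts may need more careful parsing.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def hosts_in(text):
    """Return the set of host names of all http(s) URLs found in text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname  # lower-cased by urlparse
        if host:
            hosts.add(host)
    return hosts


def links_to_blocked_server(text, blocked_ips, resolve=socket.gethostbyname):
    """Return True if any URL in text resolves to a blacklisted server IP.

    resolve is injectable so the check can be tried without real DNS lookups.
    """
    for host in hosts_in(text):
        try:
            ip = resolve(host)
        except OSError:
            continue  # host could not be resolved, nothing to compare
        if ip in blocked_ips:
            return True
    return False
```

With this approach, two different spam domains that share one webserver are both caught by a single blacklist entry for that server's IP address.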

Experimental Modules

As mentioned above, Spam-X can easily be extended by dropping new modules into the /path/to/geeklog/plugins/spamx/ directory. The following modules are currently under development:

Filters

These modules make use of external services to rate posts as spam (or not spam):

  • Akismet modules
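  • The spam-merge wiki spam list is a shared blacklist of several Wiki communities and something of a successor to the discontinued MT-Blacklist. For more information and how to use it with Geeklog, see these two posts on the geeklog-spam mailing list: http://eight.pairlist.net/pipermail/geeklog-spam/2007-May/000009.html and http://eight.pairlist.net/pipermail/geeklog-spam/2007-June/000010.html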

Actions

The Spam-X plugin also lets you define actions to be performed once a post has been recognized as spam. The default actions are to delete the post and, optionally, send an email to the site admin.
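
The split between detection modules and actions can be pictured with the following Python sketch. It only illustrates the general idea and does not reflect the plugin's real interface: run_spam_check, the examiner and action callables, and the sample data are hypothetical.

```python
def run_spam_check(post, examiners, actions):
    """Run each examiner; on the first hit, run every action and report spam."""
    for examine in examiners:
        if examine(post):
            for act in actions:
                act(post)
            return True
    return False


# Hypothetical setup: one keyword examiner and one action that
# records the offending post (standing in for "delete" or "email admin").
deleted_posts = []
examiners = [lambda post: "casino" in post.lower()]
actions = [deleted_posts.append]
```

New detection methods or new reactions can then be added by extending one of the two lists, without touching the code that ties them together.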

The following modules also pass the information from a spam post on to other plugins (see below) to block any further spam posts from the same source automatically:

  • Bad Behavior module (http://lists.geeklog.net/pipermail/geeklog-spam/2005-September/000039.html). Note: for use with Bad Behavior 1 only; it won't work with Bad Behavior 2.
  • Ban plugin module (http://lists.geeklog.net/pipermail/geeklog-spam/2005-October/000046.html)


Other plugins

Bad Behavior

Bad Behavior is a collection of scripts that check for broken HTTP requests as well as signatures of known spambots. It aims to stop spambots before they even get the chance to post spam.

Bad Behavior was written by Michael Hampton as a plugin for WordPress but has since been ported to Geeklog.

Ban Plugin

"The Ban Plugin is designed to do one thing. Provide an easy way to ban people from your web site." (quoted from Tom Willet's site).


Other methods

When you can identify a spammer by their HTTP request, i.e. by the IP address or by certain characteristics in their HTTP request, e.g. an unusual user agent string, the most efficient method of blocking them is directly on the webserver.

On an Apache webserver, that is usually done in the .htaccess file. This method is also useful against script kiddies and worms.


Resources

  • How to use Spam-X in your own plugin (for plugin developers)
  • The geeklog-spam mailing list (http://lists.geeklog.net/mailman/listinfo/geeklog-spam) is the best place to discuss new anti-spam techniques for Geeklog as well as report spam sightings.
  • Ann Elisabeth Nordbo aka Spam Huntress (http://spamhuntress.com/) provides lots of useful information regarding web spam on her site and collects information about known spammers in her wiki (http://spamhuntress.com/wiki/Main_Page).
  • Other anti-spam services that could possibly be integrated into Spam-X