Difference between revisions of "SoC improve the Spam-X plugin"

From GeeklogWiki
m (typos and minor corrections)
(SWOT; Links)
Line 3: Line 3:
 
== Introduction ==
 
  
Comment spam doesn't need an introduction - pretty much every site gets it. Geeklog ships with its own spam filter, called Spam-X. This filter can easily be extended by adding modules so that it can either be updated for the spammer's latest tricks or to add support for new anti-spam services. This project is about the latter - creating modules for new services.
+
Comment spam doesn't need an introduction - pretty much every site gets it. Geeklog ships with its own spam filter, called [http://www.geeklog.net/docs/english/spamx.html Spam-X]. This filter can easily be extended by adding modules so that it can either be updated for the spammer's latest tricks or to add support for new anti-spam services. This project is about the latter - creating modules for new services.
  
  
Line 25: Line 25:
 
[http://akismet.com/ Akismet] is associated with the WordPress blog platform. The service initially required a wordpress.com account, which made it not suitable for use in Geeklog (asking our users to sign up with a competitor's website would have looked odd). This requirement has since been dropped: You still need to sign up but can do so now from the Akismet homepage. The service is free (commercial options available).
 
  
There is already an older version of an Akismet module for Spam-X. It will probably need a review to check for API changes.  
+
There is already an older version of an [http://gplugs.cvs.sourceforge.net/gplugs/akismet/ Akismet module] for Spam-X. It will probably need a review to check for API changes.  
  
 
==== Defensio ====
 
Line 45: Line 45:
 
[http://www.stopforumspam.com/ Stop Forum Spam] is a private (one-man?) project. It does not require signup and is free.
 
  
The API has an option to check for a poster's user name, which at first glance doesn't seem like a reliable criterion to detect spam (the API has other options as well). Michael Hampton (author of Bad Behavior) also [http://www.bad-behavior.ioerror.us/2010/02/20/stop-forum-spam/ expressed some doubts] about the service.
+
Their API has an option to check for a poster's user name, which at first glance doesn't seem like a reliable criterion to detect spam (the API has other options as well). Michael Hampton (author of Bad Behavior) also [http://www.bad-behavior.ioerror.us/2010/02/20/stop-forum-spam/ expressed some doubts] about the service.
  
 
=== Discussion ===
 
  
''(more to come)''
+
Since a site would normally only use one (or maybe two) of these services, we would also need an option in the Spam-X plugin to disable modules. Currently, you can simply drop new modules into the Spam-X plugin's directory and they will be picked up automatically.
 +
 
 +
We would like to see a short evaluation of these services as one result of this project. For a proper comparison, the modules should probably be installed in parallel but not be used to actually delete spam. So a sort of evaluation mode could be introduced (and also added to the existing SLV module).
 +
 
  
 
== Part 2: SWOT ==
 
  
''TBD''
+
The services discussed above work on the assumption that the same sort of spam is going to hit a lot of sites. The bigger a spam wave, the more likely (and faster) it is going to be recognized by one of these services, as they get reports from sites all over the web.
 +
 
 +
Once these services get big (i.e. the more sites they have reporting to them), there is a chance that smaller spam waves may not be recognized. So a spammer that only targets a few sites and with a low volume may get away with it. Of course, the admins of a site hit by this sort of spam will recognize it as spam and remove it. But how could they then alert other site admins?
 +
 
 +
Another use case: At BarCamp Stuttgart 2008, there was a report by participants about a poster who was very active in some loosely connected blogs. He posted comments that were more or less on topic but always included a link to his (unrelated) services. The participants expressed that they would be willing to trust other bloggers who already identified this sort of "borderline spam".
 +
 
 +
[http://swot.fuckingbrit.com/ Spam: Web of Trust] (SWOT) by Michael Jervis provides a framework for this sort of trust relationship in spam reports. The idea is that a website provides an RSS feed of the spam that it identified and blocks. Other site admins who trust the owner of this site can then subscribe to this feed and won't need to take care of the same sort of spam. They can then publish their own feed, and so on, building an entire web of trusted feeds that would allow for quick propagation of information about spammers.
 +
 
 +
We would like to see this concept implemented as a module for Spam-X.
 +
 
 +
* A site owner should be able to subscribe to other SWOT feeds.
 +
* Not all "locally" blocked spam should go into a SWOT feed automatically (e.g. a site may have very strict rules to not allow posts in other languages, but such a feed would not be very useful for other sites).
 +
* It should be possible to publish more than one SWOT feed, e.g. for different levels of filtering or different criteria.
 +
 
  
 
== Level of Difficulty ==
 
Line 64: Line 80:
 
== Further Reading ==
 
  
''TBD''
+
* [[Dealing with Spam]] in Geeklog
 +
* [[Filtering Spam with Spam-X]]
  
  
 
[[Category:Summer of Code]] [[Category:Development]]
 

Revision as of 10:53, 6 March 2010

(This is an idea page for the Google Summer of Code)

Introduction

Comment spam doesn't need an introduction - pretty much every site gets it. Geeklog ships with its own spam filter, called Spam-X. This filter can easily be extended with modules, either to keep up with the spammers' latest tricks or to add support for new anti-spam services. This project is about the latter - creating modules for new services.


Incentive

Geeklog currently ships with a Spam-X module for LinkSleeve (aka SLV). When the module was added, this was the only free service available that didn't require creating an account, which made it usable "out of the box".

Over time, more anti-spam services have appeared. One goal of this project is to evaluate these services, and to be able to do that, we would need Spam-X modules that support them.

The second half of this project is then about creating a new anti-spam service (see details below) and comparing it with the existing services.


Part 1: New modules for existing services

Services

Here's a quick rundown of some existing anti-spam services:

Akismet

Akismet is associated with the WordPress blog platform. The service initially required a wordpress.com account, which made it unsuitable for use in Geeklog (asking our users to sign up with a competitor's website would have looked odd). This requirement has since been dropped: you still need to sign up but can now do so from the Akismet homepage. The service is free (commercial options available).

There is already an older version of an Akismet module for Spam-X. It will probably need a review to check for API changes.

Defensio

Defensio is a service owned by security firm Websense. The service requires signup and is free "for all personal bloggers" (commercial options available).

Mollom

Mollom is loosely associated with the Drupal CMS. It requires signup and offers both free and for-pay services.

One specialty of Mollom is that it has an "unsure" categorization for posts where it's not quite sure yet whether the post is spam or not. In this case, it displays a CAPTCHA, so that the poster will have to confirm that they are human.
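
A three-way result like this changes how a module has to react, since it can no longer treat a check as a plain yes/no. The following is only a minimal sketch of that decision flow; the names `Verdict` and `next_step` are made up for illustration and are not part of Mollom's API or of Spam-X.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Possible outcomes of a spam check (hypothetical names)."""
    HAM = "ham"
    SPAM = "spam"
    UNSURE = "unsure"


def next_step(verdict: Verdict, captcha_passed: Optional[bool] = None) -> str:
    """Return 'accept', 'reject', or 'show_captcha' for a checked post.

    captcha_passed is None until the poster has attempted the CAPTCHA.
    """
    if verdict is Verdict.HAM:
        return "accept"
    if verdict is Verdict.SPAM:
        return "reject"
    # Unsure: ask for a CAPTCHA first, then decide based on its outcome.
    if captcha_passed is None:
        return "show_captcha"
    return "accept" if captcha_passed else "reject"
```

A module for such a service would therefore need a way to hand the "unsure" case back to the comment form instead of simply accepting or rejecting the post.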

TypePad AntiSpam

TypePad AntiSpam is, as the name implies, associated with the TypePad CMS. It is currently (still) in beta. The service requires signup and is free ("and will always be free" -- quote from the website).

Stop Forum Spam

Stop Forum Spam is a private (one-man?) project. It does not require signup and is free.

Their API has an option to check for a poster's user name, which at first glance doesn't seem like a reliable criterion to detect spam (the API has other options as well). Michael Hampton (author of Bad Behavior) also expressed some doubts about the service.

Discussion

Since a site would normally only use one (or maybe two) of these services, we would also need an option in the Spam-X plugin to disable modules. Currently, you can simply drop new modules into the Spam-X plugin's directory and they will be picked up automatically.
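
To illustrate the idea, here is a rough sketch of automatic discovery combined with a list of disabled modules. It is written in Python purely for illustration (Spam-X itself is PHP), and the file suffix, function names, and the idea of a configurable "disabled" list are assumptions, not the plugin's actual mechanism.

```python
from pathlib import Path
from typing import Iterable, List


def discover_modules(plugin_dir: Path, suffix: str = ".module") -> List[str]:
    """List module names found in plugin_dir (the suffix is a made-up convention)."""
    return sorted(
        entry.name[: -len(suffix)]
        for entry in plugin_dir.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )


def active_modules(discovered: Iterable[str], disabled: Iterable[str]) -> List[str]:
    """Keep every discovered module that the admin has not disabled."""
    blocked = {name.lower() for name in disabled}
    return [name for name in discovered if name.lower() not in blocked]
```

With this approach, dropping a file into the directory would still be enough to install a module, and disabling it would only require adding its name to a configuration option.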

We would like to see a short evaluation of these services as one result of this project. For a proper comparison, the modules should probably be installed in parallel but not be used to actually delete spam. So a sort of evaluation mode could be introduced (and also added to the existing SLV module).
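
Such an evaluation mode could look roughly like the sketch below: every module judges every post, the verdicts are only recorded, and nothing is deleted. The function names and the idea of representing a module as a simple callable are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List

# A checker takes the post text and returns True if it considers it spam.
Checker = Callable[[str], bool]


def run_evaluation(posts: Iterable[str],
                   checkers: Dict[str, Checker]) -> List[Dict[str, bool]]:
    """Run every checker on every post and record the verdicts.

    Nothing is deleted or blocked here; the result is just a log.
    """
    results = []
    for post in posts:
        results.append({name: bool(check(post)) for name, check in checkers.items()})
    return results


def flag_counts(results: List[Dict[str, bool]]) -> Dict[str, int]:
    """Count how many posts each checker flagged as spam."""
    counts: Dict[str, int] = {}
    for verdicts in results:
        for name, is_spam in verdicts.items():
            counts[name] = counts.get(name, 0) + int(is_spam)
    return counts
```

Comparing the recorded verdicts with what the site admin eventually marks as spam would then give a basis for the evaluation.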


Part 2: SWOT

The services discussed above work on the assumption that the same sort of spam is going to hit a lot of sites. The bigger a spam wave, the more likely (and faster) it is going to be recognized by one of these services, as they get reports from sites all over the web.

As these services grow (i.e. as more sites report to them), there is a chance that smaller spam waves will go unrecognized. A spammer who targets only a few sites, and at a low volume, may get away with it. Of course, the admins of a site hit by this sort of spam will recognize it as spam and remove it. But how could they then alert other site admins?

Another use case: At BarCamp Stuttgart 2008, participants reported on a poster who was very active on some loosely connected blogs. He posted comments that were more or less on topic but always included a link to his (unrelated) services. The participants said that they would be willing to trust other bloggers who had already identified this sort of "borderline spam".

Spam: Web of Trust (SWOT) by Michael Jervis provides a framework for this sort of trust relationship in spam reports. The idea is that a website provides an RSS feed of the spam that it has identified and blocked. Other site admins who trust the owner of this site can then subscribe to this feed and won't need to take care of the same sort of spam themselves. They can then publish their own feed, and so on, building an entire web of trusted feeds that would allow for quick propagation of information about spammers.

We would like to see this concept implemented as a module for Spam-X.

  • A site owner should be able to subscribe to other SWOT feeds.
  • Not all "locally" blocked spam should go into a SWOT feed automatically (e.g. a site may have very strict rules that disallow posts in other languages, but a feed of those blocks would not be very useful for other sites).
  • It should be possible to publish more than one SWOT feed, e.g. for different levels of filtering or different criteria.
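
The requirements above can be sketched as follows. This is only an illustration under assumptions: we assume a plain RSS 2.0 feed in which each item's link element carries the spammed URL, and we assume every local block is stored with a "reason" that decides whether it is published. The actual SWOT feed format may differ, and all function names are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Set


def parse_feed(rss_text: str) -> Set[str]:
    """Extract the <link> of every <item> from an RSS 2.0 document.

    Assumption: each item of a feed describes one spam URL in its <link>.
    """
    root = ET.fromstring(rss_text)
    links = set()
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.add(link)
    return links


def merge_subscriptions(feeds: Iterable[str]) -> Set[str]:
    """Combine the entries of all trusted feeds into one blocklist."""
    merged: Set[str] = set()
    for rss_text in feeds:
        merged |= parse_feed(rss_text)
    return merged


def select_for_feed(local_blocks: List[Dict[str, str]],
                    shared_reasons: Set[str]) -> List[str]:
    """Pick the locally blocked URLs that should appear in one outgoing feed.

    Each block is a dict with the hypothetical keys "url" and "reason";
    only blocks whose reason is in shared_reasons are published.
    """
    return [block["url"] for block in local_blocks
            if block["reason"] in shared_reasons]
```

Calling select_for_feed with different sets of reasons would yield several outgoing feeds, e.g. one for clear-cut link spam and a stricter one that also includes "borderline" cases.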


Level of Difficulty

easy to medium

Implementing modules for the existing services should be relatively straightforward. Some more thought and work will be required for the SWOT implementation.


Further Reading