SoC improve the Spam-X plugin

From GeeklogWiki
Revision as of 09:05, 29 March 2013 by Dirk (talk | contribs) (language tweaks and clarifications; added a link to the CAPTCHA plugin announcement)

(This is an idea page for the Google Summer of Code)

Introduction

Comment spam needs no introduction - pretty much every site gets it. Geeklog ships with its own spam filter, called Spam-X. The filter can easily be extended with modules, either to keep up with spammers' latest tricks or to add support for new anti-spam services.


Incentive

Spam-X works very well in practice. There is always room for improvement, of course, and that is what this project is about. We are looking for ways to make the spam filter more efficient to use (for the site admin) and to extend its filtering capabilities.

From a usage point of view, the handling of long lists of blacklisted phrases and IP addresses could be improved. Also, there are new anti-spam services that we would want to try out.


Part 1: Usability improvements

In addition to using external services (see below) that rate posts as spam or not spam, site owners can use their own blacklists to fine-tune spam filtering for their own site. As a result, however, you'll often end up with long lists of blacklist entries. There are two obvious problems with this:

  1. you don't know whether a given rule is still useful, i.e. whether it still catches any spam
  2. the long lists are hard to use and maintain

We have (raw and unfinished) patches for both of these issues (#1076 and #1077). So the minimal goal for this part would be to finish and implement these changes.
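
To make the first problem more concrete, here is a minimal sketch of per-entry usage tracking. It is written in Python purely for illustration (Geeklog itself is PHP), and it is not the approach taken in the patches mentioned above. All names here (`BlacklistEntry`, `check_post`, `stale_entries`) are hypothetical, and the simple substring match is an assumption about how a blacklist phrase might be applied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BlacklistEntry:
    """One blacklisted phrase plus hypothetical usage statistics."""
    phrase: str
    hits: int = 0
    last_hit: Optional[datetime] = None


def check_post(text: str, entries: List[BlacklistEntry], now: datetime) -> bool:
    """Return True if any entry matches; record a hit on every matching entry."""
    lowered = text.lower()
    matched = False
    for entry in entries:
        # Assumption: plain case-insensitive substring matching.
        if entry.phrase.lower() in lowered:
            entry.hits += 1
            entry.last_hit = now
            matched = True
    return matched


def stale_entries(entries: List[BlacklistEntry], now: datetime,
                  max_age: timedelta) -> List[BlacklistEntry]:
    """Entries that never matched, or last matched more than max_age ago."""
    return [e for e in entries
            if e.last_hit is None or now - e.last_hit > max_age]
```

With a hit count and a last-hit date stored per entry, the admin panel could offer to hide or purge entries that have not matched anything for, say, 90 days, which would also help with the second problem.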

However, we would also welcome new ideas on how to better handle these issues. Surely, having a sortable list of several hundred entries is not the only possible solution to this? Here's a chance for a student (i.e. you) to come up with a clever idea that sets your proposal apart from the others.

Other UI improvements that we're looking for in this part of the project (and that should be easy to implement) concern consistency with the "look and feel" of the rest of Geeklog. Currently, the Spam-X plugin sticks out a bit, both in the way its admin panels work and in how they look. This would make a good first task in the project.


Part 2: A new spam filter module and API changes

Geeklog currently ships with a Spam-X module for LinkSleeve (aka SLV). At the time the module was added, this was the only free service available that didn't require creating an account, so it was usable "out of the box".

Over time, more anti-spam services have appeared (see list). One of the most interesting - and free - anti-spam services these days is Mollom, which is associated with the Drupal community (but not limited to Drupal sites).

One important difference between Mollom and other services is that it can return an "unsure" ranking for a comment post. This means that the post may or may not be spam - Mollom isn't sure. So what do we do? Display a CAPTCHA to the poster.

This concept, however, cannot easily be integrated into Geeklog right now. First of all, Spam-X currently expects either a "thumbs up" or "thumbs down" answer, after which the comment post is either allowed or dismissed. "I don't know" simply isn't supported. So to support this third possible reply, some changes will have to be made to Spam-X itself and to the Geeklog code calling it.
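
As a rough illustration of what a third reply could look like, the sketch below models a three-state verdict and one possible way to combine the answers of several filter modules. It is Python rather than Geeklog's PHP, and nothing in it reflects the actual Spam-X API: `Verdict`, `combine` and `next_action` are hypothetical names, and the precedence rule (spam beats unsure, unsure beats ham) is an assumption.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    """Hypothetical three-state result of a spam check."""
    HAM = "ham"
    UNSURE = "unsure"
    SPAM = "spam"


def combine(verdicts: Iterable[Verdict]) -> Verdict:
    """Merge the verdicts of several modules into one.

    Assumed precedence: any SPAM wins, otherwise any UNSURE wins,
    otherwise (including when no module answered) the post is HAM.
    """
    result = Verdict.HAM
    for verdict in verdicts:
        if verdict is Verdict.SPAM:
            return Verdict.SPAM
        if verdict is Verdict.UNSURE:
            result = Verdict.UNSURE
    return result


def next_action(verdict: Verdict) -> str:
    """Map a verdict to what the calling code would do with the post."""
    actions = {
        Verdict.HAM: "accept",
        Verdict.UNSURE: "challenge",  # e.g. show a CAPTCHA
        Verdict.SPAM: "reject",
    }
    return actions[verdict]
```

The point of the sketch is that the calling code, not just the filter, has to learn about the third state: wherever it currently branches on allowed versus dismissed, it would need a "challenge" branch as well.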

There's also the problem that Geeklog's CAPTCHA plugin would always display a CAPTCHA to the user. So if Mollom were to return an "unsure" response, the poster would have to solve two CAPTCHAs, which would be very annoying. We need a solution for this scenario, e.g. some communication with the CAPTCHA plugin.
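
One conceivable form of that communication is a single shared decision about whether the poster still owes a CAPTCHA. The Python sketch below is only an assumption about how this could work, not a description of the CAPTCHA plugin; the function and parameter names are made up for illustration.

```python
def captcha_required(filter_unsure: bool, captcha_plugin_active: bool,
                     already_solved: bool) -> bool:
    """Decide whether the poster still has to solve a CAPTCHA.

    Assumption: one solved CAPTCHA satisfies both the CAPTCHA plugin and
    an "unsure" filter verdict, so the poster is never challenged twice.
    """
    if already_solved:
        return False
    return filter_unsure or captcha_plugin_active
```

If the CAPTCHA plugin has already challenged the poster, an "unsure" verdict then adds nothing; if the plugin is disabled, the verdict alone triggers the challenge.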

And finally, since we're going to have to change the Spam-X API anyway, this would be a good opportunity to address a design flaw of the API: Once a comment post is considered spam, it is deleted. There is currently no way to store the post for later review (and possible approval). Fixing this is not as simple as it may seem, though, since currently Spam-X simply doesn't know where the post came from - it could be a comment or a story submission, both of which would have to be treated differently.
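
The following sketch shows why the origin of a post matters for later review: a held post can only be published again if the code knows which kind of content it was. This is an illustrative Python model, not Geeklog code. `Quarantine`, `HeldPost` and the `publishers` mapping are hypothetical, as is the idea of identifying the origin by a simple string such as "comment" or "story".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HeldPost:
    post_id: int
    source: str   # hypothetical origin label, e.g. "comment" or "story"
    content: str


class Quarantine:
    """Keeps suspected spam for review instead of deleting it."""

    def __init__(self) -> None:
        self._held: Dict[int, HeldPost] = {}
        self._next_id = 1

    def hold(self, source: str, content: str) -> int:
        """Store a suspicious post together with its origin."""
        post_id = self._next_id
        self._next_id += 1
        self._held[post_id] = HeldPost(post_id, source, content)
        return post_id

    def pending(self) -> List[HeldPost]:
        """Posts still waiting for review."""
        return list(self._held.values())

    def approve(self, post_id: int,
                publishers: Dict[str, Callable[[str], None]]) -> None:
        """Hand the post back to the code responsible for its origin."""
        post = self._held[post_id]
        publish = publishers[post.source]  # KeyError leaves the post held
        publish(post.content)
        del self._held[post_id]

    def discard(self, post_id: int) -> None:
        """Drop a post that the reviewer confirmed as spam."""
        del self._held[post_id]
```

In this model the caller has to pass the origin along with the post, which is exactly the information Spam-X does not receive today.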

Backward compatibility has to be considered. There is third-party code out there that uses the current Spam-X API that we don't want to break. An easy, though not the only, way out would be to introduce new API functions.
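
A sketch of that "new functions" route: the old two-state check stays in place with its existing signature and simply wraps a new three-state check. The example is Python, not PHP, and both function names are invented. The matching rules inside `classify` are placeholders, and how "unsure" should be reported to old callers is an open design question that the sketch leaves to a parameter.

```python
def classify(text: str) -> str:
    """Hypothetical new-style check returning "ham", "unsure" or "spam".

    The rules below are placeholders standing in for real filter modules.
    """
    lowered = text.lower()
    if "cheap pills" in lowered:
        return "spam"
    if "http://" in lowered:
        return "unsure"
    return "ham"


def is_spam(text: str, unsure_is_spam: bool = False) -> bool:
    """Legacy-style two-state check kept for existing callers.

    Old code only understands spam / not spam, so "unsure" has to be
    folded into one of the two; which one is configurable here.
    """
    verdict = classify(text)
    if verdict == "unsure":
        return unsure_is_spam
    return verdict == "spam"
```

Existing third-party code would keep calling the two-state function unchanged, while Geeklog's own code could move to the three-state one at its own pace.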


Level of Difficulty

low to medium

The usability changes and implementation of the Mollom module itself should be relatively straightforward (there already is a PHP class for Mollom). Changing the Spam-X API will be more demanding, especially since backward compatibility will be an issue.

Possible mentors: Tom Homer, Dirk Haun


Further Reading