SoC spam-x overhaul

From GeeklogWiki
Revision as of 09:08, 7 February 2009 by Dirk (talk | contribs) (GSoC idea: Spam-X overhaul (work in progress))

(This is an idea page for the Google Summer of Code)

Introduction

Spam-X is Geeklog's spam filter plugin. The concept is simple but has proved itself to be very effective:

Every post on a Geeklog site is run through Spam-X, which then returns a "thumbs-up" (not spam) or "thumbs-down" (spam detected) result to the caller. The plugin itself can easily be extended by dropping in new modules when spammers try new tricks or new methods of spam detection become available.
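
To make the concept concrete, here is a minimal sketch of such a pipeline. It is written in Python purely for illustration and is not Geeklog's actual (PHP) plugin API: the names FilterModule, check_post, and MODULES, the example modules, and their thresholds are all assumptions.

```python
from typing import Callable, List

# Hypothetical module signature: takes the post text and
# returns True if the module considers the post to be spam.
FilterModule = Callable[[str], bool]


def blacklist_module(post: str) -> bool:
    """Flag posts containing a blacklisted term (illustrative list)."""
    blacklist = ["cheap-pills.example", "casino-bonus"]
    text = post.lower()
    return any(term in text for term in blacklist)


def too_many_links_module(post: str) -> bool:
    """Flag posts with more than three links (arbitrary threshold)."""
    text = post.lower()
    return text.count("http://") + text.count("https://") > 3


def check_post(post: str, modules: List[FilterModule]) -> bool:
    """Return True for thumbs-up (not spam), False for thumbs-down."""
    return not any(module(post) for module in modules)


# "Dropping in" a new module amounts to adding it to this list.
MODULES: List[FilterModule] = [blacklist_module, too_many_links_module]
```

The caller only ever sees the boolean result, so new detection methods can be added without changing any calling code.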


Incentive

The Spam-X plugin does have some issues, though:

  • No second chance for false positives: A post that is flagged as spam is deleted immediately. So if the plugin happens to flag a valid post as spam, the post is lost. This can cause some frustration for users trying to submit a valid post.
  • Filter list management: When following a strict filter policy, the lists per filter module tend to become long and hard to manage. The plugin also currently does not keep track of whether a filter rule is used or how often it applies to posts.


Details

The main concept, as outlined above, is solid and should be kept:

  • extensible - easy to add new filter modules
  • simple "thumbs-up" / "thumbs-down" decision - but the consequences of a thumbs-down should be reconsidered

A spam filter should be easy to use and maintain or it won't be used.

Here are some ideas about what could be done to improve both the main functionality (i.e. catching spam posts) and the maintenance:

Use Counter

It would probably make sense to add a simple use count and a last-used timestamp to every filter rule or filter module (in case the module does not store rules in the database) to keep track of which rules / modules are actually effective.

This should be accompanied by some overview to easily find effective and ineffective rules and modules.
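
As a rough sketch of what this could look like (in Python for illustration only; Spam-X itself is PHP and would presumably store these values in its database tables), each rule could carry the two extra fields and update them on every match. The FilterRule class and the helper functions are hypothetical names, not part of the plugin.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class FilterRule:
    """A hypothetical filter rule with usage tracking."""
    pattern: str
    use_count: int = 0
    last_used: Optional[datetime] = None

    def matches(self, post: str) -> bool:
        """Case-insensitive substring match; records every hit."""
        hit = self.pattern.lower() in post.lower()
        if hit:
            self.use_count += 1
            self.last_used = datetime.now(timezone.utc)
        return hit


def rules_by_effectiveness(rules: List[FilterRule]) -> List[FilterRule]:
    """Most frequently used rules first, for an overview page."""
    return sorted(rules, key=lambda r: r.use_count, reverse=True)


def ineffective_rules(rules: List[FilterRule],
                      min_uses: int = 1) -> List[FilterRule]:
    """Rules that matched fewer than min_uses posts so far."""
    return [r for r in rules if r.use_count < min_uses]
```

An overview built on these two helpers would show at a glance which rules could be pruned; the last_used timestamp additionally helps to spot rules that were effective once but have not matched anything recently.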

Moderation Option

As explained above, posts that get the "thumbs-down" are deleted immediately. There could be an option to keep flagged posts in a submission queue so that they can be approved by a moderator.

There are some problems with this idea, though:

  • due to the amount of spam, this could lead to a very long moderation queue
  • Spam-X does not currently know what type the post is. In other words, Spam-X only sees the content of the post (and meta information like the HTTP headers) but it does not know whether it is a story, a comment, or some sort of post for a plugin (e.g. a forum post). This information would be needed, though, to push the post into the proper moderation queue.
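
One way to picture a possible design (a Python sketch under stated assumptions, not an existing Geeklog interface): the caller passes the post type along with the content, and flagged posts are held in a per-type queue with a size cap. The post_type parameter, the ModerationQueues class, and the cap are all hypothetical; the cap is just one conceivable answer to the queue-length problem.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ModerationQueues:
    """Hypothetical per-type queues for posts flagged as spam."""

    def __init__(self, max_per_queue: int = 100) -> None:
        self.max_per_queue = max_per_queue
        self.queues: Dict[str, List[str]] = defaultdict(list)

    def hold(self, post: str, post_type: str) -> bool:
        """Queue the post; return False if that queue is already full."""
        queue = self.queues[post_type]
        if len(queue) >= self.max_per_queue:
            return False
        queue.append(post)
        return True


def submit(post: str, post_type: str,
           is_spam: Callable[[str], bool],
           queues: ModerationQueues) -> str:
    """Return 'accepted', 'queued' (held for review), or 'discarded'."""
    if not is_spam(post):
        return "accepted"
    return "queued" if queues.hold(post, post_type) else "discarded"
```

With a cap in place, a flood of spam can only fill a queue up to its limit; anything beyond that is dropped as it is today, so the moderator's workload stays bounded at the cost of losing some false positives during a flood.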


List Views

This is more of a technical detail: Geeklog provides list views that are sortable and can be searched. These are not currently used by the Spam-X plugin but would help make long lists of filter rules more maintainable.


Level of Difficulty

medium to high