Tuesday, September 29, 2009

Spam detection on websites?

Assume you run a site with user-generated content, or you're using software that can somehow get spam links inserted into it.

How do you find out if your website has spam put on it?

It seems a common enough problem these days... people putting spam links on websites. Surely there must be a service or piece of software to detect such a thing?

I can think of a few ways to go about writing one fairly easily (using existing spam detection tools, but applying them to a spider's crawl of your website). It would be much nicer if there were already a tool that does this, though.
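To make the idea concrete, here is a rough sketch in Python of the "crawl, then check" approach. It is only an illustration under some assumptions: the pages have already been fetched by whatever spider you use and are passed in as a dict of URL to HTML, and `looks_spammy` is a hypothetical callable standing in for a real spam checker. All the names below are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def external_links(page_url, html):
    """Return the links on a page that point to a different host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    return [link for link in collector.links
            if urlparse(link).netloc not in ("", own_host)]


def suspect_links(pages, looks_spammy):
    """Map each page URL to the external links the checker flags.

    `pages` is {url: html} from your crawler (assumed, not shown here).
    `looks_spammy` is a hypothetical callable wrapping a spam checker.
    """
    report = {}
    for url, html in pages.items():
        flagged = [link for link in external_links(url, html)
                   if looks_spammy(link)]
        if flagged:
            report[url] = flagged
    return report
```

Only links that leave your own host are checked, since spam links almost always point somewhere else. The checker is passed in as a function so that a blocklist lookup or a call to an existing spam detection service could be plugged in without changing the crawl side.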

3 comments:

dmarti said...

I'm working on a system for sharing good/bad URLs. Try it: aloodo.com. No reputation system can be 100% accurate, but it's useful for filtering the referer log and moderating comments.

zgoda said...

LinkSleeve and Akismet are common tools. The former seems more generic; the latter is geared mostly towards blog comments (but may be used with any content). LinkSleeve has an XML-RPC API, while Akismet uses plain old HTTP. Both are free for non-commercial use.

chrisarndt.de said...

I run a SpamBayes server and have a FormEncode validator that passes submissions to this server. The source code is here: http://trac.chrisarndt.de/eggbasket/browser/SpammCan/trunk/spammcan/forms/validators.py#L131

It only needs a little training and then usually works very well.

Chris