Friday, 9 January 2009

How, House keeping process to remove blacklisted items from DB?

Theres now a blacklisting feature in place at the point where authors / spammers submit their pad files to softtester, behind the authors page.

This stops quite a bit of spam.

But I've already got spam and things could be determined as spam at a later date.

So I need a process which will look at the softtester MySQL database and deal with them.

However, there are various problems.
1. The black list is a list of domains, the fields in the database are full urls, so I'd have to do a like % ... % query rather than an equals, which will be slow.
2. These fields don't have indexes.
3. Its going to take a long time to scan the database / fields for each of these domains on a regular basis.

Any solution has to automated and require no involvement from me, when its up and running.

Solutions.
1. Create an offline solution which works on a copy of the database.
Pros.
Quick and won't affect the web server
Cons.
To be fully automated it would need to grab a copy of the database, import it to a test database then perform the actual process, then action the live database.

2. Do the process manually a couple of times a year.
Pros.
Erm, not sure.
Cons.
Lots of effort, time and lots of transactions on the live database.

Thoughts?

by JM

4 comments:

  1. I must admit that although I blacklist NEW submissions I don't go back and delete all existing listings from the same URL.

    The reason for this is that sometimes I blacklist sites submitting screensavers if I think they are taking the mickey. There was one site that started submitting Bald Eage ScrSvr 1, Bald Eagle ScrSvr 2. By the time he'd got to number 40 I had enough and blacklisted him. However, I'm still listing all his other screensavers.

    I could go back and do things manually but I'd probably update my blacklist management page to add a "Delete All From This URL" type of link/ command. I'd then spend a few minutes a day going through the list.

    ReplyDelete
  2. But there are some which will have slipped through the net.

    ReplyDelete
  3. Agreed. But you can't win them all and at least the ratio
    of spam to real software listings should reduce in the
    future.

    ReplyDelete
  4. Well...

    For me, there will be a gap for the period between when I receive and accept (automatically) and when you update the black list.

    There will be some getting through there too.

    I do need something to check afterwards too.

    ReplyDelete