DSPAM Maildir Train (DMT)

by Dustin last modified 2006-01-23 20:16

Can you train your anti-spam system using drag-n-drop? With DMT, you can.

For a less technical, higher-level picture of how DMT interoperates with one possible mail system configuration involving DSPAM (and other Open Source software), see the DMT Introduction, complete with pretty screenshots. It's very management friendly!

The Lowdown:

Have you installed DSPAM to get server-wide spam filtering? Is your end users' mail stored in Maildirs centrally on your mail server? Do your end users use IMAP email clients? If yes (to all three), then DMT makes it convenient for your end users to train DSPAM with spam and legitimate messages (AKA "ham") without leaving the comfort of their email clients. All they have to do is drag-n-drop missed spam to an IMAP mail folder like "MissedSpam", or drop false positives and older, pre-DSPAM-deployment ham into a folder like "HamTrainNSave" or "HamTrainNDel" (based on whether they want to save the message after training or not). After DMT submits these messages to DSPAM for training, the messages end up saved in folders like "XXOldSpamXX", "XXSaveXX", or "Inbox.Trash". Because messages can be dragged and dropped in bulk, this can be a huge productivity booster for larger sites that receive a lot of spam, and can lead to DSPAM being rapidly trained after a new deployment.
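
As a rough illustration of what checking a training folder for dropped messages might involve, the sketch below counts the messages waiting in one folder. It is not taken from DMT itself: the function name count_pending is hypothetical, and it assumes a Maildir++ layout in which the IMAP folder "MissedSpam" is stored as the directory ".MissedSpam" inside the user's Maildir. Your IMAP server may map folder names differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DMT): count the messages waiting in one
# Maildir folder. Assumes the Maildir++ layout, where IMAP folder "MissedSpam"
# lives in "$maildir/.MissedSpam" with the usual cur/ and new/ subdirectories.
count_pending() {
    local maildir=$1 folder=$2 dir count=0
    for dir in "$maildir/.$folder/cur" "$maildir/.$folder/new"; do
        # A folder that has never been created simply contributes nothing.
        [ -d "$dir" ] || continue
        # Each regular file directly inside cur/ or new/ is one message.
        count=$((count + $(find "$dir" -maxdepth 1 -type f | wc -l)))
    done
    echo "$count"
}
```

For example, count_pending "$HOME/Maildir" MissedSpam prints the number of messages a user has dropped into "MissedSpam" so far, and prints 0 if the folder does not exist yet.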

Additionally, DMT provides an alternative, more convenient route to spam quarantine management than a web-based quarantine tool like the one provided with DSPAM. With DMT, a user can view their quarantine (as a mail folder with a name like "SpamQuarantine") directly within their email client, then retrain DSPAM on false positives by dragging them to folders like "HamTrainNSave" or "HamTrainNDel".

Why seek an alternative to DSPAM's included CGI-based system for the spam quarantine?

  • It's less convenient to have to log in to a separate web application to view and release quarantined spam. Seeing the caught spam filed in a separate mail folder in any IMAP-enabled email client, and moving messages between various mail folders, is much more convenient. See the pretty screenshots here to get a better sense of this convenience.
  • It essentially required the use of Apache's suexec function, which is generally frowned upon with respect to security.
  • It essentially required Apache to take on the group permissions of the "mail" user to access the end user's quarantine mbox. Apache strongly warns not to setuid/setgid to UIDs/GIDs less than 100 (which are traditionally system accounts). The mail user in Debian (and probably most other Unix-like Operating Systems) has a GID of 33.

Note: The CGI environment provided by DSPAM is still useful to see the cool charts of spam statistics! And suexec is not needed for this.

Features:

  • Drag-N-Drop training of DSPAM! Just move emails around in some pre-defined IMAP mail folders and let DMT do the rest. Say goodbye to the tedious forwarding of spam and false positives to addresses like spam-username@yourdomain.com. Whee!
  • Potentially extremely rapid DSPAM training by your end users, especially just after a new DSPAM deployment. Most of your end users probably don't have a nice mbox file called "ham.corpus" lying around that they can train DSPAM on in bulk just after a new DSPAM deployment. Nor are they going to painstakingly forward one message at a time from their old mail folders with saved spam and ham mail. But your users are very likely to have old POP and IMAP accounts in their email clients, with "Trash" folders, "Save" folders, etc. These older accounts and folders (often locally stored, in a non-mbox, non-Maildir format) have spam and ham messages that DSPAM would love to be trained on. It will be easy (especially in more advanced email clients like Evolution and Thunderbird, which allow multiple simultaneous email accounts using an arbitrary mix of POP and IMAP) for your users to drag-n-drop loads of old ham, which DSPAM might never get its hands on otherwise, into the "HamTrainNSave" mail folder. Imagine all the Shift-clicking and Ctrl-clicking your users can leverage to multi-select and drag-n-drop multiple messages.
  • GPL licensed. See the included "copyright" file for further details.
  • IMAP is supported by many email clients, so this solution is very cross-platform for your end users. They can harmoniously use web-based email (say, Squirrelmail) one day, and a sophisticated email client like Thunderbird the next, and have the same bulk-action spam training convenience in both places.
  • DMT is very easy to install: it's a bash script (running periodically from each user's crontab on the mail server) that checks if there are any messages in specified Maildirs that DSPAM should be trained on, and acts appropriately.
  • You and your end users can define the names of the six folders that users will use for the training of DSPAM. The config file can be global (/usr/local/etc/dmtrc or /etc/dmtrc) and/or per-user (~/.dmtrc).
  • Optionally, DMT can delete, at the filesystem level, spam or ham messages after training, circumventing archiving, trash folders, etc.
  • Decent error checking, i.e. DMT always defaults to doing safe, non-destructive things.
  • It is Squirrelmail friendly: when DMT creates Maildir folders as needed, it uses names such as INBOX, INBOX.Sent, INBOX.Drafts, and INBOX.Trash. These folders are decently benevolent to other IMAP email clients like Evolution, Thunderbird, etc., with some mild email account configuring.
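
The global and per-user config files mentioned above suggest a layered setup. The following is only a sketch of how such layering could work in bash, not DMT's actual code: the function name load_config is hypothetical, and the precedence (each later file overriding the earlier ones, with ~/.dmtrc last) is an assumption. Check the included README for the real behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of layered configuration (not DMT's actual code):
# source each readable config file in turn, so that a later file overrides
# values set by an earlier one. The precedence order is an assumption.
load_config() {
    local file
    # With no arguments, fall back to the locations named in the feature list.
    [ "$#" -gt 0 ] || set -- /usr/local/etc/dmtrc /etc/dmtrc "$HOME/.dmtrc"
    for file in "$@"; do
        if [ -r "$file" ]; then
            # Config files are plain shell fragments of VARIABLE=value lines.
            . "$file"
        fi
    done
}
```

A script could set its built-in defaults first, then call load_config with no arguments so that any site-wide or per-user file gets a chance to replace them. Because the files are sourced as shell code, they should be writable only by trusted users.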

Download:

dmt-2.0.tar.bz2 (Released Apr 22, 2005) For interoperation with DSPAM 3.4.0
dmt-1.0.tar.bz2 (Released Aug 9, 2004) For interoperation with DSPAM 3.0.0

Configuring DMT:

See the included README file, and optionally edit the included config file "dmtrc", which is self-documenting.

FAQ:

See DMT FAQ

Documentation for your End Users:

See DMT End User Documentation. You may want to make a local copy and doctor it up to show screenshots relevant to your site, substituting in any custom Maildir/IMAP folder names as necessary. You have permission to do this.

A sample of how to configure DSPAM for use with DMT:

See Sample DSPAM Configuration

Inter-operation with maildrop, /etc/skel, archivemail:

See DMT Inter-operation

Tested against:

  • Debian GNU/Linux. Other Linux distros should work fine. Solaris and FreeBSD might be slightly tougher, as they don't come with bash by default, and I use a "cp -a" command that is Linux-centric.
  • The Dovecot IMAP server, which likes to work with Maildirs if it sees a user has a Maildir.
  • SquirrelMail, Evolution and Thunderbird, specifically how they handle the IMAP folders used for INBOX, Drafts, Sent, Trash, etc. In Evolution or Thunderbird (or any other IMAP email client), one will likely need to edit the email account settings, specifying that the default "Drafts" folder should be "INBOX.Drafts", etc. Then you'll have harmony whether you use Squirrelmail or any other IMAP email client. If you use other email clients, I recommend testing how well they cope with the Squirrelmail-friendly INBOX* Maildirs. Yes, I have a Squirrelmail-centric bias, as it provides ultimate email accessibility to me at home and on the road.
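
If the Linux-centric "cp -a" gets in the way on another platform, one possible workaround is to use only options that POSIX specifies for cp. The sketch below is an assumption about what a substitute could look like, not a tested patch for DMT, and copy_tree is a hypothetical name. Note that GNU "cp -a" may preserve extra attributes (such as extended attributes) that this form does not.

```shell
#!/usr/bin/env bash
# Hypothetical portable stand-in for "cp -a", using POSIX cp options only:
#   -R  copy directories recursively
#   -P  copy symbolic links as links instead of following them
#   -p  preserve mode, timestamps and (where permitted) ownership
copy_tree() {
    cp -RPp "$1" "$2"
}
```

As with "cp -a", the result depends on whether the destination already exists: if it does and is a directory, the source tree is copied into it rather than over it.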

Prerequisite programs:

  • maildir2mbox, provided as a standard utility with Qmail. Or use Google to find it precompiled and download it if you're in a hurry.
  • dspam_corpus, which is a standard utility that is provided with DSPAM for bulk/corpus spam heuristic training based on provided spam or ham messages.
  • maildirmake, which is available from several sources. DMT was tested using the one provided by maildrop, but any maildirmake should work as long as it takes the "-f" argument.
  • bash, which is my favorite shell. DMT is a bash script.
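
Before installing, it can help to confirm that the programs listed above are on the PATH of the account that will run DMT. The following is a minimal sketch, with the hypothetical function name missing_programs. It only checks that each program can be found, not that your maildirmake accepts the "-f" argument.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check (not part of DMT): print the name of every
# given program that cannot be found on PATH, one per line.
missing_programs() {
    local prog
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
}

missing=$(missing_programs maildir2mbox dspam_corpus maildirmake bash)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found."
fi
```

Run it as the mail user whose crontab will invoke DMT, since cron jobs often see a shorter PATH than an interactive login.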
