Posts Tagged Bayesian Filtering

Postfix: Bayesian Learning System

Posted by Filed Under Spam Control with Comments Off

Learning System
You are able to additionally tune SpamAssassin to learn about your email.  Two programs are used together to create this learning system; autowhitelisting and Bayesian filtering.  Autowhitelisting is an algorithm that learns about each senders history and modifies the spam score of their subsequent mail.  This should reduce false positives.  Autowhitelisting develops a database for each sender’s mail address and IP address.  Each time a message is received from that sender the score is added to the database score for that sender.  The average score divided by the number of messages is used to modify any new messages.

The most important issue with autowhitelisting is the weight you place on the sender history.  The auto_whitelist_factor is the directive that sets the multiplier between 0-1.  The default is .5 which will make the final score halfway between the message spam score.  If you wanted to increase the weight set the factor to 1.

The system-wide autowhitelist with amavisd.
Edit the /etc/mail/spamassassin/


Sitewide Bayesian Filtering for Amavisd
The idea behind Bayesian filtering is that it will learn aspects of email which will determine how to distinguish between spam and non-spam.  The advantage is that it can help facilitate a more accurate Spam filtering process.  The Bayesian rules sets up baselines that determine how much each rule should change the possibility that the email is Spam.  These rules have features that are likely to be Spam, thus increasing the probability, and they have rules that typically are not in Spam, thus reducing the probablity of Spam.
Edit the /etc/mail/spamassassin/

use_bayes 1
bayes_path /var/amavisd/bayes/bayes

Create the directories you need in /var both amavisd and the subdirectory bayes.  Be sure to chmod 700 the database file so no others can access it.  The user is vscan as is set up in the /etc/amavisd.conf file so that user must have access to the file.  Now with the new version of Spamassassin the line for bayes_pay must not end in a folder, so add the name bayes to it per the example.

chown -R vscan:vscan /var/amavisd/

ls -la /var/amavisd/bayes/
total 8
drwx—— 2 vscan vscan 4096 May 11 07:32 .
drwx—— 3 vscan vscan 4096 May 11 07:32 ..