SpamAssassin Rules Basics
SpamAssassin runs tests against mail headers, the message body, IP addresses and checksums to locate patterns that indicate spam. Pattern-based tests check for patterns found in the headers, the body or attachments, while network-based tests use DNS lookups or query real-time blackhole lists (RBLs).
The tests used by SpamAssassin, and thus by amavisd, are located in the /usr/share/spamassassin directory. They consist of over 1000 tests on various parts of each incoming email, including checks for known spammers. Each test file contains a number of rules that will be performed.
The test files are largely self-explanatory, but here is some additional information that will help. Ratware is the name for programs that spammers use to send their email. These specially designed programs leave signatures that can be detected. The 10_misc.cf file defines the templates that are used to report spam. The 20_compensate.cf file creates negative scores for good values in mail that indicate that the mail is not spam. The 50_scores.cf file contains the scores for each rule. The 60_whitelist.cf file is where common whitelisted addresses are listed. Here is a listing of the directory.
# ls /usr/share/spamassassin/
10_misc.cf 25_accessdb.cf 30_text_nl.cf
20_advance_fee.cf 25_antivirus.cf 30_text_pl.cf
20_anti_ratware.cf 25_body_tests_es.cf 30_text_pt_br.cf
20_body_tests.cf 25_body_tests_pl.cf 50_scores.cf
20_compensate.cf 25_dcc.cf 60_awl.cf
20_dnsbl_tests.cf 25_dkim.cf 60_whitelist.cf
20_drugs.cf 25_domainkeys.cf 60_whitelist_dk.cf
20_fake_helo_tests.cf 25_hashcash.cf 60_whitelist_dkim.cf
20_head_tests.cf 25_pyzor.cf 60_whitelist_spf.cf
20_html_tests.cf 25_razor2.cf 60_whitelist_subject.cf
20_meta_tests.cf 25_replace.cf languages
20_net_tests.cf 25_spf.cf sa-update-pubkey.txt
20_phrases.cf 25_textcat.cf sa-update.cron
20_porn.cf 25_uribl.cf triplets.txt
20_ratware.cf 30_text_de.cf user_prefs.template
20_uri_tests.cf 30_text_fr.cf
23_bayes.cf 30_text_it.cf
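To get a rough feel for how many rules each file in the listing above defines, a small script can count the rule-definition lines. This is only a sketch: the function name count_rules and the list of rule keywords are assumptions made for illustration, and the keyword list may not cover every rule type your SpamAssassin version supports.

```python
import re
from pathlib import Path

# Keywords that start a rule definition. This list is an assumption for the
# sketch; plugin-specific rule types are not covered.
RULE_KEYWORDS = ("header", "body", "rawbody", "uri", "full", "meta")

# A rule line looks like: <keyword> <RULE_NAME> <test...>
RULE_LINE = re.compile(
    r"^\s*(?:%s)\s+([A-Za-z0-9_]+)\s+\S" % "|".join(RULE_KEYWORDS)
)


def count_rules(directory):
    """Return {file name: number of rule-definition lines} for each *.cf file."""
    counts = {}
    for path in sorted(Path(directory).glob("*.cf")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[path.name] = sum(
            1 for line in text.splitlines() if RULE_LINE.match(line)
        )
    return counts


if __name__ == "__main__":
    rules_dir = "/usr/share/spamassassin"  # directory named in the article
    if Path(rules_dir).is_dir():
        for name, n in count_rules(rules_dir).items():
            print(f"{name}: {n}")
```

Comment lines, describe lines and score lines are not counted, so the numbers reflect rule definitions only.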
Here is an example taken from the 20_head_tests.cf file. Note that some tests require a specific version, which is listed at the top of the file. Each rule name is written in capitals with underscores and is followed by the test used to evaluate the rule, either a regular expression or an eval function. The describe line underneath provides a description of the rule. The score for each rule is listed in 50_scores.cf.
require_version 3.001007
header HEAD_LONG eval:check_msg_parse_flags('truncated_header')
describe HEAD_LONG Message headers are very long
# partial messages; currently-theoretical attack
# unsurprisingly this hits 0/0 right now.
header FRAGMENTED_MESSAGE Content-Type =~ /\bmessage\/partial/i
describe FRAGMENTED_MESSAGE Partial message
header MISSING_HB_SEP eval:check_msg_parse_flags('missing_head_body_separator')
describe MISSING_HB_SEP Missing blank line between message header and body
header UNPARSEABLE_RELAY eval:check_relays_unparseable()
tflags UNPARSEABLE_RELAY userconf
describe UNPARSEABLE_RELAY Informational: message has unparseable relay lines
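The layout of these lines (a keyword, then the rule name, then the rest of the line) is regular enough to read with a few lines of code. The following is a minimal sketch, not how SpamAssassin itself parses its configuration; the function name parse_rules and the SAMPLE text are made up for illustration, with the sample lines copied from the excerpt above.

```python
# Sample lines copied from the 20_head_tests.cf excerpt in the article.
SAMPLE = r"""require_version 3.001007
# partial messages; currently-theoretical attack
header FRAGMENTED_MESSAGE Content-Type =~ /\bmessage\/partial/i
describe FRAGMENTED_MESSAGE Partial message
header UNPARSEABLE_RELAY eval:check_relays_unparseable()
tflags UNPARSEABLE_RELAY userconf
describe UNPARSEABLE_RELAY Informational: message has unparseable relay lines
"""


def parse_rules(text):
    """Group header/describe/tflags lines by rule name.

    Returns {rule_name: {keyword: rest_of_line}}.
    """
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # keyword, rule name, remainder
        if len(parts) < 3:
            continue  # e.g. require_version has no rule name
        keyword, name, rest = parts
        if keyword in ("header", "describe", "tflags"):
            rules.setdefault(name, {})[keyword] = rest
    return rules


if __name__ == "__main__":
    for name, fields in parse_rules(SAMPLE).items():
        print(name, "->", fields.get("describe", "(no description)"))
```

Only the three keywords that appear in the excerpt are handled; other line types are ignored.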
Each test looks similar to what you see here. These are header tests, so they start with the word "header" followed by the name of the test in capitals. The actual expression of the test is on the right-hand side. In the first test below, a regular expression checks that the From header contains no real name, only an address. The describe line underneath is a description of the test. The second test uses a regular expression to check for a From header whose name is an empty pair of quotes.
header NO_REAL_NAME From =~ /^["\s]*\<?\S+\@\S+\>?\s*$/
describe NO_REAL_NAME From: does not include a real name
header FROM_BLANK_NAME From =~ /(?:\s|^)"" <\S+>/i
describe FROM_BLANK_NAME From: contains empty name
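To see what these two expressions actually match, you can try them against a few sample From values. The sketch below uses Python's re module rather than Perl, so it only approximates what SpamAssassin does with the real header; the function name matching_rules and the sample addresses are invented for illustration.

```python
import re

# The two expressions from the rules above, copied without the Perl
# delimiters. The /i flag on FROM_BLANK_NAME becomes re.IGNORECASE.
NO_REAL_NAME = re.compile(r'^["\s]*\<?\S+\@\S+\>?\s*$')
FROM_BLANK_NAME = re.compile(r'(?:\s|^)"" <\S+>', re.IGNORECASE)


def matching_rules(from_value):
    """Return the names of the rules whose expression matches from_value."""
    hits = []
    if NO_REAL_NAME.search(from_value):
        hits.append("NO_REAL_NAME")
    if FROM_BLANK_NAME.search(from_value):
        hits.append("FROM_BLANK_NAME")
    return hits


if __name__ == "__main__":
    samples = [
        "user@example.com",
        "Alice Example <alice@example.com>",
        '"" <user@example.com>',
    ]
    for value in samples:
        print(repr(value), "->", matching_rules(value))
```

A bare address triggers NO_REAL_NAME, an address with a display name triggers neither rule, and an address preceded by an empty pair of quotes triggers both.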
Each test has an associated score in the 50_scores.cf file, which is also located in /usr/share/spamassassin. The score is added to the email's total score, which determines whether the email is treated as spam.
score NO_REAL_NAME 0 0.550 0 0.961
Each score line has 4 fields. The first is the score added for a matching message when neither the network tests nor the Bayesian tests are in use. For NO_REAL_NAME this is 0. The second is used when network tests are in use and Bayesian tests are not. The third is used when Bayesian tests are in use but network tests are not. The final score is used when both network tests and Bayesian tests are in use.
score FROM_BLANK_NAME 1.659 1.467 0.936 1.534
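The choice between the four fields can be written out as a short function. This is a sketch of the selection logic described above, not SpamAssassin's own code; the names parse_score_line, select_score and total_score are hypothetical. It also assumes, as I understand the SpamAssassin configuration documentation, that a score line with a single value applies that value in all four cases.

```python
def parse_score_line(line):
    """Split a 'score NAME v1 [v2 v3 v4]' line into (name, [floats])."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    return parts[1], [float(v) for v in parts[2:]]


def select_score(values, bayes, net):
    """Pick the score for the current setup.

    Field order: no Bayes/no network, network only, Bayes only, both.
    A single value is assumed to apply in every case.
    """
    if len(values) == 1:
        return values[0]
    if len(values) != 4:
        raise ValueError("expected 1 or 4 score values")
    index = (2 if bayes else 0) + (1 if net else 0)
    return values[index]


def total_score(score_lines, hits, bayes, net):
    """Sum the selected scores of the rules named in hits."""
    table = dict(parse_score_line(line) for line in score_lines)
    return sum(select_score(table[name], bayes, net) for name in hits)


if __name__ == "__main__":
    lines = [
        "score NO_REAL_NAME 0 0.550 0 0.961",
        "score FROM_BLANK_NAME 1.659 1.467 0.936 1.534",
    ]
    hits = ["NO_REAL_NAME", "FROM_BLANK_NAME"]
    print(total_score(lines, hits, bayes=True, net=True))
```

With both Bayesian and network tests in use, a message that hits both rules picks up the fourth field from each line, and the sum is what gets compared against the spam threshold.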
Posted by mike 

2 responses to "SpamAssassin Rules Basics"
12:44 on December 4th, 2008
i was wondering what happens if the scores in /usr/share/spamassassin/50_scores.cf
are modified in say user_prefs (or local.cf) with just a single value ??
5:18 on December 5th, 2008
As you increase a score you increase the chance that a matching email will be labeled as spam. So you can raise the value if you need to be more aggressive, or lower it if you get false positives. If you give just a single value, that one score is used in all four cases.