NAME
scanmail, testscan – spam filters
SYNOPSIS
upas/scanmail [ options ] [ qer–args ] root mail sender system
rcpt–list
upas/testscan [ –avd ] [ –p patfile ] [ filename ]
DESCRIPTION
Scanmail accepts a mail message supplied on standard input, applies
a file of patterns to a portion of it, and dispatches the message
based on the results. It exactly replaces the generic queuing
command qer(8) that is executed from the rc(1) script /mail/lib/qmail
in the mail processing pipeline.

Associated with each pattern is an action. In order of decreasing
priority, the actions are:

    dump   the message is deleted and a log entry is written to
           /sys/log/smtpd
    hold   the message is placed in a queue for human inspection
    log    a line containing the matching portion of the message is
           written to a log

If no pattern matches, or only patterns with an action of log match,
the message is accepted and scanmail queues it for delivery.

Scanmail meshes with the blocking facilities of smtpd(6) to provide
several layers of filtering on gateway systems. In all cases the
sender is notified that the message has been successfully delivered,
leaving the sender unaware that the message may have been delayed
or deleted.
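The priority rule described above can be sketched in a few lines of Python. This is an illustration only, not the scanmail source; the names PRIORITY and dispose, and the result string "deliver", are hypothetical.

```python
# Hypothetical sketch of the dispatch rule; not taken from the scanmail source.
# Actions in order of decreasing priority, as listed in the text.
PRIORITY = ["dump", "hold", "log"]

def dispose(matched_actions):
    """Return the fate of a message given the actions of all matching patterns."""
    for action in PRIORITY:
        if action in matched_actions:
            # A log match is recorded, but the message is still accepted.
            return "deliver" if action == "log" else action
    # No pattern matched: the message is queued for delivery.
    return "deliver"
```

For example, a message matching both a hold pattern and a log pattern is held, because hold outranks log.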
Scanmail accepts the arguments of qer(8) as well as the following:
    –p filename   Read the patterns from filename rather than
                  /mail/lib/patterns.
    –q holdroot   Queue deliverable messages in subdirectories of
                  holdroot. This option is the same as the –q option
                  of qer(8) and must be present if the –h option is
                  given.
    –s            Save deleted messages. Messages are stored, one per
                  randomly–named file, in subdirectories of
                  /mail/queue.dump named with the date.
    –t            Test mode. The pattern matcher is applied but the
                  message is discarded and the result is not logged.
    –v            Print the highest priority match. This is useful
                  with the –t option for testing the pattern matcher
                  without actually sending a message.
Testscan is the command line version of scanmail. If filename
is missing, it applies the pattern set to the message on standard
input. Unlike scanmail, which finds the highest priority match,
testscan prints all matches in the portion of the message under
test. It is useful for testing a pattern set or implementing a
personal filter using the pipeto file in a user's mail directory.
Testscan accepts the following options:
Canonicalization
Pattern Syntax
Lines beginning with * contain a pattern–spec that is a string;
otherwise, the pattern–spec is a regular expression in the
style of regexp(6). Regular expression matching is many times
less efficient than string matching, so it is wiser to enumerate
several similar strings than to combine them into a regular
expression. The action is a keyword terminated by a : and separated
from the pattern by optional white–space. It must be one of the
following:

Patterns are accumulated into pattern sets sharing the same action.
The matching engine applies the dump pattern set first, then the
header and hold pattern sets, and finally the line pattern set. Each
pattern set is applied three times: to the canonicalized command
line, to the message header, and finally to the message body. The
ordering of patterns in the pattern file is insignificant.

The pattern–spec is a string of characters terminated by a newline,
#, or the override indicator, ~~. Trailing white–space is deleted,
but patterns containing leading or trailing white–space can be
enclosed in double–quote characters. A pattern containing a
double–quote must be enclosed in double–quote characters, and the
embedded double–quote must be preceded by a backslash. For example, the pattern "
The structure of the pattern file and the matching algorithm define
the strategy for detecting and filtering unwanted messages. Ideally,
a hold pattern selects a message for inspection and, if the message
is determined to be undesirable, a specific dump pattern is added to
delete further instances of it. Additionally, it is often useful to
block the sender by updating the smtpd control file.

In this regime, patterns with a dump action generally match phrases
that are likely to be unique. Patterns that hold a message for
inspection match phrases commonly found in undesirable material and
occasionally in legitimate messages. Patterns that log matches are
less specific still. In all cases, the ability to override a pattern
by matching another string allows repetitive messages that trigger
the pattern, such as mailing lists, to pass the filter after the
first one is processed manually. The –s option allows deleted
messages to be salvaged by either manual or semi–automatic review,
which makes it practical to specify more aggressive patterns.

Finally, the utility of the pattern matcher is not confined to
filtering spam; it is a generally useful administrative tool for
deleting inadvertently harmful messages, for example, mail loops,
stuck senders or viruses. It is also useful for collecting or
counting messages matching certain criteria.
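The pattern-line rules given under Pattern Syntax can be sketched as a small Python parser. This is a minimal illustration under stated assumptions, not the scanmail implementation: the function name parse_pattern_line is hypothetical, the text after ~~ is returned uninterpreted because its format is not described here, and the handling of # or ~~ inside a quoted pattern is a guess.

```python
# Hypothetical sketch of pattern-line parsing; not the scanmail source.
def parse_pattern_line(line):
    """Split one pattern-file line into (is_string, action, spec, override)."""
    line = line.rstrip("\n")
    is_string = line.startswith("*")   # '*' marks a literal string pattern
    if is_string:
        line = line[1:]
    action, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing ':' after action")
    rest = rest.lstrip()               # optional white-space after the colon
    if rest.startswith('"'):
        # Quoted spec: read up to the closing quote, honoring \" escapes.
        chars, i = [], 1
        while i < len(rest) and rest[i] != '"':
            if rest[i] == "\\" and i + 1 < len(rest) and rest[i + 1] == '"':
                i += 1                 # skip the backslash, keep the quote
            chars.append(rest[i])
            i += 1
        if i >= len(rest):
            raise ValueError("unterminated quoted pattern")
        spec, tail = "".join(chars), rest[i + 1:]
    else:
        # Unquoted spec ends at '#' or '~~', whichever comes first.
        stops = [p for p in (rest.find("#"), rest.find("~~")) if p >= 0]
        cut = min(stops, default=len(rest))
        spec, tail = rest[:cut].rstrip(), rest[cut:]
    tail = tail.lstrip()
    # Assumption: whatever follows '~~' is kept as raw override text.
    override = tail[2:].strip() if tail.startswith("~~") else None
    return is_string, action.strip(), spec, override
```

Called on a made-up line such as `*hold: free money  # comment`, the sketch reports a string pattern with action hold and pattern-spec `free money`, with the trailing white-space and comment removed.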
FILES
    /mail/lib/patterns   default pattern file
    /sys/log/smtpd       log of deleted messages
    /mail/log/lines      file where log matches are logged
    /mail/queue/*        directories where legitimate messages are
                         queued for delivery
    /mail/queue.hold     directory where held messages are queued for
                         inspection
    /mail/queue.dump/*   directory where dumped messages are stored
                         when the –s command line option is specified
    /mail/copy/*         directory where copies of all incoming
                         messages are stored
SOURCE
/sys/src/cmd/upas/scanmail
SEE ALSO
mail(1), qer(8), smtpd(6)
BUGS
Testscan does not report a match when the body of a message contains
exactly one line.