Specification v1
----------------

(c) 2006 Kai Hildebrandt <kai.hildebrandt -AT- web.de>


--------------------------------------------------------------------------------

I. Abstract

This document describes the filter behavior of Murx, its configuration and the
default values. It also describes how Murx reacts to errors and user
interaction.

This document describes the behavior of the upcoming Murx v1.0.


--------------------------------------------------------------------------------

II. Goal Of This Document

The goal of this document is to specify clearly what to expect when running
Murx and to make detailed testing of all features possible. In parallel to
writing this document, I am creating an automated test environment to ensure
that Murx behaves as described here. These tests will be run every time before
a new release to ensure that code changes have no side effects on other
functions.


--------------------------------------------------------------------------------

Contents

I. Abstract

II. Goal Of This Document

1. Configuration File

 1.1  Name and Default Position of Configuration File

 1.2  Configuration Format
   1.2.1  The Configuration File Parser
   1.2.2  Split Configuration Files
     1.2.2.1  Maximum Include Depth

 1.3  Global Values And Their Default Values
   1.3.1  Account Definitions
     1.3.1.1  Minimum Account Definition
     1.3.1.2  Complete Account Definition
   1.3.2  Handling Of Bodylines
   1.3.3  Duplicate Messages
   1.3.4  Defining Location Of Header File
   1.3.5  Highscore
   1.3.6  Case-Sensitivity
   1.3.7  Defining Location Of Log File
   1.3.8  Defining The Appropriate Loglevel
   1.3.9  Maximum Line Length
   1.3.10  Global Size Limitations
     1.3.10.1  Maxsize Allow
     1.3.10.2  Maxsize Deny
   1.3.11  Defining Location Of Netlog File
   1.3.12  Defining Action For Non-RFC2822-Conformant Messages
   1.3.13  Normalization Of Message Subject
   1.3.14  Global Test Mode
   1.3.15  Global Network Timeout
  
 1.4 Filters
   1.4.1  Actions
   1.4.2  Rules
      1.4.2.1  Using Multiple Rules

 1.5 Rules
   1.5.1  Regex Patterns
     1.5.1.1  Equal
     1.5.1.2  Not Equal
   1.5.2  Local Size Limitations
     1.5.2.1  Minsize - Greater Than
     1.5.2.2  Maxsize - Less Than
   1.5.3  Matching In Body Lines

2. Runtime Behavior

  2.1  Command Line Options
    2.1.1  Overwriting Global Values
    2.1.2  Return Value
    2.1.3  Check The New Configuration

  2.2  Filter Behavior
    2.2.1  The RFC822-Message Parser
    2.2.2  Sequence Of Filtering
    2.2.3  Examples


3. Errors And User Interaction

  3.1  Misconfiguration
    3.1.1  Included File Not Found
    3.1.2  Incomplete Account Definition
    3.1.3  Empty String or Path
    3.1.4  Invalid Value
  3.2  Network Errors
    3.2.1  Network Timeout
    3.2.2  No Service Listening, TCP Reject
    3.2.3  Listening Service Is Not POP- or IMAP-Server
  3.3  Malformed Messages
  3.4  User Abort
    3.4.1  Interrupt via CTRL-c


--------------------------------------------------------------------------------

1. Configuration File

 1.1  Name and Default Position of Configuration File

The configuration file .murxrc is expected in the home-directory of the user
who called Murx. The runtime options -M or --murxrc overwrite this default
location and name.


 1.2  Configuration Format

   1.2.1  The Configuration File Parser

While parsing the configuration file, the parser should report general syntax
errors and malformed regular expressions, and then abort.


   1.2.2  Split Configuration Files

It should be possible to split the configuration file using the INCLUDE
keyword.


     1.2.2.1  Maximum Include Depth

The maximum include depth is one, i.e. it is not possible to use an INCLUDE
keyword in an included rcfile. If the maximum include depth is exceeded, the
parser should print an appropriate error message and abort.
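
The include rule can be sketched as follows. This is only an illustration in
Python, not the parser of Murx: the line syntax INCLUDE "file", the names
expand_includes and ConfigError, and the in-memory mapping that stands in for
the file system are all assumptions made for this example.

```python
import re

MAX_INCLUDE_DEPTH = 1  # from this section: an included rcfile may not include

# Hypothetical syntax: INCLUDE "path" on a line of its own.
INCLUDE_RE = re.compile(r'^\s*INCLUDE\s+"([^"]+)"\s*$')


class ConfigError(Exception):
    """Raised for errors that make the parser print a message and abort."""


def expand_includes(name, files, depth=0):
    """Return the lines of rcfile `name` with INCLUDE lines expanded.

    `files` maps file names to their text (a stand-in for the file system).
    """
    if name not in files:
        raise ConfigError(f"included file not found: {name}")
    lines = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            lines.append(line)
        elif depth >= MAX_INCLUDE_DEPTH:
            # An INCLUDE inside an already included file exceeds depth one.
            raise ConfigError(f"maximum include depth exceeded in {name}")
        else:
            lines.extend(expand_includes(match.group(1), files, depth + 1))
    return lines
```

Calling expand_includes on a top-level file that includes a second file returns
the merged lines; if the second file contains an INCLUDE line itself, the
sketch raises ConfigError instead.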


 1.3  Global Values And Their Default Values

This section describes the global values in the configuration file.


   1.3.1  Account Definitions

An account definition stores information about a POP3 or IMAP server and
everything necessary to log in.


     1.3.1.1  Minimum Account Definition

The minimum definition of an account contains:

- SERVER = "{the server name or IP-address}"

- USER = "{your user name}"

- PASSWORD = "{password}"

All other parameters are optional and default to:

- PROTOCOL = "pop3"

- PORT = 110

  This value depends on the PROTOCOL set:

  - "pop3" : PORT = 110

  - "pop3s": PORT = 995

  - "imap" : PORT = 143

  - "imaps": PORT = 993

- USE_TLS = "no"

- USE_STARTTLS = "no" (only used if USE_TLS = "yes")

- FINGERPRINT = "" (only used if USE_TLS = "yes")
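
The defaulting rules above can be sketched like this. The keywords and default
values are taken from the list; the function name complete_account, the use of
a Python dict for an account, and the treatment of an empty string as a missing
value are assumptions for illustration only.

```python
# Default port per protocol, as listed above.
DEFAULT_PORTS = {"pop3": 110, "pop3s": 995, "imap": 143, "imaps": 993}
REQUIRED = ("SERVER", "USER", "PASSWORD")


def complete_account(account):
    """Return a copy of `account` with all optional keywords filled in."""
    # Assumption: an empty value counts as missing.
    missing = [key for key in REQUIRED if not account.get(key)]
    if missing:
        raise ValueError("incomplete account definition: " + ", ".join(missing))
    result = dict(account)
    result.setdefault("PROTOCOL", "pop3")
    if result["PROTOCOL"] not in DEFAULT_PORTS:
        raise ValueError("invalid PROTOCOL: " + result["PROTOCOL"])
    # The default PORT depends on the PROTOCOL that is in effect.
    result.setdefault("PORT", DEFAULT_PORTS[result["PROTOCOL"]])
    result.setdefault("USE_TLS", "no")
    result.setdefault("USE_STARTTLS", "no")
    result.setdefault("FINGERPRINT", "")
    return result
```

For example, an account that only sets SERVER, USER, PASSWORD and
PROTOCOL = "imaps" ends up with PORT = 993 in this sketch.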


     1.3.1.2  Complete Account Definition

An account definition is considered complete when all of the above keywords
have a user-defined value. This is the recommended procedure.


   1.3.2  Handling Of Bodylines

It shall also be possible to scan part of the message body. The number of body
lines is controlled by the keyword BODYLINES. The default value is 0, i.e. do
not receive any body lines.


   1.3.3  Duplicate Messages

Murx shall store every Message-ID to look for duplicate messages. Duplicates
that are found can be deleted by setting DELETE_DUPLICATES to "yes". The
default value is "no".


   1.3.4  Defining Location Of Header File

All received message headers (and parts of the message body when BODYLINES is
defined greater than zero) shall be logged to a file specified with
HEADERFILE. The default value is "" (empty) i.e. headers shall not be stored.


   1.3.5  Highscore

When using SCORE filters it should be possible to adjust the HIGHSCORE. If the
total score of a message exceeds this value, the message is deleted. The
default value for HIGHSCORE is 100.


   1.3.6  Case-Sensitivity

The global setting for case sensitivity shall be configurable with the keyword
IGNORE_CASE. The default value is "yes".


   1.3.7  Defining Location Of Log File

It shall be possible to write all messages of Murx to a specified logfile. The
position of this file can be defined via the keyword LOGFILE. The default value
is "" (empty) i.e. do not write messages to a file.


   1.3.8  Defining The Appropriate Loglevel

The loglevel shall provide 5 different levels of verbosity. The loglevels in
detail:

1 - error

  In case of an error, a message is shown and Murx is stopped at this point,
  e.g. file not found, malformed configuration file, etc.

  In some cases these errors are not written to the logfile, e.g. if the
  logfile cannot be created at the given location.

2 - warning

  Warnings do not lead to program termination; they only print messages for
  important actions, e.g. network timeout, fingerprint does not match (TLS),
  mail deletion.

3 - notice

  Messages with this loglevel should give additional information about actions
  that do not lead to losing mail.

4 - info

  Even more information, e.g. program start, date and version, some statistics.

5 - debug

  Used for debugging, all of the above and even some more messages

6,7 - reserved

  There should also be special bits, ranging from bit 3 (8) to bit 6 (64),
  which can be added to the loglevel to print additional information.

8 - Log headers

  All received headers can be stored in a header file defined via HEADERFILE
  keyword. This flag is the trigger to use the file i.e. if this bit is not
  set, the header file should not store any received headers.

16 - Show pattern and matched substring of a header line if filter matched

  With this bit set, Murx will print information about matched filters and
  their rules.

32 - Show pattern and matched substring of a header line even if a filter did
     not match completely

  In addition to bit 4 (16) above, all rules that matched are printed, even
  when the filter does not match completely.

64 - Show header line with match

  Print out each header line where a rule matched.
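
Because the special bits are added to the base level, a single LOGLEVEL value
carries both. The following sketch shows one way to split such a value. The
level names and bit meanings come from the list above; the function name
decode_loglevel and the decision to mask the base level with 7 (which covers
the reserved values 6 and 7) are assumptions of this example.

```python
LEVEL_NAMES = {1: "error", 2: "warning", 3: "notice", 4: "info", 5: "debug"}

# Special bits 3 to 6 and their meaning, as described above.
FLAGS = {
    8: "log headers",
    16: "show pattern and match if filter matched",
    32: "show pattern and match even if filter did not match completely",
    64: "show header line with match",
}


def decode_loglevel(value):
    """Split a LOGLEVEL value into (base level, list of active special bits)."""
    level = value & 7  # lower three bits: levels 1-5, with 6 and 7 reserved
    flags = [name for bit, name in FLAGS.items() if value & bit]
    return level, flags
```

As a worked example, LOGLEVEL 29 is 5 + 8 + 16, so the sketch returns level 5
(debug) together with the "log headers" and "show pattern and match if filter
matched" bits.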


   1.3.9  Maximum Line Length

It shall be possible to define a maximum line length for a message. This value
can be defined with the MAXLENGTH keyword. Every message in which a line
exceeds the given value should be deleted.
  This is the first check for a received message, so the message gets deleted
even if an ALLOW-filter would have matched!


   1.3.10  Global Size Limitations

     1.3.10.1  Maxsize Allow

It shall be possible to remove large messages for which an ALLOW-filter
matched. The size limit can be given with the keyword MAXSIZE_ALLOW.


     1.3.10.2  Maxsize Deny

The general size limit affects all messages for which no ALLOW-filter matched.
Every message exceeding this value is deleted, unless the limit is overridden
by a local size limit in a DENY or MOVETO filter whose size rule does not
match. This global limit can be set with the keyword MAXSIZE_DENY.

Example of the override behaviour:

The message has 500000 octets and MAXSIZE_DENY is 500000, so the message would
normally get deleted. But there is a DENY filter whose regex rules matched:

DENY
{
  <> "..."
  =  "..."
# all above rules matched to a line except this one ...

  SIZE > 1000000

# ... because 500000 does not exceed this value!
}

Because of this filter, the mail is NOT deleted.


   1.3.11  Defining Location Of Netlog File

It shall be possible to define a location for a netlog file, where all network
communication is written to. The keyword defining this location is NETLOG.


   1.3.12  Defining Action For Non-RFC2822-Conformant Messages

There shall be an RFC2822 conformance check which can lead to different
actions for malformed messages. The keyword is NON-CONFORMANT and the possible
actions are DENY or MOVETO {folder}.


   1.3.13  Normalization Of Message Subject

A normalization of the subject line shall be possible. Normalization is the
deletion of any non-alphanumeric character. If NORMALIZE_SUBJECT is set, all
patterns shall be checked against both the regular subject and the normalized
one.
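
A minimal sketch of such a normalization is shown below. It is not taken from
the Murx sources: the function names are hypothetical, and using Python's
Unicode-aware str.isalnum to decide what counts as alphanumeric is an
assumption. Note that blanks are removed as well, since they are not
alphanumeric.

```python
def normalize_subject(subject):
    """Delete every non-alphanumeric character from a subject line."""
    return "".join(ch for ch in subject if ch.isalnum())


def subjects_to_check(subject, normalize):
    """Return the subject variants a pattern would be checked against."""
    if normalize:
        # Both the regular and the normalized subject are checked.
        return [subject, normalize_subject(subject)]
    return [subject]
```

With this sketch a subject such as "V.I.A.G.R.A!!" is also checked in its
normalized form "VIAGRA", so a pattern written for the plain word still
matches.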


   1.3.14  Global Test Mode

There shall be a possibility for a dry run of Murx. If TEST is set, Murx will
not execute any actions (ALLOW, MOVETO, DENY) so that no mail gets lost.


   1.3.15  Global Network Timeout

It shall be possible to define a global timeout value for network
operations. The keyword for defining this value is TIMEOUT.


 1.4 Filters

A Filter consists of a set of rules and an action which shall be executed when
ALL rules have matched a scanned message.

   1.4.1  Actions

There are four possible actions for a filter:

- ALLOW: the message shall be kept on server

- DENY: the message shall be removed from server

- SCORE: add a score to the message's overall score. If the total score is
         equal to or greater than a defined highscore, the message shall be
         removed from the server

- MOVETO: the message shall be moved to a specified folder. This option is only
          available on IMAP-accounts. If the specified folder does not exist,
          just keep the message => do not create the folder!

   1.4.2  Rules

There shall be two kinds of rules:

- pattern match based ones using regular expressions (regex)

   = "pattern": true if matching a regex in a line of the message header

  <> "pattern": true if NOT matching a regex in all lines of the message header

  For both of these rules, a preceding CASE or NOCASE shall override the
  default behavior defined via IGNORE_CASE.

- size restrictions

  SIZE < value: true for messages not exceeding a defined maxsize

  SIZE > value: true for messages exceeding a defined minsize

  These rules shall override the global MAXSIZE_DENY setting. If all rules of
  a filter except a size restriction apply to a message, the message will not
  be deleted even when its size exceeds the global value defined with
  MAXSIZE_DENY.

      1.4.2.1  Using Multiple Rules

It shall be possible to use multiple rules for a filter. If all rules match a
message, the action of the filter is applied to the message.
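
The rule semantics described in 1.4.2 can be sketched as follows. Everything
here is illustrative: the function names, the tuple layout of the rules and the
three status strings are assumptions. Python's re module is used only for
demonstration and is not identical to the extended regular expressions Murx
uses. Treating SIZE < value as "size does not exceed value" follows the wording
above, so the boundary case is an assumption as well.

```python
import re


def regex_rule(op, pattern, header_lines, ignore_case=True):
    """Evaluate an '=' or '<>' rule against the header lines of a message."""
    flags = re.IGNORECASE if ignore_case else 0
    found = any(re.search(pattern, line, flags) for line in header_lines)
    # '=' is true if any line matches, '<>' is true if no line matches.
    return found if op == "=" else not found


def size_rule(op, limit, size):
    """Evaluate 'SIZE < limit' (not exceeding) or 'SIZE > limit' (exceeding)."""
    return size <= limit if op == "<" else size > limit


def filter_status(regex_rules, size_rules, header_lines, size):
    """Return 'match', 'size-miss' or 'no-match' for one filter.

    regex_rules: list of (op, pattern, ignore_case) tuples
    size_rules:  list of (op, limit) tuples
    """
    if not all(regex_rule(op, pat, header_lines, ic)
               for op, pat, ic in regex_rules):
        return "no-match"
    if all(size_rule(op, limit, size) for op, limit in size_rules):
        return "match"
    # All rules matched except a size restriction: the case that overrides
    # the global MAXSIZE_DENY setting.
    return "size-miss"
```

The separate "size-miss" status is a design choice of this sketch: it keeps the
special case "all rules matched except a local size limitation" distinguishable
from a filter that simply did not match.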


 1.5 Rules

   1.5.1  Regex Patterns

Only extended Regular Expressions can be used for filtering.


     1.5.1.1  Equal

An equal rule matches if the regular expression applies to at least one header
line.


     1.5.1.2  Not Equal

A not-equal rule matches if the regular expression applies to no header line.


   1.5.2  Local Size Limitations

A local size rule matches if a size limit is (not) exceeded.


     1.5.2.1  Minsize - Greater Than

A size limit with a greater-than sign is true if the message size exceeds the
given value.


     1.5.2.2  Maxsize - Less Than

A size limit with a less-than sign is true if the message size does not exceed
the given value.


   1.5.3  Matching In Body Lines

Rules with a preceding BODY are applied to the body lines of a message. For
reading body lines, the value of BODYLINES must be greater than zero. The
keyword BODY is only allowed for regular expression rules.


2. Runtime Behavior

  2.1  Command Line Options

The following command line options are available, with or without an
additional argument:

  -h, --help                 Display this help information
  -H, --headerfile=FILE      Specify headerfile location
  -L, --logfile=FILE         Specify logfile location
  -M, --murxrc=FILE          Specify rcfile location
  -n, --no-action            Read rcfile and report errors
  -r, --return-value         Return amount of remaining messages on server
  -t, --test                 Simulate filter action (do not delete, move or add score)
  -v, --verbose=LEVEL        Specify level of verbosity
  -V, --version              Display version information


    2.1.1  Overwriting Global Values

The following options overwrite values of the rcfile:

  -H, --headerfile=FILE      overwrites HEADERFILE value
  -L, --logfile=FILE         overwrites LOGFILE value
  -t, --test                 sets global TEST value
  -v, --verbose=LEVEL        overwrites LOGLEVEL value
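
The override behavior can be sketched with Python's argparse module. Only the
four overriding options listed above are modeled. The function names, the
program name "murx", the dict used for the rcfile values and the
representation of a set TEST value as "yes" are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the options that overwrite rcfile values."""
    parser = argparse.ArgumentParser(prog="murx")
    parser.add_argument("-H", "--headerfile", metavar="FILE")
    parser.add_argument("-L", "--logfile", metavar="FILE")
    parser.add_argument("-t", "--test", action="store_true")
    parser.add_argument("-v", "--verbose", metavar="LEVEL", type=int)
    return parser


def apply_overrides(rc_values, args):
    """Return the rcfile values with command line overrides applied."""
    merged = dict(rc_values)
    if args.headerfile is not None:
        merged["HEADERFILE"] = args.headerfile
    if args.logfile is not None:
        merged["LOGFILE"] = args.logfile
    if args.test:
        merged["TEST"] = "yes"  # assumed representation of "TEST is set"
    if args.verbose is not None:
        merged["LOGLEVEL"] = args.verbose
    return merged
```

Options that are not given on the command line leave the corresponding rcfile
value untouched.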


    2.1.2  Return Value

The option -r|--return-value tells Murx to exit with a value greater than zero
if there are still messages on the mail server after filtering. If there are no
messages left or an error occurred, it returns zero.


    2.1.3  Check The New Configuration

With the option -n|--no-action Murx will parse the rcfile and report errors but
will not start filtering mail accounts. This is for debugging.


  2.2  Filter Behavior

Murx logs into all accounts defined in the rcfile one after another and applies
all filters in the following sequence:

 1) Check the RFC2822 conformance, if desired. If a message is not RFC2822
    conformant, set action to DELETE_NONCONFORMANT or MOVETO_NONCONFORMANT.
    (keyword NON-CONFORMANT)

 2) Check the maximum line length of the message. If a line exceeds the given
    value set action to DELETE_MAXLENGTH_EXCEEDED.
    (keyword MAXLENGTH)

 3) Check for duplicate messages. This is done by collecting all Message-IDs
    and setting the action for all following messages with the same Message-ID
    to DELETE_DUPLICATE.

 4) Check ALLOW-Filters. If a filter matches set action ALLOW and stop here.

 5) Check MOVETO-Filters. If a Filter matches set action MOVETO and stop here.

    Exception: All Rules matched except one or more local size limitations
               In this case set action NORULE and stop here.

 6) Check DENY-Filters. If a Filter matches set action DELETE and stop here.

    Exception: All Rules matched except one or more local size limitations
               In this case set action NORULE and stop here.

 7) Check all SCORE-Filters. If the message score exceeds the HIGHSCORE set in
    rcfile set action to SCORE_DELETE.
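
The sequence above can be sketched as a single decision function. This is a
simplified model, not the Murx implementation: the message is a dict whose
filter results are already computed as lists of the hypothetical status
strings "match", "size-miss" (all rules matched except a local size
limitation) and "no-match". Returning None when no step sets an action,
honoring DELETE_DUPLICATES in step 3, and stopping at the first filter that is
not "no-match" are assumptions of this sketch.

```python
def first_hit(statuses):
    """Return the first status that is not 'no-match', or None."""
    for status in statuses:
        if status != "no-match":
            return status
    return None


def decide_action(msg, cfg, seen_ids):
    """Return the action for one message, following steps 1) to 7)."""
    # 1) RFC2822 conformance, only if NON-CONFORMANT is configured.
    policy = cfg.get("NON-CONFORMANT")
    if policy and not msg["conformant"]:
        return "DELETE_NONCONFORMANT" if policy == "DENY" else "MOVETO_NONCONFORMANT"
    # 2) Maximum line length.
    maxlength = cfg.get("MAXLENGTH")
    if maxlength and any(len(line) > maxlength for line in msg["lines"]):
        return "DELETE_MAXLENGTH_EXCEEDED"
    # 3) Duplicates, detected via previously seen Message-IDs.
    if msg["message_id"] in seen_ids:
        if cfg.get("DELETE_DUPLICATES", "no") == "yes":
            return "DELETE_DUPLICATE"
    else:
        seen_ids.add(msg["message_id"])
    # 4) ALLOW-filters.
    if "match" in msg["allow"]:
        return "ALLOW"
    # 5) MOVETO-filters, then 6) DENY-filters, each with the size exception.
    for key, action in (("moveto", "MOVETO"), ("deny", "DELETE")):
        hit = first_hit(msg[key])
        if hit == "match":
            return action
        if hit == "size-miss":
            return "NORULE"
    # 7) SCORE-filters.
    if msg["score"] > cfg.get("HIGHSCORE", 100):
        return "SCORE_DELETE"
    return None  # no step set an action
```

Each step returns as soon as an action is set, which mirrors the "stop here"
wording of the list.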


    2.2.1  The RFC822-Message Parser

The RFC822 Message Parser reads a message and checks the header for RFC2822
conformance. Currently a mail header is checked for:

 - From and Date exist

 - Message-ID, From, To, Cc, Date and Subject are unique
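
The two checks can be sketched as follows. The function name is_conformant and
the representation of the header as a list of lines are assumptions. Field
names are compared case-insensitively, as RFC 2822 defines them, and lines
starting with a blank or tab are treated as folded continuation lines.

```python
REQUIRED_FIELDS = ("from", "date")
UNIQUE_FIELDS = ("message-id", "from", "to", "cc", "date", "subject")


def is_conformant(header_lines):
    """Check that From and Date exist and that the listed fields are unique."""
    counts = {}
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            continue  # folded continuation of the previous header field
        name, sep, _ = line.partition(":")
        if sep:
            key = name.strip().lower()
            counts[key] = counts.get(key, 0) + 1
    has_required = all(counts.get(f, 0) >= 1 for f in REQUIRED_FIELDS)
    is_unique = all(counts.get(f, 0) <= 1 for f in UNIQUE_FIELDS)
    return has_required and is_unique
```

A header without a Date line, or one with two Subject lines, is reported as
non-conformant by this sketch.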


    2.2.2  Sequence Of Filtering

See "2.2 Filter Behavior" above.


    2.2.3  Examples

This is a placeholder for examples of the filtering behavior described
above. Examples make complicated things easy to understand. ;-)


