
      /\
     /  \            (C) Copyright 2006 Parliament Hill Computers Ltd.
     \  /            All rights reserved.
      \/
       .             Author: Alain Williams, <addw@phcomp.co.uk> April 2006
       .
        .            SCCS: @(#)Documentation	1.9 05/09/11 17:15:14
          .
 



Overview
********

MailToUrl helps people who would otherwise send email with large attachments, which is a bad idea.
Instead they send the attachments to a MailToUrl mailbox, which saves them and replies with
a URL that can be pasted into the email that is then sent. The mail recipients can, if they wish,
read the attachments by clicking on the URL.

Why not send out large attachments ?
************************************

* It clogs up mail servers - especially if it is sent to a popular mail list
* It fills up mail boxes - many people have quota limits on their mail boxes
* It can cost people money to download large email - many still use slow modems
* Some (corporate) sites filter out many attachments - so it never gets there
* Some users can't read attachments anyway (think Blackberry users)


Summary of features
*******************

* Receive mail with an attachment and reply with a URL
* Attachments (files) are automatically removed (purged) from the web/ftp site once their
  expiry date has passed; the default retention time is 1 month
* Users can be warned a settable number of days before their files will be purged
* Different attachments can have different expire times
* Allow the original sender of an attachment to remove (delete) the attachment
* Allow users to list what files are in the archive
* Allow users to remove files that they put up
* Enforce minimum and maximum attachment sizes
* Enforce a quota -- maximum amount of disk space used by all of the attachments
* Attachment names are filtered to avoid problems that strange characters can cause:
  only alphanumerics, '.' and '-' are allowed (no spaces), with a maximum of 32 characters
* Nominated administrators can, via email, modify: file size parameters; file retention
  dates; maximum file name; remove any file and add/remove users of the service
* The service is restricted to a list of email addresses
* The list of users can be built automatically, eg from members of a mailman list
* Activity is recorded in a log file
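
The attachment name rule above can be illustrated with a short check. This is only a
sketch in Python (MailToUrl itself is Perl and its code may differ); the function name
is_safe_name is hypothetical, and treating "alphanumeric" as ASCII letters and digits is
an assumption. What MailToUrl does with a name that fails the rule is not shown here.

```python
import re

# Assumed rule, taken from the feature list: ASCII letters, digits, '.' and '-',
# no spaces, at most 32 characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9.-]{1,32}")

def is_safe_name(name):
    """Return True if name uses only the permitted characters and length."""
    return SAFE_NAME.fullmatch(name) is not None

print(is_safe_name("notes.pdf"))       # True
print(is_safe_name("my notes.pdf"))    # False: contains a space
print(is_safe_name("../etc/passwd"))   # False: contains '/'
```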


The big picture of how MailToUrl is set up
******************************************

You need to be able to:

a) automatically intercept email for a user and filter it through MailToUrl
b) have control of a Web or FTP server

1) You set it up on a server with a good Internet connection
2) You choose if you want to generate FTP (anonymous File Transfer) URLs
   or HTTP (World Wide Web) URLs.
   A URL is an Internet address for a document or file.
3) You set up a directory where MailToUrl can create files.
4) Choose an email address for users to send to.
5) You arrange for email for the address to be processed by MailToUrl.
6) Create a .list file in the archive directory
7) Arrange for MailToUrl to be run with the purge option once a day.

MailToUrl is written in the Perl scripting language.

The above is easy on a Unix or Linux system; let me know if you achieve
this on anything else.


Quick Start
***********

To make a responder at easy@files.example.com that issues ftp URLs, follow the steps below.
Change the references to example.com to what is appropriate.

1) Update DNS so that there are A and MX records for the mail domain (files.example.com)
   that point to your machine. Eg, in the example.com zone:
	files	IN	A	192.168.10.20
		IN	MX 10	files

2) Install MailToUrl as /usr/local/bin/MailToUrl, make it executable

3) Create some ftp directories that are accessible but not browsable:
	mkdir      /var/ftp/files /var/ftp/files/easy
	chmod 751  /var/ftp/files /var/ftp/files/easy
	chown mail /var/ftp/files /var/ftp/files/easy

4) Create a .list file (note that one tab character MUST separate the directive and value):
	( echo -e '.ProtoRoot\tfiles/'
	  echo -e '.Domain\tfiles.example.com'
	  echo -e '.Protocol\tftp'
	  echo -e '.Admins\tsomeone@example.com'
	  echo -e '.UserFile\t/etc/exim/file_domains/easy'
	) > /var/ftp/files/easy/.list
	chmod 600  /var/ftp/files/easy/.list
	chown mail /var/ftp/files/easy/.list

5) Configure who may use the facility:
	mkdir /etc/exim/file_domains
	( echo someone@example.com
	  echo a.user@somewhere.else.com
	) > /etc/exim/file_domains/easy
	chmod 644 /etc/exim/file_domains/easy
	chown mail /etc/exim/file_domains/easy

6) Add entries to the routers and transports sections of your exim configuration file
   (probably /etc/exim/exim.conf):
   The router:
	# This is a domain where attachments will be put up for web/ftp serving
	# The local part is used to determine what organisation this is being done for,
	# each org will have a separate directory.
	# The sending user must be listed in the file for the local user.
	mail_to_url_router:
	  driver = accept
	  domains = files.example.com
	  require_files = /etc/exim/file_domains/$local_part
	  senders = lsearch;/etc/exim/file_domains/$local_part
	  transport = mail_to_url_transport
	  no_more

   The transport:
	# Mail sent to this address is piped through MailToUrl
	mail_to_url_transport:
	  driver = pipe
	  command = /usr/local/bin/MailToUrl $local_part $sender_address $header_subject /var/ftp/files/$local_part
	  user = mail
	  group = apache
	  return_fail_output = true

7) Add to root's crontab:
	30      4       *       *       *       /usr/local/bin/MailToUrl --purge easy /var/ftp/files/easy

8) Review the .UserFile line in /var/ftp/files/easy/.list once you have this working.


Security Warning
****************

Simple authentication is done using the email sender address, which is notoriously
easy to forge. The authentication is used to permit only the original author of a
file to delete it before its expiry date. It is possible that a malicious person
could remove the original content and replace it with something else.

This program should NOT be used in situations where a high confidence in the
assured contents of the files is required.

The original author will receive a confirmation mail informing him of the file
delete/replacement. This will hopefully prompt him to investigate what has happened.

It may be possible to strengthen sender authentication using something like an SPF
check in the MTA.



Choosing URL type
*****************

The obvious URL type is HTTP (Web protocol). Most users will be able to download
HTTP using their desktop browser. Browsers should also support FTP, however some
corporate firewalls may try to block this.

There is a potential security risk with HTTP for the web server machine. MailToUrl
allows all file types to be uploaded; unless the web server is properly configured
it might interpret one of these files as a web scripting program and try to run it,
eg: prog.php, prog.cgi, prog.pl. There are also cross site scripting dangers.
An FTP server just serves up files; no special interpretation is applied.

If you are concerned about safety I suggest choosing FTP.


Web Server Configuration
************************

If you do want to use a web server the following may help. It is an Apache VirtualHost
definition. The important bit is 'AddType' which overrides the normal interpretations
of problematic file types. This may not be sufficient, so take care:

<VirtualHost files.example.com>
     ServerAdmin root@example.com
     DocumentRoot /var/www/files
     ServerName files.example.com:80
     ErrorLog logs/files/error_log
     CustomLog logs/files/access_log common
     # Every option carries + or -: Apache 2.4 rejects a mix of signed and unsigned options
     Options +FollowSymLinks -ExecCGI -IncludesNOEXEC
     # Stop PHP scripts, etc, being executed:
     AddType text/plain .php .pl .cgi .html .shtml .htm .xhtml .xht .css .sgml .sgm .xml .xsl
</VirtualHost>


FTP Server Configuration
************************

There are so many different FTP servers out there that it is not possible to give
sample configuration files. However there are a few things that you should think
about.

* Do you want anonymous ftp access ? If 'yes' then anyone who knows a URL can download
  a file. If 'no' you need to distribute a username & password to the mail
  recipients.

* Do you want the directories to be browsable (with dir) ? If you allow it then
  anyone who discovers the ftp site can easily see, and download, all of your
  documents. Disabling directory listings helps privacy - but does not guarantee it.

* Do you want visiting users who download files to be able to upload other files ?
  Probably not, so disable remote uploading to these directories.

* You probably don't want the .list file readable to the ftp user.

Integration with Email
**********************

You need to arrange for email for an appropriate user (or group of users) to be
passed to MailToUrl. You need to give as arguments to MailToUrl:

* mailbox name - IE the local part of the mail address that received the attachment
* sender - who sent the email
* subject - the Subject header from the email
* Attachment storage directory name - optional, only needed if the default is not used.

Exim
****

Parts of the exim MTA (http://www.exim.org/) configuration file follow.
This file will probably be called /usr/exim/configure or /etc/exim/exim.conf
This is for exim version 4.

This example is for attachments sent to save@files.example.com

In the routers section:

	# This is a user where attachments will be put up for web serving
	mail_to_url_router:
	  driver = accept
	  local_parts = save
	  domains = files.example.com
	  transport = mail_to_url_transport
	  no_more

In the transports section:

	# Mail sent to this address is piped through MailToUrl
	mail_to_url_transport:
	  driver = pipe
	  command = /usr/local/bin/MailToUrl $local_part $sender_address $header_subject
	  user = mail
	  group = apache
	  return_fail_output = true

Notes:

* This assumes that you have put MailToUrl in /usr/local/bin/
* You may need to change the user or group as appropriate
* You may need to add something to the ACL to allow mail to be received for this mailbox.
* The configuration relies on exim sanitising $local_part - MailToUrl only allows it to
  contain alphanumeric characters. Do NOT remove this restriction, if it were to
  contain '/' this might allow unwanted access to other directories.


Here is a more complicated version.
* Mail can be sent to several users @files.example.com, each of which
  is configured separately
* Who can use the service is restricted

The users allowed to use a service are listed in a file, eg:
for archive@files.example.com there is the file /etc/exim/file_domains/archive:
	.dir /var/ftp/files/archive
	john@example.com
	bilbo@shire.com
(The .dir line may be needed, see later)
There will be one such file for each mailbox in the domain files.example.com

The router:
	# This is a domain where attachments will be put up for web/ftp serving
	# The local part is used to determine what organisation this is being done for,
	# each org will have a separate directory.
	# The sending user must be listed in the file for the local user.
	mail_to_url_router:
	  driver = accept
	  domains = files.example.com
	  require_files = /etc/exim/file_domains/$local_part
	  senders = lsearch;/etc/exim/file_domains/$local_part
	  transport = mail_to_url_transport
	  no_more

If a sender is not listed in /etc/exim/file_domains/$local_part mail will be bounced.

If the domain accepts regular mail as well, you should replace the senders line thus:
	  senders = ${if exists {/etc/exim/file_domains/$local_part}\
	              {lsearch;/etc/exim/file_domains/$local_part}{*}}



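Each mailbox has its own archive directory. In the transport below it is built from the
mailbox name; alternatively it can be looked up from the .dir line in the sender
validation file, as described after the transport.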

The transport:
	# Mail sent to this address is piped through MailToUrl
	mail_to_url_transport:
	  driver = pipe
	  command = /usr/local/bin/MailToUrl $local_part $sender_address $header_subject /var/ftp/files/$local_part
	  user = mail
	  group = apache
	  return_fail_output = true

For a bit of extra flexibility the directory may be obtained from the allowed emails file.
Create a .dir line (see above) and modify the transport command line to:

	  command = /usr/local/bin/MailToUrl $local_part $sender_address $header_subject ${lookup{.dir}lsearch{/etc/exim/file_domains/$local_part}}


Procmail
********

It should be possible to run MailToUrl from procmail.
Please send me the recipe if you do this.


The .list file
**************

This file is in the archive directory and contains:

* A list of the files in that directory, who sent them there and when the files expire.
  The fields in this line are:
	Filename	EmailOfOwner	ExpireTime-Integer	ExpireTime-Text
  eg:
	notes.pdf	addw@phcomp.co.uk	1148228772	Sunday 21 May 2006
  These lines are added/removed by MailToUrl.
* Configuration directives. All of these have names that start with a '.'
  and they have one value in the second field, eg:
	.Quota	20480

Note that the fields in each line are separated by tab characters; if you edit this
file by hand you MUST preserve the distinction between spaces & tabs.
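
The layout described above can be read with a few lines of code. The following is a
sketch in Python, not the parser that MailToUrl (a Perl script) actually uses; the
function name parse_list and the returned structures are hypothetical. It assumes that
every directive keeps all of its tab separated values and that malformed lines are
simply skipped.

```python
import time

def parse_list(text):
    """Split .list content into (directives, files).

    directives maps a name such as '.Quota' to its list of values;
    files is a list of dicts, one per stored attachment.
    """
    directives, files = {}, []
    for line in text.splitlines():
        fields = line.split("\t")           # tabs only, spaces are data
        if fields[0].startswith("."):
            directives[fields[0]] = fields[1:]
        elif len(fields) >= 4:
            name, owner, expires, expires_text = fields[:4]
            files.append({"name": name, "owner": owner,
                          "expires": int(expires), "text": expires_text})
    return directives, files

sample = (".Quota\t20480\n"
          "notes.pdf\taddw@phcomp.co.uk\t1148228772\tSunday 21 May 2006\n")
directives, files = parse_list(sample)
print(directives[".Quota"])    # ['20480']
# The integer field is a Unix time; formatted in UTC it matches the text field.
print(time.strftime("%A %d %B %Y", time.gmtime(files[0]["expires"])))
```

The last line prints 'Sunday 21 May 2006', which shows how the two expiry fields of the
example entry relate to each other.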

This file must be writable by MailToUrl, you probably do not want it readable by the
web or ftp server.

A good starter for this file for files@files.example.com might be (indented):
	.ProtoRoot	files/
	.Domain	files.example.com
	.Protocol	ftp
	.MaxSize	4000
	.Admins	someone@example.com

To allow list admins to change the users of the service add:
	.UserFile	/etc/exim/file_domains/files

Configuration directives
************************

These are found in the .list file in the archive directory, they are:


.Protocol
	The protocol that will be used to serve the file, this makes up
	the first part of the URL, default: 'http'

.Domain
	The domain (machine name) that will be given in the URL, default
	the server's host name.

.ProtoRoot
	A path that needs to be prepended to the mailbox name (email local part)
	to reach the archive directory when accessing it via the protocol, default ''.

.MaxSize
	Files larger than this will not be archived. Units: Kilobytes
	Default: 2 Megabytes.
	WARNING: Prior to version 1.14 the unit was bytes.

.MinSize
	Files smaller than this will not be archived. Units: Kilobytes
	Default: 2 Kilobytes.
	WARNING: Prior to version 1.14 the unit was bytes.

.FileDir
	The archive directory (where files are served from). This allows the
	.list file to be in a separate directory. Default: $FileDirRoot/$mailbox
	$FileDirRoot is /var/www/files
	This may also be specified as the optional last parameter to the
	MailToUrl command line.

.Quota
	The maximum size that the archive may be, units Kilobytes, default 2048
	(IE 2 Megabytes). If this is 0 no quota limit is enforced.

.KeepDays
	The default number of days for which a file will be kept. Default: 31

.WarnPurge
	A file owner can be warned several days in advance that a file will be removed.
	Set this to the number of days, 0 switches this off. The owner will only be
	sent one warning, the way that this is done assumes that the purge run
	is done once a day.
	Default: 0.

.SayPurged
	If set, a file owner will be told when one of their files has been purged.
	If present the value should be numeric. A value of '0' means that the
	owner will not be told; any other value results in email being sent to the
	file owner.
	Default: 0.

.MailFrom
	The envelope from address for mail sent to the attachment sending user.
	This should not be the mailbox, if it were there is a risk of a mail loop
	where machines bounce mail between them.
	Default: $mailbox-bounce@$domain

.MailReplyTo
	The Reply-To: address in the header. The default is the receiving mailbox,
	so that when (not 'if') a user 'replies' to email it comes back to MailToUrl
	for processing.

.LogFile
	Where a record of transactions and actions is kept.
	Default: /var/log/$ProgramName, ie /var/log/MailToUrl

.WorkDir
	A temporary directory where files are created. Default: /var/tmp

.Admins
	A tab separated list of email addresses from which administrative commands
	will be accepted.

.UserFile
	This enables the 'admin user' commands, it is optional.
	The file that contains the list of users of this service, the file should
	be readable by the mail user, if 'admin user add' and 'admin user del' are
	to be allowed it should also be writable.
	This does not play well with the MailToUrlUsers utility.
	Default: unset
	Eg:
		.UserFile	/etc/exim/file_domains/some_list

.UserReadOnly
	This does not take a value, if present any file listed in .UserFile will
	not be written to.
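
To show how .MinSize, .MaxSize and .Quota interact, here is a sketch in Python of the
kind of test that could be applied to an incoming attachment. It is not taken from the
MailToUrl source: the function name check_upload is hypothetical, and rounding a size up
to whole kilobytes of 1024 bytes is an assumption. The default values are those
documented above.

```python
def check_upload(size_bytes, used_kb, min_kb=2, max_kb=2048, quota_kb=2048):
    """Return None if the file may be stored, else a short reason.

    used_kb is the space already taken by the archive, in kilobytes.
    """
    size_kb = -(-size_bytes // 1024)        # ceiling division (assumed rounding)
    if size_kb < min_kb:
        return "too small"
    if size_kb > max_kb:
        return "too large"
    # A quota of 0 means that no quota limit is enforced.
    if quota_kb != 0 and used_kb + size_kb > quota_kb:
        return "quota exceeded"
    return None

print(check_upload(500_000, used_kb=0))       # None
print(check_upload(500_000, used_kb=1800))    # quota exceeded
print(check_upload(100, used_kb=0))           # too small
```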

It might help if you know that the URL generated is:

	$protocol://$domain/$ProtoRoot$mailbox/$AttachmentName

The $mailbox is the local part of the email address.
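
As a worked example of the pattern above, this Python sketch builds the URL for the
Quick Start settings. The function name build_url is hypothetical; the values are the
ones used in the Quick Start .list file.

```python
def build_url(protocol, domain, proto_root, mailbox, attachment):
    """Assemble the download URL from the .list settings and the file name."""
    return f"{protocol}://{domain}/{proto_root}{mailbox}/{attachment}"

print(build_url("ftp", "files.example.com", "files/", "easy", "notes.pdf"))
# ftp://files.example.com/files/easy/notes.pdf
```

Nothing is inserted between $ProtoRoot and $mailbox, so a non-empty .ProtoRoot needs
its own trailing '/'.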



MailToUrl Command Line
**********************

Help:
	MailToUrl --help
Remove expired attachments from Web/Ftp page:
	MailToUrl --purge mailbox [File Directory]
Upload new attachments, mail on stdin:
	MailToUrl mailbox sender subject_command [File Directory]



The email Subject Line
**********************

This may be used by the sender to specify options or give simple commands:

on uploading an attachment to be saved as a file, to say how long it is kept before it is removed, eg:
	keep 2 days
	keep 2 weeks
	keep 1 months
	keep 1 year
The original sender of a file can send a replacement file, or perhaps the
same file with a different keep date. Only the original sender can update
the file.

to see what files are present:
	list

to remove a file called FileName:
	delete FileName
Only the original sender of the file can delete the file.
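
The subject line commands above could be recognised along these lines. This is a sketch
in Python, not the code in MailToUrl; the function name parse_subject is hypothetical.
The number of days in a month or year, and the case-insensitive matching, are
assumptions and the script's own values may differ.

```python
# Assumed conversions; the script's own idea of a month or year may differ.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 31, "year": 365}

def parse_subject(subject):
    """Return ('keep', days), ('list',), ('delete', name), or None."""
    words = subject.split()
    if not words:
        return None
    command = words[0].lower()
    if command == "list" and len(words) == 1:
        return ("list",)
    if command == "delete" and len(words) == 2:
        return ("delete", words[1])
    if command == "keep" and len(words) == 3 and words[1].isdecimal():
        unit = words[2].lower().rstrip("s")   # accept 'week' and 'weeks'
        if unit in DAYS_PER_UNIT:
            return ("keep", int(words[1]) * DAYS_PER_UNIT[unit])
    return None

print(parse_subject("keep 2 weeks"))        # ('keep', 14)
print(parse_subject("delete notes.pdf"))    # ('delete', 'notes.pdf')
```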

Administrators admin subject line
*********************************

Those listed in the .Admins directive in the .list file may also issue:

Get help:
	admin help

Set options and parameters:
	admin set parameter value

The parameters that may be set are: KeepDays MaxSize MinSize Quota SayPurged WarnPurge
These are those in the .list file described above.

Delete a file, overriding ownership:
	admin delete FileName

Change the keep time for a file currently kept (counting from when the command is issued):
	admin keep FileName period
eg:
	admin keep report.pdf 2 weeks

User commands deal with those who may use the service; to enable these the .UserFile directive
should be given in the .list file. The commands are:
    show who is able to use the service:
	admin user list

    Add and remove users from the list, for this the file should be writable and .UserReadOnly
    not set. Lines starting '.' will not be changed, the file will be kept sorted:
	admin user add local_part\@domain ...
	admin user del local_part\@domain ...
eg:
	admin user add addw\@phcomp.co.uk info\@phcomp.co.uk
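
The effect of 'admin user add' on the user file can be pictured with this Python sketch.
It is not the code used by MailToUrl; the function name add_users is hypothetical. It
assumes that the '\' before '@' in the subject line is an escape to be removed, and that
lines starting '.' are kept ahead of the sorted addresses.

```python
def add_users(lines, new_addresses):
    """Return the user-file lines with new_addresses merged in."""
    # Lines starting '.' (such as .dir) are carried over unchanged.
    dot_lines = [line for line in lines if line.startswith(".")]
    users = {line.strip() for line in lines
             if line.strip() and not line.startswith(".")}
    # Assumed: the subject line writes '@' as '\@'; undo that before storing.
    users.update(addr.replace("\\@", "@") for addr in new_addresses)
    return dot_lines + sorted(users)

current = [".dir /var/ftp/files/archive", "john@example.com", "bilbo@shire.com"]
for line in add_users(current, ["addw\\@phcomp.co.uk"]):
    print(line)
```

With the example file shown in the Exim section this prints the .dir line followed by
addw@phcomp.co.uk, bilbo@shire.com and john@example.com.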

Purging
*******

The easiest way to do this is to run it from cron:

	# Purge expired files from the mail file store:
	30      4       *       *       *       /usr/local/bin/MailToUrl --purge film_club
	32      4       *       *       *       /usr/local/bin/MailToUrl --purge chess_club /var/ftp/files/chess_club

The second one specifies the archive directory since it is not in the default location (under /var/www).


Automatic Maintenance of Authorised Users
******************************************

The MailToUrlUsers script can be used to maintain the list of who
is allowed to send attachments. One example of this is extracting a list of
mail addresses from a mailman list, thus all list members can use the facility.

A suitable cron entry might be:

	20	3	*	*	*	/usr/local/mailman/stalbans-go/bin/list_members chat | MailToUrlUsers /etc/exim/file_domains/go

MailToUrlUsers also takes a -e option where you may specify a file that contains
extra users.

If you allow administrators to change the membership with 'admin user add' or 'admin user del'
their changes will be lost when this is run.


File system location and permissions
************************************

The directory containing files that list users of the service is given as /etc/exim/file_domains/$local_part,
another good choice might be /var/spool/exim/file_domains/$local_part. The files in this directory must
be readable by the MTA (eg exim). In this case the permissions should be for the user 'exim',
check also the user in 'mail_to_url_transport'.

The FTP area should be readable by the anonymous FTP user, a good place to put these would be
	/var/ftp/files/$local_part/
(ie a separate directory for each address).
The directories should be writable by the MTA (or whatever user is set up in the exim transports section).
You probably don't want these files and directories to be searchable by the anonymous ftp user, you
achieve this by making the directories executable but not readable to that user, eg mode 751. Eg:
	# ls -la /var/ftp/files
	total 22
	drwxr-x--x  11 mail root 4096 Jun 29  2007 .
	drwxr-xr-x   4 root root 4096 Nov 17 04:43 ..
	drwxr-x--x   2 mail root 4096 Feb 17 04:30 files
	drwxr-x--x   2 mail root 4096 Jun 29  2007 friends
This means that someone has to be told the URL of a file to be able to download it. By no means
perfect security, but better than nothing.
You might instead decide to make a particular directory generally readable.
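
The difference between modes 751 and 755 for 'other' users (which is how the anonymous
ftp user sees a directory owned by mail) can be checked with this Python sketch. The
function names are hypothetical; the stat constants are from the Python standard library.

```python
import stat

def listable_by_others(mode):
    """True if 'other' users may list the directory (r bit set)."""
    return bool(mode & stat.S_IROTH)

def traversable_by_others(mode):
    """True if 'other' users may open files inside it by name (x bit set)."""
    return bool(mode & stat.S_IXOTH)

for mode in (0o751, 0o755):
    print(oct(mode), traversable_by_others(mode), listable_by_others(mode))
# 0o751 True False
# 0o755 True True
```

For a real directory the mode can be obtained with os.stat(path).st_mode.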

The default log file is:
	/var/log/MailToUrl
this needs to be writable by the user that the script runs as.

Portability Notes
*****************

MailToUrl is written in Perl; however the external 'du' command is used to determine disk usage.

The Perl modules listed below are used, you should already have them installed, except
possibly MIME::Parser.

	MIME::Parser;
	Data::Dumper;
	File::Copy;
	POSIX qw(strftime);
	Fcntl ':flock';
	Sys::Hostname;

