Introduction

A friend of mine [1] runs a small record label named Flemish Eye [2]. He's a great designer and understands PHP well enough to do most things, but I help him with anything that falls outside his expertise. He recently approached me with a request to build a secure download section.

Licensing

I just got asked about the licensing of this code. Good question. I'll place it in the public domain. Do what you will, but don't expect any support from me. ;)


This work, including the source code, documentation
and related data, is placed into the public domain.

The original author is Lakin Wecker.
See http://lakin.weckers.net/code/web/apache-mod-rewrite-secure-downloads/
for more information.

THIS SOFTWARE IS PROVIDED AS-IS, WITHOUT WARRANTY
OF ANY KIND, NOT EVEN THE IMPLIED WARRANTY OF
MERCHANTABILITY. THE AUTHOR OF THIS SOFTWARE
ASSUMES _NO_ RESPONSIBILITY FOR ANY CONSEQUENCE
RESULTING FROM THE USE, MODIFICATION, OR
REDISTRIBUTION OF THIS SOFTWARE.

Comments

If you'd like to comment, feel free to do so on my blog post. http://www.abetterkindofangry.com/2008/07/building-modsecdownload-using-apache.html

The Idea

The basic idea is that he wants to provide a free download of the songs on an upcoming record, in mp3 format, to whoever purchases the record. He would generate a set of secret codes using the download tool and then print cards to be included in the record case. Each card would carry the URL and code needed to find and use the free download. The purchaser would get three download opportunities with each record.

The problem

He currently has a Shared 2 account on Webfaction [5]. Among other things, this means that the memory his application server is allowed to use is limited to 80MB. Webfaction allows hosting of django apps with mod_python using a proxied internal apache instance for the application. The internal apache instance counts against a user's memory usage.

The control panel (by default) sets the internal apache process limit to two, which effectively limits the active connections to two as well. This isn't terribly efficient if the internal apache is also tasked with serving static media. Webfaction provides a very easy way to have their main apache server serve your static media instead. A few appropriately mounted symlink apps usually do the trick (i.e., use their control panel to make a symlink app from /path/to/your/apps/htdocs/, mount it at /static, and then refer to all static media by prefixing the path with /static/). If your internal apache is only ever tasked with serving the dynamically generated data, then a site can be very snappy with this setup for low to medium traffic.

The first solution

The straightforward (but not terribly efficient) solution is to use the application framework to serve the secure downloads. The problem with this approach is that typically, the webserver instance (or thread) that runs your web application framework uses more memory than straight apache or lighttpd do. So it is expensive in terms of server performance to do this. In addition, on Webfaction's setup each downloader would tie up one of your internal apache instances for the duration of the download. If, like most users of Webfaction, you have limited the internal apache instance to 2 connections, then two downloaders would effectively bring down the site. Not a good solution. I needed to do better.

lighttpd and mod_secdownload

lighttpd [3] and mod_secdownload [4] provide an easy way to solve this exact problem. The basic idea is that lighttpd and the web application share a secret. The combination of this secret, the time of the request and the name of the file allows lighttpd and the web application to both generate the same unique key (an md5 sum) which represents the requested download. Because of the secret, no one else can feasibly guess or generate the same key. So the web application checks that the user has the rights to the file, generates a URL that includes the filename, the time of the request and the md5 sum, and redirects the user to that URL. lighttpd can check that the URL is valid by regenerating the same key using the secret. If they match, the user gets access to the download; if they don't ... I'm not sure what happens, but I would hope an HTTP 403 gets returned.
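
To make the shared-secret idea concrete, here is a minimal Python sketch of how the web application side might build and check such a URL. It assumes the layout I understand the mod_secdownload docs [4] to describe (an md5 of secret + relative path + hex timestamp, placed in the URL ahead of the timestamp and the path). The function names, the /dl/ prefix and the 60 second timeout are made up for illustration, so check the docs before relying on any of it.

```python
import hashlib


def make_secure_url(secret, rel_path, issued_at, uri_prefix="/dl/"):
    """Build <prefix><token>/<hex time><rel_path>; rel_path starts with '/'."""
    hex_time = "%08x" % int(issued_at)
    token = hashlib.md5((secret + rel_path + hex_time).encode("utf-8")).hexdigest()
    return "%s%s/%s%s" % (uri_prefix, token, hex_time, rel_path)


def check_secure_url(secret, url, now, timeout=60, uri_prefix="/dl/"):
    """Recompute the token from the shared secret and compare it."""
    if not url.startswith(uri_prefix):
        return False
    token, _, rest = url[len(uri_prefix):].partition("/")
    hex_time, slash, tail = rest.partition("/")
    if not slash:
        return False
    rel_path = "/" + tail
    try:
        issued_at = int(hex_time, 16)
    except ValueError:
        return False
    expected = hashlib.md5((secret + rel_path + hex_time).encode("utf-8")).hexdigest()
    # Valid only if the token matches and the link has not expired yet.
    return token == expected and 0 <= now - issued_at <= timeout


# Hypothetical secret and path, purely for illustration.
url = make_secure_url("not-the-real-secret", "/album/track01.mp3", 1215718860)
print(url)
```

Anyone who changes the path or the timestamp in the URL invalidates the token, because they cannot recompute the md5 sum without the secret.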

Webfaction uses apache for their main server on each machine. Although I could install lighttpd myself, each instance of lighttpd would count against my RAM usage. It would improve the situation, but I would still have to limit the lighttpd connections, which results in effectively the same problem: only a very small number of concurrent downloads would be allowed.

apache and mod_rewrite

So, after some thought I found a solution that uses apache and mod_rewrite [6]. Remember that static applications are served from Webfaction's main apache instance, and therefore requests handled in that manner do not count against the account's total RAM usage. I only needed to secure the downloads based on the codes and remaining uses for each code.

Enter mod_rewrite. mod_rewrite can deny requests to certain resources if some criteria are not met [8]. For instance, requests can be denied if the request does not happen at the correct time [7]. It can also deny access if the URL doesn't contain a specific query string. It can't, however, make database queries to check that the user is using a (still) valid code.

The key to this idea is realizing that the web application can generate mod_rewrite rules on the fly. In other words, our dynamic application can manage the document root of the static application to restrict access to very specific conditions. First off, the secure file isn't included in the document root, so by default there is no way to download it. The webapp dynamically inserts symlinks to the file in a protected manner: each symlink goes into a new directory, and only after that directory has been protected with an .htaccess file containing rules that restrict access. In my case, the webapp randomly generates a secret code which it inserts into the mod_rewrite rules. The rules also restrict access to a fixed window of time, at most two minutes after the request was made. Finally, the webapp redirects the user's request to a unique URL that includes the secret code; the file is only served when that code is present, and only for that limited amount of time.

This combination of factors ensures that access is as secure as with mod_secdownload. In fact, someone would have to figure out quite a number of things to effectively steal the file; they might as well use social engineering at that point. There is the possibility that someone could guess the random code and type in the right URL within those two minutes, but it is highly unlikely. They'd probably have a better chance of guessing the download code that was distributed with the record.
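
To put a rough number on "highly unlikely", here is a back-of-the-envelope calculation in Python. It assumes a secret code of 10 letters drawn from the 52 upper- and lower-case ASCII letters and a two minute window; the attacker's request rate is a number I made up for illustration. It also generously assumes the attacker already knows the directory and file name, which they normally would not.

```python
import string

CODE_LENGTH = 10                 # length of the random secret code
ALPHABET = string.ascii_letters  # 52 upper- and lower-case letters
WINDOW_SECONDS = 120             # the rules allow at most two minutes
GUESSES_PER_SECOND = 1000        # hypothetical attacker request rate

keyspace = len(ALPHABET) ** CODE_LENGTH
guesses = WINDOW_SECONDS * GUESSES_PER_SECOND
chance = guesses / float(keyspace)

print("possible codes: %d" % keyspace)
print("chance of a hit within the window: %.2e" % chance)
```

The number of possible codes is on the order of 10^17, so even a very aggressive guesser has a negligible chance before the window closes.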

Some more Details and code

In the download view of my application, the following happens:

  1. Using the 'download code' and the 'number of uses', generate a unique path within this static app's document root.
  2. Using os.open with the os.O_EXCL | os.O_CREAT flags, create and open a LOCK file that matches the unique path (with a .LOCK appended) and fail if the file already exists. This ensures that the following steps ONLY happen once.
  3. At this point we know we have exclusive access.
  4. Use os.mkdir to create a directory that lives at the unique path generated in step 1.
  5. Generate a random code.
  6. Generate a timestamp for this minute and a timestamp for the next minute.
  7. Create a .htaccess file in the download directory that restricts access based on the time and the use of the secret code in the query string.
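
Step 2 is what makes the whole sequence safe against two simultaneous requests, so here is a small standalone sketch of just that step. It is not the code from my view, only an illustration: the helper name, the scratch directory and the lock file name are made up.

```python
import errno
import os
import tempfile


def acquire_once(lock_path):
    """Return True only for the first caller to create lock_path."""
    try:
        # O_EXCL together with O_CREAT fails if the file already exists.
        fd = os.open(lock_path, os.O_EXCL | os.O_CREAT | os.O_WRONLY)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise      # a real problem (permissions, missing directory, ...)
        return False   # somebody else got here first
    os.close(fd)       # the file only needs to exist, not stay open
    return True


scratch_dir = tempfile.mkdtemp()
lock_path = os.path.join(scratch_dir, "ABC123-0.LOCK")  # made-up code and use count
first = acquire_once(lock_path)
second = acquire_once(lock_path)
print(first, second)
```

The check and the creation happen in a single system call, so there is no gap between "does it exist?" and "create it" for a second request to slip through.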

Python Code

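import os
import random
import string
from datetime import datetime, timedelta

from django.conf import settings
from django.http import HttpResponseRedirect

# Rendered with the % operator, so literal % signs are doubled (%%).
HTACCESS_TEMPLATE = """
Options -Indexes
RewriteEngine On

RewriteRule ^\\.htaccess  - [F]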

RewriteCond %%{TIME} !%(min_time)s
RewriteCond %%{TIME} !%(max_time)s
RewriteRule ^.*  - [F]

RewriteCond %%{QUERY_STRING} !^%(secret_code)s$
RewriteRule ^.*  - [F]"""


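# Everything below runs inside the download view; `code` is the
# download-code record that was looked up for this request.
# Now we generate all of the paths that we need.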
directory_basename = '%s-%d' % (code.code, code.uses,)
lock_basename = '%s-%d.LOCK' % (code.code, code.uses,)

# The directory that will contain the htaccess
directory_path = os.path.join(settings.SECURE_ROOT, directory_basename)

# The lock file that ensures only one thread actually can generate
# all of these paths.
lock_path = os.path.join(settings.SECURE_ROOT, lock_basename)

# The path to the htaccess file that we'll use.
htaccess_path = os.path.join(directory_path, '.htaccess')

# Look at the uploaded file and ensure that we name it properly.
file_abspath = code.file.get_file_filename()
file_basename = os.path.basename(file_abspath)

# The symlink to the file that's hosted at the base of the secure path.
symlink_path = os.path.join(
    settings.SECURE_ROOT,
    directory_basename,
    file_basename)

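try:
    lock_fd = os.open(lock_path, os.O_EXCL | os.O_CREAT)
except OSError:
    # We're not the first ones!
    return None
# The lock file only needs to exist; close the descriptor so it isn't leaked.
os.close(lock_fd)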

# Now we have exclusive rights to make the download directory
os.mkdir(directory_path)

# Read the clock once so both patterns derive from the same instant.
min_time = datetime.now()
max_time = min_time + timedelta(minutes=1)

# Generate a random code for this one use
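# string.letters only exists on Python 2 and is locale-dependent, so use
# ascii_letters; SystemRandom draws from the OS's secure random source.
rng = random.SystemRandom()
secret_code = "".join([rng.choice(string.ascii_letters) for x in range(10)])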
template_vars = {
    "secret_code": secret_code,
    "min_time": min_time.strftime("%Y%m%d%H%M.*"),
    "max_time": max_time.strftime("%Y%m%d%H%M.*"),

}

htaccess_content = HTACCESS_TEMPLATE % template_vars
with open(htaccess_path, "w") as htaccess_file:
    htaccess_file.write(htaccess_content)

# Finally, we make the symlink AFTER writing the .htaccess
# in order to make sure that someone can't just luckily time the request
# to occur in between the symlink and .htaccess creation.
if not os.path.exists(symlink_path):
    os.symlink(file_abspath, symlink_path)

# We got here, so they must be allowed to access the file
# make a record of the access and serve it to them
code.uses += 1
code.save()

return HttpResponseRedirect(
    '/secure/%s/%s?%s' % (directory_basename, file_basename, secret_code))



Example htaccess

Options -Indexes
RewriteEngine On

RewriteRule ^\.htaccess  - [F]

RewriteCond %{TIME} !200807101941.*
RewriteCond %{TIME} !200807101942.*
RewriteRule ^.*  - [F]

RewriteCond %{QUERY_STRING} !^KqjPCBjkIt$
RewriteRule ^.*  - [F]
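
To sanity-check rules like these without poking at a live server, the logic can be emulated in a few lines of Python. This is only a sketch of how I read the rules, not something apache runs: it relies on RewriteCond patterns being unanchored regex matches and on consecutive conditions being ANDed together. The function name is made up, and the patterns are copied from the example above.

```python
import re

TIME_PATTERNS = ["200807101941.*", "200807101942.*"]
QUERY_PATTERN = "^KqjPCBjkIt$"


def is_forbidden(server_time, query_string):
    """server_time is %{TIME} as apache formats it: YYYYMMDDHHMMSS."""
    # Time rule: both negated conditions must hold, i.e. neither minute matches.
    if all(re.search(p, server_time) is None for p in TIME_PATTERNS):
        return True
    # Query rule: the query string must be exactly the secret code.
    if re.search(QUERY_PATTERN, query_string) is None:
        return True
    return False


print(is_forbidden("20080710194130", "KqjPCBjkIt"))  # inside the window, right code
print(is_forbidden("20080710194300", "KqjPCBjkIt"))  # one second too late
print(is_forbidden("20080710194130", "wrong"))       # inside the window, wrong code
```

Note that the window is made of two whole clock minutes, so a link generated at 19:41:59 is usable for about one minute while one generated at 19:41:00 is usable for about two.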


Gotchas

Be careful with the symlink you create for the application's static media; you may inadvertently expose the protected file in that manner. To work around this, you can add a .htaccess file to the protected file folder that denies all access. :)
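
As a rough illustration of that workaround, the following Python sketch drops a deny-all .htaccess into a directory. The helper name is made up and it runs against a scratch directory rather than a real protected folder. The directives are the apache 2.2 style that matches the docs in the references; on apache 2.4 the equivalent would be "Require all denied". Either way, the server has to allow .htaccess overrides for that directory.

```python
import os
import tempfile

# Apache 2.2 access-control syntax; an assumption about the server version.
DENY_ALL = "Order allow,deny\nDeny from all\n"


def protect_directory(directory):
    """Write a deny-all .htaccess into directory and return its path."""
    htaccess_path = os.path.join(directory, ".htaccess")
    with open(htaccess_path, "w") as htaccess_file:
        htaccess_file.write(DENY_ALL)
    return htaccess_path


# Demonstrated against a scratch directory instead of the real upload folder.
protected_dir = tempfile.mkdtemp()
written_path = protect_directory(protected_dir)
print(written_path)
```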

References

  1. http://charmedbirds.com/
  2. http://flemisheye.com/
  3. http://www.lighttpd.net/
  4. http://trac.lighttpd.net/trac/wiki/Docs:ModSecDownload
  5. http://www.webfaction.com/services/hosting?affiliate=lakin
  6. http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
  7. http://www.askapache.com/htaccess/time_hour-rewritecond-time.html
  8. http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#RewriteCond