Directory traversal 101
If you develop or maintain any software that allows a user to request files from a server (e.g. a web server), you need to watch out for directory traversal attacks. They work like this: Imagine your server is configured to serve files from the directory /var/srv/my-site
(henceforth known as the “document root”). If someone requests http://HOSTNAME/index.html
, the server knows to search the document root for a file named “index.html” and serve up /var/srv/my-site/index.html
.
What if the attacker requests something more clever than index.html
? For example, what if the attacker requests something like http://HOSTNAME/../../../home/user/passwords.txt
? The ..
is a relative path component that tells the server to move up one directory in the path. If your code isn’t checking for input like this, your server will try and resolve /var/srv/my-site/../../../home/user/passwords.txt
to /home/user/passwords.txt
. The attacker can use relative paths to “break out” of the document root and extract any files that the web server can read.
How not to do it
On more than one occasion, I’ve seen code that looks like this:
import os.path DOCUMENT_ROOT = "/var/srv/my-site" def validate_requested_file_path(requested_path: str): absolute_document_root = os.path.abspath(DOCUMENT_ROOT) absolute_requested_path = os.path.abspath(requested_path) if not absolute_requested_path.startswith(absolute_document_root): raise ForbiddenPathError( f"The requested path '{requested_path}' is forbidden" )
At a glance, this looks good. The requested path and document root are passed to os.path.abspath()
, which resolves any relative components or symlinks and returns an absolute, canonical path. A canonical path is the simplest absolute path to a given file. For example, the canonical path of /var/srv/../file.txt
is /var/file.txt
. Canonicalizing two paths allows them to be accurately compared. Well done, right?
This approach handles most attacks, but there’s an edge case it misses. Imagine that your directory structure is set so that the document root is /var/srv/my-site
and your X.509 certificate is stored in /var/srv/my-site-certificates
.
$ ls -l /var/srv drwxr-xr-x 2 webserver webserver 4096 May 17 2021 my-site/ drwxr-xr-x 2 webserver webserver 4096 May 17 2021 my-site-certificates/
Do you see the issue? If the attacker requests http://HOSTNAME/../my-site-certificates/my-site.key
, the web server will resolve the requested path to /var/srv/my-site-certificates/my-site.key
. The code uses string comparison to check if /var/srv/my-site-certificates
starts with /var/srv/my-site
. Guess what? It does.
A safer way
I’m a big fan of using Path
objects from Python’s pathlib library instead of os.path
functions to handle file paths. As luck would have it, Path
objects offer a safer way to detect attempted directory traversals.
from pathlib import Path DOCUMENT_ROOT = Path("/var/srv/my-site") def validate_requested_file_path(requested_path: Path): if DOCUMENT_ROOT.resolve() not in requested_path.resolve().parents: raise ForbiddenPathError( f"The requested path '{requested_path}' is forbidden" )
First, we call Path.resolve()
for both the DOCUMENT_ROOT
and requested_path
. By canonicalizing the paths, we can have confidence that we’re comparing apples to apples. Next, the Path.parents()
method returns a sequence containing every ancestor directory of requested_path
. Checking that DOCUMENT_ROOT
is an ancestor of requested_path
is more precise than using string comparison and leaves little room for error.
Caveats
Symlinks and exceptions
Path.resolve()
resolves symlinks, which may or may not be desirable for your application. Furthermore, if any infinite loops are encountered while resolving the path, a RuntimeError
will be raised. If either of these behaviors is undesirable, an alternate solution using os.path.abspath
can be seen below. Note, however, that because os.path.abspath
does not resolve symlinks, the paths will be absolute but may not be canonical. In these cases, an attacker may still be able to write files outside of the document root. I would recommend the implementation above for most applications; the implementation below has been added for completeness.
from pathlib import Path from os.path import abspath # WARNING: os.path.abspath does not resolve symlinks DOCUMENT_ROOT = Path(abspath("/var/srv/my-site")) def validate_requested_file_path(requested_path: Path): if DOCUMENT_ROOT not in Path(abspath(requested_path)).parents: raise ForbiddenPathError( f"The requested path '{requested_path}' is forbidden" )
Limitations
It should be noted that either method presented here is only secure if the attacker has no way of manipulating the directory structure on the server. If the attacker can create symlinks or move directories, then more rigorous checks are needed to prevent directory traversal, but that’s a topic for a future post.