Zip Slip in NLTK (CVE-2019-14751)

Description

Natural Language Toolkit (NLTK) prior to version 3.4.5 is vulnerable to a directory traversal, allowing attackers to write arbitrary files via a ../ (dot dot slash) in an NLTK package (ZIP archive) that is mishandled during extraction.

Vulnerability Analysis

NLTK data packages provide linguistic data sets for use in natural language processing. These data packages are delivered to NLTK as ZIP archives. The NLTK Downloader implements a custom function, _unzip_iter(), that extracts these ZIP archives.

nltk/downloader.py

def _unzip_iter(filename, root, verbose=True):
    if verbose:
        sys.stdout.write("Unzipping %s" % os.path.split(filename)[1])
        sys.stdout.flush()

    try:
        zf = zipfile.ZipFile(filename)
    except zipfile.error as e:
        yield ErrorMessage(filename, "Error with downloaded zip file")
        return
    except Exception as e:
        yield ErrorMessage(filename, e)
        return

    # Get lists of directories & files
    namelist = zf.namelist()
    dirlist = set()
    for x in namelist:
        if x.endswith("/"):
            dirlist.add(x)
        else:
            dirlist.add(x.rsplit("/", 1)[0] + "/")
    filelist = [x for x in namelist if not x.endswith("/")]

    # Create the target directory if it doesn't exist
    if not os.path.exists(root):
        os.mkdir(root)

    # Create the directory structure
    for dirname in sorted(dirlist):
        pieces = dirname[:-1].split("/")
        for i in range(len(pieces)):
            dirpath = os.path.join(root, *pieces[: i + 1])
            if not os.path.exists(dirpath):
                os.mkdir(dirpath)

    # Extract files.
    for i, filename in enumerate(filelist):
        filepath = os.path.join(root, *filename.split("/"))

        try:
            with open(filepath, "wb") as dstfile, zf.open(filename) as srcfile:
                shutil.copyfileobj(srcfile, dstfile)
        except Exception as e:
            yield ErrorMessage(filename, e)
            return

        if verbose and (i * 10 / len(filelist) > (i - 1) * 10 / len(filelist)):
            sys.stdout.write(".")
            sys.stdout.flush()
    if verbose:
        print()

As the snippet above shows, Python's zipfile module is used to catalog the contents of the ZIP archive (lines 2259–2265; line numbers here refer to nltk/downloader.py, not to the excerpt). On lines 2280–2286, a destination path, filepath, is built for each file listed in the ZIP archive and opened for writing as dstfile. The contents of each file are then written on line 2286.

Because no validation is performed on the extraction paths, an attacker can use a crafted ZIP archive with relative paths to write arbitrary files to the filesystem. For example, NLTK Downloader will attempt to extract the data package (ZIP archive) into $HOME/nltk_data/ upon download. If the ZIP archive contains a file named files/../../../../../tmp/evil.txt, this relative path will be resolved as /tmp/evil.txt.
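The path arithmetic can be reproduced without touching the filesystem. The sketch below mirrors the os.path.join() call in _unzip_iter(). The root directory /home/user/nltk_data is a hypothetical stand-in for $HOME/nltk_data, and posixpath is used in place of os.path only so that the result is the same on any platform.

```python
import posixpath

# Hypothetical extraction root standing in for $HOME/nltk_data.
root = "/home/user/nltk_data"

# Entry name from the example above.
member = "files/../../../../../tmp/evil.txt"

# Same construction as in _unzip_iter(): join root with the split entry name.
filepath = posixpath.join(root, *member.split("/"))

# normpath collapses the ".." components lexically, which matches what the
# operating system does when opening the path (assuming no symlinks).
resolved = posixpath.normpath(filepath)

print(filepath)   # the unvalidated path handed to open()
print(resolved)   # where the file actually lands
print(resolved.startswith(root + "/"))  # is the target still inside root?
```

The last line is the kind of containment check that the vulnerable code never performs.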

Proof of Concept (PoC)

Normally, NLTK data packages are downloaded and extracted to $HOME/nltk_data. This PoC will show that the NLTK Downloader can be manipulated into writing files to /tmp or any directory where the user has write access. The files for this proof of concept are available at https://github.com/mssalvatore/CVE-2019-14751_PoC.

Step 1: Use a webserver to host index.xml and zipslip.zip.
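To see what makes such an archive malicious, the following sketch builds a comparable ZIP in memory with Python's zipfile module. The entry name is the one used in the analysis above. The payload bytes and the in-memory buffer are illustrative assumptions, and this is not necessarily how the zipslip.zip in the PoC repository was produced.

```python
import io
import zipfile

# Entry name containing "../" components.
entry_name = "files/../../../../../tmp/evil.txt"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # writestr() stores the name as given; it does not strip "..".
    zf.writestr(entry_name, b"illustrative payload\n")

# Reading the archive back shows the traversal path is preserved in the
# listing, which is what _unzip_iter() consumes via zf.namelist().
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print(names)
```

Nothing in the archive format itself forbids such names, so the check has to happen on the extracting side.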

Step 2: Launch the NLTK Downloader from the Python interactive shell.

[Figure: PoC step 2]

Step 3: The NLTK Downloader needs to be configured to download data packages from your webserver. Use the “Config” option, and change the server URL to point to your webserver.

[Figure: PoC step 3]

Step 4: Install the malicious “zipslip” data package. Upon installation, the ZIP archive will be extracted, and the malicious file will be written to /tmp/evil.txt.

[Figure: PoC step 4]

Verification: Successful exploitation can be verified by checking for the existence of /tmp/evil.txt.

[Figure: Verification of exploitation]

Security Impact

To exploit this vulnerability, an attacker must trick a user into downloading a malicious NLTK data package from a malicious or compromised server. The data package is delivered as a ZIP archive, and because extraction paths are not validated, extracting it can overwrite the user’s configuration files or any other file the user can write to. This could grant the attacker remote access or cause malicious code to be executed. The attack does not itself allow privilege escalation and is therefore limited by the permissions and capabilities of the user running the NLTK Downloader.

Remediation

This vulnerability has been fixed in NLTK v3.4.5 and later. Users are encouraged to upgrade to the latest version. Ubuntu users have access to a patched version of NLTK. Other NLTK users can apply the following patch:

https://github.com/nltk/nltk/commit/f59d7ed8df2e0e957f7f247fe218032abdbe9a10
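For code that must extract archives by hand, the general defense against Zip Slip is to resolve each destination path and refuse any entry that lands outside the extraction root. The sketch below illustrates that idea. It is not the upstream patch linked above, and safe_extract() is a hypothetical helper name. Note also that the standard library's ZipFile.extract() and ZipFile.extractall() are documented to strip ".." components from member names, so using them is often the simpler option.

```python
import os
import shutil
import zipfile


def safe_extract(zip_path, root):
    """Extract zip_path under root, rejecting entries that would escape it.

    Hypothetical helper for illustration; all entries are validated before
    anything is written, so a bad archive leaves root untouched.
    """
    root_real = os.path.realpath(root)
    with zipfile.ZipFile(zip_path) as zf:
        plan = []
        for info in zf.infolist():
            # Resolve "..", symlinks, etc. to find the real destination.
            target = os.path.realpath(
                os.path.join(root_real, *info.filename.split("/"))
            )
            # commonpath equals root_real only if target is inside root.
            if os.path.commonpath([root_real, target]) != root_real:
                raise ValueError("unsafe path in archive: %r" % info.filename)
            plan.append((info, target))

        for info, target in plan:
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
            else:
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
```

Called on an archive containing an entry such as files/../../../../../tmp/evil.txt, this helper raises ValueError instead of writing outside root.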

Disclosure Timeline

A Zip Slip vulnerability in NLTK Downloader is discovered.

NLTK developers are notified of the vulnerability via email and provided with a proof of concept, as well as a recommended patch.

NLTK is patched in the development branch.

MITRE assigns CVE-2019-14751 to this vulnerability.

NLTK 3.4.5 is released.

Further Reading

CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14751

Proof of concept: https://github.com/mssalvatore/CVE-2019-14751_PoC

NLTK 3.4.5 changelog: https://github.com/nltk/nltk/blob/3.4.5/ChangeLog

Snyk Zip Slip whitepaper: https://res.cloudinary.com/snyk/image/upload/v1528192501/zip-slip-vulnerability/technical-whitepaper.pdf