15-Year-Old Python Vulnerability Still Affects Over 350,000 Open-Source Projects

CVE-2007-4559 impacts 350,000 open-source projects and an unknown number of closed-source projects.

September 22, 2022

A vulnerability discovered over 15 years ago still plagues hundreds of thousands of open source projects today, according to Trellix, raising supply chain security concerns. Assigned CVE-2007-4559, the bug was discovered in 2007 and still exists in the tarfile module of Python.

The Trellix Advanced Research Center came across the path traversal attack vulnerability during an investigation into a separate vulnerability. CVE-2007-4559 impacts some 350,000 open-source projects and an unknown number of closed-source projects, escalating fears of software supply chain attacks. According to NCC Group, attacks against organizations in the global supply chain increased by 51% between July and December 2021.

Christiaan Beek, head of adversarial & vulnerability research at Trellix, said, “When we talk about supply chain threats, we typically refer to cyber-attacks like the SolarWinds incident, however building on top of weak code-foundations can have an equally severe impact.”

The vulnerable tarfile module is used in machine learning, automation, and Docker containerization tools, as well as in frameworks created by AWS, Google, Intel, Facebook, and Netflix. Because tarfile is part of Python’s standard library, it is available by default in any project that uses Python.

“This vulnerability’s pervasiveness is furthered by industry tutorials and online materials propagating its incorrect usage. It’s critical for developers to be educated on all layers of the technology stack to properly prevent the reintroduction of past attack surfaces.”

CVE-2007-4559 is a path traversal flaw that lets an attacker overwrite arbitrary files, which in turn can enable arbitrary code execution. Although its CVSS score of 5.1 suggests a medium-severity vulnerability, Trellix said it is relatively easy to exploit, requiring as little as six lines of code.

The tarfile module in Python enables developers to read and write tar archives. Tar is a UNIX-based archive format used to package files together, uncompressed or compressed (with gzip, bzip2, etc.), for backup or distribution.

The 2007 path traversal vulnerability exists because of a few “un-sanitized” lines of code in tarfile. The TarFile.extract() and TarFile.extractall() methods are coded without any safety mechanism that sanitizes or reviews the member paths they are given when extracting files from tar archives.

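Each archive member is represented by a TarInfo object, and the extract methods trust the file name it carries. If that name contains “..” sequences or an absolute path, the file is written outside the intended destination directory. In other words, the module extracts whatever the archive specifies without performing the appropriate safety check.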
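The missing check can be illustrated with a minimal sketch. This is not Trellix’s patch or part of the tarfile API: the helper names is_within_directory and safe_extractall are hypothetical, and the sketch conservatively rejects symbolic and hard links rather than trying to validate their targets.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a location inside directory."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Extract tar into dest, refusing members that would land outside it.

    Hypothetical helper for illustration only.
    """
    for member in tar.getmembers():
        # An absolute member name makes os.path.join discard dest,
        # so the containment check below catches that case as well.
        member_path = os.path.join(dest, member.name)
        if not is_within_directory(dest, member_path):
            raise ValueError(f"blocked path traversal: {member.name!r}")
        # Links can redirect later writes; reject them to keep the sketch simple.
        if member.issym() or member.islnk():
            raise ValueError(f"blocked link member: {member.name!r}")
    tar.extractall(dest)
```

A caller would open the archive with tarfile.open() and pass the resulting object to safe_extractall() instead of calling extractall() directly. Python releases from 3.12 onward also provide built-in extraction filters (for example, extractall(path, filter="data")), which are likely a better choice where available.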

Trellix Threat Labs vulnerability researcher Kasimir Schulz said, “This vulnerability is incredibly easy to exploit, requiring little to no knowledge about complicated security topics. Due to this fact and the prevalence of the vulnerability in the wild, Python’s tarfile module has become a massive supply chain issue threatening infrastructure around the world.”


“Not only has this vulnerability been known for over a decade, the official Python docs explicitly warn to ‘Never extract archives from untrusted sources without prior inspection’ due to the directory traversal issue,” noted Charles McFarland, vulnerability researcher in Trellix’s Advanced Threat Research team.

Tarfile Extract Warning to Python Developers | Source: Trellix

Trellix found 588,840 unique projects/repositories on GitHub that include ‘import tarfile’ in their Python code. Of these, 61% did not sanitize the tarfile members before extracting them, putting the number of vulnerable repositories at roughly 350,000.

Trellix also pointed out that since machine learning tools like GitHub Copilot are trained on vulnerable GitHub repositories, they “are learning to do things insecurely. Not from any fault of the tool but from the fact that it learned from everyone else.”

Trellix’s analysis of project domains impacted by CVE-2007-4559 revealed the following:

Project Domains Impacted by CVE-2007-4559 | Source: Trellix

It should be noted that Trellix’s research on vulnerable projects is limited to GitHub. So it is likely that other projects are also affected by the 15-year-old vulnerability.

The software supply chain can have hundreds of vendors that supply applications, independent code, software, libraries, and other dependencies. When vulnerable dependencies such as the tarfile module are integrated with third-party providers, service providers, contractors, resellers, etc., it expands the attack surface of everyone in the chain while simultaneously weakening the security fabric of even those with appropriate security hygiene practices.

“While we can’t provide as detailed an analysis [of closed-source projects] as we can with open-source projects, it is fair to expect the trend to be similar. What if 61% of all projects — open- and closed-source — could be exploited due to this vulnerability?” asks Douglas McKee, principal engineer and director of vulnerability research for Trellix Threat Labs.

“To do our part Trellix is releasing a script which can be used to scan one or multiple code repositories looking for the presence and likelihood of exploitation for CVE-2007-4559. Additionally, we are working on automating submissions of pull requests to open-source projects which can be confirmed to be exploitable,” McKee added.
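To give a sense of what such a scan involves, the following is a rough heuristic sketch and not Trellix’s script. The function name find_tar_extract_calls is hypothetical, and the approach (walking the syntax tree for extract/extractall calls in code that imports tarfile) is an assumption about one simple way to flag candidates for manual review.

```python
import ast


def find_tar_extract_calls(source: str) -> list[int]:
    """Return line numbers of .extract()/.extractall() calls in code that imports tarfile.

    Hypothetical heuristic: it flags candidates for review and does not
    prove that a call is exploitable.
    """
    tree = ast.parse(source)

    # First pass: does the module import tarfile at all?
    imports_tarfile = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name == "tarfile" for alias in node.names):
                imports_tarfile = True
        elif isinstance(node, ast.ImportFrom) and node.module == "tarfile":
            imports_tarfile = True
    if not imports_tarfile:
        return []

    # Second pass: collect attribute calls named extract or extractall.
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"extract", "extractall"}
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Because the check is purely syntactic, it will also flag extract calls on unrelated objects (such as zip archives) in files that import tarfile, and it does not inspect whether the members were validated beforehand, so each hit still needs human review.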

Trellix has automated mass repository forking, mass repository cloning, code analysis, code patching, code commits, and pull requests. Patches by the company for 11,005 repositories are ready for pull requests. Trellix is developing patches for more projects.

“The number of vulnerable repositories we found begs the question, which other N-day vulnerabilities are lurking around in OSS, undetected or ignored for years?” McFarland added. “If this tarfile vulnerability is any indicator, we are woefully behind and need to increase our efforts to ensure OSS [open source software] is secure.”

To check if your project/repository is vulnerable to CVE-2007-4559, refer to this GitHub documentation by Trellix.


Sumeet Wadhwani

Asst. Editor, Spiceworks Ziff Davis

An earnest copywriter at heart, Sumeet is what you'd call a jack of all trades, rather techs. A self-proclaimed 'half-engineer', he dropped out of Computer Engineering to answer his creative calling pertaining to all things digital. He now writes what techies engineer. As a technology editor and writer for News and Feature articles on Spiceworks (formerly Toolbox), Sumeet covers a broad range of topics from cybersecurity, cloud, AI, emerging tech innovation, hardware, semiconductors, et al. Sumeet compounds his geopolitical interests with cartophilia and antiquarianism, not to mention the economics of current world affairs. He bleeds Blue for Chelsea and Team India! To share quotes or your inputs for stories, please get in touch on sumeet_wadhwani@swzd.com