Reproducibility vs. Provenance: Trusting the JavaScript Supply Chain
Author: Darcy Clarke (X, formerly Twitter: @darcy)
The security and trustworthiness of the JavaScript package ecosystem have been under scrutiny for years. With growing concerns over software supply chain attacks, the industry has doubled down on provenance: tracking where packages come from, how they're built, and ensuring transparency. But provenance alone isn't enough.
Enter reproduce, a new open-source tool designed to independently verify whether a published npm package can be faithfully rebuilt from its declared source. Unlike provenance systems that merely associate a package with a build environment (which can be ephemeral and manipulated), reproduce goes a step further, empirically testing whether the package metadata actually corresponds to its purported source.
Why Provenance Falls Short
Efforts like Sigstore and GitHub's provenance verification for npm packages claim to enhance trust by tying packages to build environments. While this can provide some insights, it has significant limitations:
- Ephemeral Trust: Provenance metadata, such as GitHub's npm provenance signatures, is often tied to CI/CD environments that have a 90-day TTL. If a bad actor has credentials, they can erase or manipulate this history, removing crucial verification data.
- No Guarantees on Build Integrity/Origin: Just because a package is signed with provenance metadata doesn't mean it was built deterministically from its stated source. Attackers can still insert malicious code before publishing, and build environments are not locked down or network-restricted. Even in a "hermetic" build environment, a simple bit flip or time-based value can change the generated artifact.
- Lack of Historical Validation: Provenance mechanisms are forward-facing, focusing on net-new builds, but they don't tell us whether a package published years ago actually matched its repository's state at the time.
- Not More Secure: Generally, there is no known supply chain attack vector that provenance solves for. In the recent rspack exploit, many people pointed to provenance attestations as a "solution", but the "problem" there was malicious code in the package, introduced via credential exploitation (also known as an "account takeover"). Provenance does not protect against account takeovers or malware.
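To make the determinism point concrete, here is a minimal sketch of how a time-based value changes an artifact's hash even when the source is identical. It is not taken from any real build tool: `buildArtifact` and `digest` are hypothetical stand-ins for a build step and a checksum.

```javascript
import { createHash } from 'node:crypto'

// Hypothetical "build": prefixes the source with a build timestamp,
// the way many tools embed a build date or banner in their output.
function buildArtifact(source, buildTime) {
  return `/* built at ${new Date(buildTime).toISOString()} */\n${source}`
}

// Hex-encoded SHA-512 digest of the artifact contents.
function digest(artifact) {
  return createHash('sha512').update(artifact).digest('hex')
}

const source = 'export const answer = 42\n'

// Same source, two different build times: the digests differ.
const first = digest(buildArtifact(source, 0))
const second = digest(buildArtifact(source, 1000))

// Same source, build time pinned to one value: the digests match.
const pinnedA = digest(buildArtifact(source, 0))
const pinnedB = digest(buildArtifact(source, 0))

console.log('unpinned builds match:', first === second)
console.log('pinned builds match:', pinnedA === pinnedB)
```

A signature over either artifact would be equally valid, which is why a signed build says little about whether the same bytes could be produced again from the same source.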
reproduce Changes the Game
Instead of relying on metadata alone, reproduce verifies package integrity by attempting to rebuild npm packages from their linked source repositories.
Here's how it works:
- Fetches the package's source metadata from the npm registry (e.g., repository URL, commit hash).
- Clones the source repository at the exact commit linked to the package.
- Runs the package's build steps in a clean environment.
- Compares the resulting artifact with the actual published npm package.
If the outputs match, the package is considered reproducible—meaning it was built faithfully from its stated source. If not, there's a mismatch that raises questions about the package's integrity.
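The first and last steps can be sketched roughly as follows. This is an illustration under stated assumptions, not reproduce's actual implementation: `declaredSource`, `integrityOf`, and `compareArtifacts` are hypothetical helpers, and comparing by integrity hash is one plausible way to decide that two artifacts match. The `repository`, `gitHead`, and `dist.integrity` fields are the ones the npm registry exposes on a version manifest.

```javascript
import { createHash } from 'node:crypto'

// Step 1 (sketch): read the declared source from a registry version manifest.
// Either field may be missing, in which case there is nothing to rebuild from.
function declaredSource(manifest) {
  const repo = manifest.repository
  const url = typeof repo === 'string' ? repo : repo?.url
  return { url: url ?? null, commit: manifest.gitHead ?? null }
}

// Compute a Subresource Integrity style string ("sha512-<base64>"),
// the same format npm uses for a version's `dist.integrity`.
function integrityOf(tarballBytes) {
  const hash = createHash('sha512').update(tarballBytes).digest('base64')
  return `sha512-${hash}`
}

// Step 4 (sketch): a rebuild counts as reproduced only when the rebuilt
// tarball hashes to the published integrity value.
function compareArtifacts(publishedIntegrity, rebuiltTarballBytes) {
  const rebuiltIntegrity = integrityOf(rebuiltTarballBytes)
  return {
    reproduced: rebuiltIntegrity === publishedIntegrity,
    publishedIntegrity,
    rebuiltIntegrity,
  }
}

// Stand-in data; a real check would use the registry response and the
// tarball produced by rebuilding the cloned repository.
const manifest = {
  repository: { type: 'git', url: 'git+https://example.com/org/pkg.git' },
  gitHead: '0123456789abcdef0123456789abcdef01234567',
}
const published = Buffer.from('example tarball contents')
const rebuilt = Buffer.from('example tarball contents')
const tampered = Buffer.from('example tarball contents + extra code')

console.log(declaredSource(manifest))
console.log(compareArtifacts(integrityOf(published), rebuilt).reproduced)
console.log(compareArtifacts(integrityOf(published), tampered).reproduced)
```

Cloning and building (steps 2 and 3) are left out because they depend on the package manager and build tooling each package uses.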
$ npx reproduce <package-name> # exit code 0 if reproducible, 1 if not
$ npx reproduce <package-name> --json # detailed metadata & strategies attempted/outcomes
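Because the CLI signals its result through the exit code, it can be scripted. The sketch below is one possible wrapper, not part of reproduce itself: `runReproduce` and `findUnreproducible` are hypothetical names, and the only behaviour it relies on is the exit code convention described above (0 if reproducible, non-zero otherwise).

```javascript
import { spawnSync } from 'node:child_process'

// Default runner: invokes the CLI as shown above.
// Note: on Windows, spawning `npx` may require the `shell: true` option.
function runReproduce(name) {
  return spawnSync('npx', ['reproduce', name], { encoding: 'utf8' })
}

// Returns the names of packages whose check did not exit with code 0.
// A `null` status (the process failed to start or was killed) also counts
// as a failure. `run` is injectable so the logic can be tested offline.
function findUnreproducible(names, run = runReproduce) {
  return names.filter((name) => run(name).status !== 0)
}

// Usage: node check.js <package-name> [<package-name> ...]
const failures = findUnreproducible(process.argv.slice(2))
if (failures.length > 0) {
  console.error(`not reproducible: ${failures.join(', ')}`)
  process.exitCode = 1
}
```

Treating every non-zero status as a failure keeps the script conservative: a package is only reported as reproducible when the tool explicitly says so.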
Programmatic Usage
import reproduce from 'reproduce'

// Basic usage
const result = await reproduce('package-name')

// With custom configuration
// (named differently so both examples can live in the same module)
const customResult = await reproduce('package-name', {
  cache: {},
  cacheDir: './custom-cache',
  cacheFile: 'custom-cache.json',
})
The Data Speaks: Reproducibility in the npm Ecosystem
To put reproduce to the test, we ran it against the top 5,000 "high impact" npm packages, defined as packages that have either more than 1 million weekly downloads or more than 500 dependents.
The results?
- 5.78% (289) of packages were found to be reproducible, meaning they could be rebuilt exactly as published without any changes to their source repositories or publishing workflows.
- After almost 2 years of provenance being in the wild, only 3.72% of those same packages have added provenance attestations.
Reproducibility is already beating provenance, and those numbers will only continue to improve. If the git information you have been publishing with your packages is accurate, it is highly likely we will eventually define a strategy that maps to how you built your package from source. Complex or bespoke build systems aside, the majority of packages are built using off-the-shelf package managers and build tools. I would not be surprised if the total reproducibility of the existing npm ecosystem is closer to 15-25%, if not greater.
Moving Beyond Provenance: The Future of Secure Package Infrastructure
Maintainers should always be concerned with the metadata they associate with their packages. Taking the time to clean it up and ensure it is accurate will help make your software reproducible.
At vlt, we're actively working on new registry infrastructure and tooling to improve package security, including deeper integration of reproducibility checks. By shifting the focus from ephemeral provenance to verifiable reproducibility, we can create a more resilient JavaScript ecosystem, one where package trust is earned, not assumed.
reproduce is just the beginning. If we want to secure the software supply chain, we need reproducibility by default. It's time to move beyond trusting metadata and start testing whether our tools and packages actually deliver what they promise.
Want to try reproduce for yourself? Check out the project and see if your favorite npm packages pass the test.