-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create an affected function format specification for npm (JS) advisories #66
Comments
Be aware that in a package without the exports field, every file is accessible, and so you may indeed need to account for most of the language. |
Also, there is not one single “root” - packages have 0, 1, or N entry points, and all must be considered. |
Do you have a link to such an example package? |
https://npmjs.com/es-abstract has thousands, as an example of one of my own packages. |
Many packages would be like this one with 'main' being used instead of 'exports' - https://github.com/nodejs/node-addon-api/blob/main/package.json. In terms of the proposed function format, it seems reasonable, but I think planning to have multiple paths to the same function to which the advisory applies (assuming multiple exports or assuming all files are accessible) would be necessary. I assume that |
@taladrane / @darakian thanks for sharing your time & this spec with the group in today's call. I'll try to reiterate some of the feedback I had which is atop of 1. On Requirements
2. On the Proposed Grammar
I'm not sure what Example project# github.com/<owner>/lodash/
├── program.js (A) -> `const _ = require('lodash'); _.at(); // vulnerable call...`
├── package.json (B) -> { "name": "lodash" ... }
└── node_modules
└── lodash
└── package.json (C) -> { "name": "lodash" ... }
# registry.npmjs.com/lodash/<version> (D) -> { "name": "lodash" ... } In the example above, the specifier I think its probably best to remove the If you really need to have the name in the schema, it's probably best to be explicit & reference either an integrity hash (similar to how commits are treated & referenced with git - ex. Example alternative grammarsFor a vulnerable function call of
3. On Purpose/Goal
To my knowledge, there is no such thing as This is a long way of saying, you cannot avoid "defining a universal specification for all of I'm sure there's other considerations to this work (ex. looking into how stack graphs are made & seeing the current state of tree-sitter/tree-sitter-javascript makes me a bit nervous about the accuracy of the call stack) but I'll leave it at this. |
I think all the issues here boil down to: relying on the advisory reporting side to build an exhaustive list of the many paths at which a function can be imported from an npm package is a near impossible to solve task. I think we need three things to reliably identify where an advisory points:
For number 1, we need to account for alternative registries, aliasing, and a few other complexities. For number 2, I just can't think of any way we can get a canonical definition of where it is exported in a maintainable way without the filepath. But if we think we need that, then we need to enumerate all the the import locations, which I think is so similar to just using the filepath relative to the package root that I fail to see the difference. Number 3 should be fairly straight forward, but if we want to account for things like dynamic exports with CJS then it might get rather complicated. |
To clarify on that point I had been (implicitly 🤦) using it to refer to the namespace owned on But you both bring up a a good point about supporting other package registries as well. A simple extension to that could be something like @darcyclarke I'm not sure I agree on this point.
While that would be an improvement in technical accuracy we need to ensure that humans can provide this data and asking humans to understand and edit call graphs is a tall ask. I will be one of these humans I do not feel like I'm well equipped to handle that task. Perhaps a call graph for a specific version is some companion data which could be constructed and attached to the advisory payload, but we need a human viable route to manage this data. re:
Our advisory data covers ranges of vulnerable packages and Would it make sense to re-frame the question from |
For a package identifier, a PURL probably makes sense. |
You mean for handling multiple package registries? If so, purl only has an npm purl type (for js stuff) at the moment, so that would need to be expanded on a per-registry basis as desired. Stepping back from package identification though; function data is subordinate data on an advisory which would identify the package itself. |
Any reference to call graphs is probably erroneous. Call graphs don't relate to the names of the callable items. Access paths certainly are relevant, of course. |
Hey all, popping back in to ask another question or two 😅 Within the context of npm modules do you find Minimal example
If Foo were a top level class of an npm module baz we would have paths |
Yes.
We still need to establish that |
Not sure I follow you on this point. I attempted to clarify what I meant by npm package here. Could you expand on how you disagree with that package naming? |
Sorry, maybe we were all talking past eachother a bit. @darcyclarke addressed that in his comment above:
to which you responded:
I believe that the registry url probably has to be included, but I would go revisit the things said in Darcy's comment under the "proposed grammar" section. |
Ah I see what you're getting at. So, I disagree with the registry url being required for our use case where our advisory payload will include an affected product The function syntax I'm trying to get feedback on would be subordinate data to an affected product and the So to expand on my comment here |
So what happens when a project hosted on Github is loading packages form a different registry? Or has a git based package spec? Or one of the other things not covered by the import identifier? Do y'all just not report on it? EDIT: yeah I guess that is the case. Maybe we were all confused on your scope. |
Is github packages not considered part of the npm ecosystem by github? |
Correct. Only the packages on |
Ok, since things have died down a bit here can we get some feedback on the following? We've come up with what we think is a workable format to move forward with. The general syntax for function names is described as:
Broadly speaking, the name consists of two parts. The part in parentheses identifies a particular JavaScript file. The part after the parentheses identifies a definition within that file. For our part, we are only concerned about definitions that are exported from a published npm package. Therefore, the entry_specifier will always identify an npm package, and the path.to.exported_function will identify one of the definitions exported from the package’s main file:
The basic Extended Backus–Naur (EBNF) form can be represented as:
It is very common in JavaScript packages for functions to be defined in “nested” or “inner” files, instead of being defined directly in the package’s main file. The package then uses a series of “re-exports” to make those definitions available to downstream users of the package. Definitions that follow this pattern could conceivably have more than one name: the names that are used internal to the package to identify the actual file that holds the definition, and whatever name it has within that file; and also the name that is actually exposed to downstream users in the package’s main file. For GitHub's part, we are only concerned with the final exported names, since that is how downstream users will have access to the definition and these are what will live on our advisories. To exapand on the individual components of
In order to differentiate static methods from instance methods we also propose a dual-level namespace with .[static-method] and also, separately prototype.[any-instance-methods] so that if a class has both a static and instance method with the same name they may be differentiated. A few examples: For deeper calls into the package will append the function path rooted at the entry point. |
Again, an npm package has N - read, infinite - file paths that can lead into it. In other words, In other words, your format is profoundly unworkable and inapplicable to the npm JS ecosystem, and there's tons of feedback on this very thread explaining why that's the case. |
I am wondering if it would be helpful if we flipped this around? Is there a format we could propose which would uniquely identify it? I would need to re-read the whole thread (which I don't have time for right now) but did we go over the pros/cons of using a relative file path from the package root? Wouldn't that solve this in that we would then just resolve the module with node's module resolution algo to see if the import did in fact map to the impacted file? |
The primary pro from our point of view is that when reading someone's source code there are import statements and paths from the import that can be read. We think that producing paths of the sort proposed here could aid scanning tools which are trying to answer the question "is this codebase affected by vuln X".
I think in my mind each of those
I have tried to reply to the feedback as best I can. |
@darakian yes but the version is tied to the |
Would it suffice to allow the entry point to contain a path? Eg. The version information would be contained in the advisory payload to which this function information would belong. |
As mentioned, that still wouldn’t work for ESM, which has 0 to N export names that a value could be available under. |
Sorry for the late reply. For the moment 0 export names is out of scope for us. Maybe that's something we can address in a second version of the spec. For the 1 to N export names I think the spec we're proposing can capture them. We're not proposing a system to de-duplicate reexports but only to name exports. In the context of a security advisory if a vulnerable piece of code is accessible via two or more paths we would list all relevant paths. |
Hi OpenJS friends 👋 we'll be joining your Security WG meeting this upcoming Monday, October 2, to discuss the below idea, but please feel free to comment with feedback at any point! We're hoping to be able to finalize the discussion by October 16th 🙇♀
Background
GitHub released a beta feature last year that allows Dependabot alerts to surface calls to vulnerable functions to end users. The goal of this feature is to make prioritization and response to vulnerability alerts easier by sharing with users both if they’re actually using the snippet of affected code in their project and, if so, where they’re calling the vulnerable code. This beta feature is currently released for the pip/Python ecosystem, and we’re exploring adding support for the npm/JavaScript ecosystem. We’re engaging the community for feedback on this proposal to ensure what we’re doing is sound as it will ideally end up impacting this community positively.
One of the things we’ve discovered along the way is the need for a standardized format for the affected function names we use to refer to symbols that appear in npm (JS) advisory source code. The names given are included in metadata for the published advisory and reflected to the end user in their Dependabot alert. For our purposes, we are only considering exported functions in an npm package, so we’re hoping to avoid the complexity of defining a universal specification for all of JavaScript. The function name should be intuitive, easily parsable, and the elements of the name will be valid objects in a JS program.
Proposed affected function name requirements:
Proposal
(npm_package).foo.bar.biz.baz
where
foo.bar.biz.baz
is a walk of a public path from the root of the package. Having (npm package) called out explicitly in the function name itself rather than being inferred from affected product data gives the benefit of capturing a root level anonymous function, which are quite common in the npm world. Encapsulating the package name in(
)
is done to aid machine parsability. This also happens to match what a user would write to use a single function package and how it would get laid out in documentation for users.The text was updated successfully, but these errors were encountered: