-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fs.read sends callback a buffer with more than what was read #3657
Comments
Thanks for opening this issue. AFAIK its simply a common misunderstanding of how to execute this code pattern using Node APIs and not an actual bug according to you:
How this is actually supposed to be done w/o running into the issue you describe was not immediately obvious to me, so thanks for the link! There are a couple of PRs open attempting to modify the documentation affecting this coding pattern (one of them was authored by you, another I reviewed and lost interest in pursuing further):
Do you think that maybe if we got those two merged and a third one to include the proper-use example you provided that it would help clear things up for users of Node.js still using callbacks for filesystem operations? I should probably mention that nearly all maintainers of Node.js (myself included) are unpaid volunteers doing this in their free time — please try to keep in mind that many of us lack incentive to work for free or do anything at all if not for the sake of wanting to help. |
The docs PR I authored (which won't be merged because one commit comment was a little over the rule and I don't know how to retroactively edit commit comments on GitHub) was about the distinction between This issue would require very clearly stating, in the context of describing the callback/return value of each affected function something like this:
Documentation changes like that should be retroactive for versions from the present all the way back to the introduction of the function. In my opinion, a line of code like the cited one should be added in the Node code just before the callback. I could make a PR which does that which could be merged into the next semver-major branch, but don't want it set up for rejection based on the concept, and I have not figured out how to do a PR in this project which can be accepted even if approved. I would also tend to defer to @mcollina on a docs update PR as he jumped in pretty quickly offering to do one and might have something in mind (or at least have a better idea what it should look like and how it can get added). Finally, I agree about the funding/motivation problem in open source, even for highly used projects. I'm formally a maintainer on WinstonJS (covering packages with >25M weekly NPM downloads), but available bandwidth to work on that is limited mostly to the particular purposes I was added for; project maintenance as a whole is a pretty large chunk of work. It surprises me that Node.js doesn't have funding or dedicated pro maintainers as it's the core of an ecosystem way larger than Winston's relatively tiny niche within it, and it has a funding channel set up through the OpenJS Foundation with pricey corporate memberships and big events. However, it doesn't surprise me that much. In any event, thanks to all the contributors for your work and contributions. My investment of all the time and effort associated with reporting on this after the much simpler step of adding the securing workaround to my own code is in a similar spirit, done out of a desire to help secure the software ecosystem more broadly. |
Brilliant thank you!!! Very curious to see if this has been fixed by now. I am working on it :) |
This is appears to be resolved, feel free to reopen/ping me if this is incorrect |
Version
17
Platform
Win10 Pro x64
Subsystem
fs
What steps will reproduce the bug?
Summary:
As documented, either fs.read() function takes as a final parameter a callback function which accepts three arguments:
(err, bytesRead, buffer)
. The buffer passed to this callback is of the same length as the buffer passed in, not limited to the number of bytes read.As a workaround, the first line of the callback can be set to:
const properlyLimitedBuffer = Buffer.from(buffer.buffer, buffer.byteOffset, bytesRead);
and then the rest of the callback function should use
properlyLimitedBuffer
instead ofbuffer
. This is fairly memory-efficient because Buffer.from() just creates a new view without copying the contents of the underlying memory. However, it appears that something like this line should be in the Node code just prior to calling the callback.Description:
Steps To Reproduce:
These steps are for use case nodejs/node#1 as described under “Impact” below.
copy.js
in a known location:inputs
andoutputs
. In inputs, create the following two files:Doug.txt:
Eve.txt:
node copy.js
.Note that the coding style in the example is meant to be compact for minimal illustration of this specific issue, not optimal for maintainability etc. (e.g. no error or large-file handling).
Possible fix:
Looking at the source code for Streams, it appears that
if (bytesRead !== buf.length) {
the code will “shrink to fit”by using Buffer.allocUnsafeSlow(bytesRead) followed by a copy operation. That looks like the correct behavior that should be copied over into fs.read, e.g. within the callback wrapper just prior to calling the callback. This strategy is slower and uses more memory than the Buffer.from() strategy above but is more resilient to a separate misuse where the same buffer is provided to asynchronous code running in parallel. An example of that case can be derived from the attached example by commenting out lines 15 + 27 (Promise constructor wrapper) and 22 (the most indented line).
For maintainability, instead of copying those lines of code as a potential fix, it would be better to split that off into a separate utility function (e.g.
shrinkBufferToFit
or evenBuffer.shrinkToFit(buffer, offset, bytesRead)
ifBuffer.from()
doesn’t cut it) which gets reused in multiple places.User Impact
Consider the following four use cases, all using CVSS v3.0’s Network attack vector with Node running as a Web server.
0) File Integrity Check
A user stores a file on the server. The server computes a hash of the file, using Crypto.hash.update(buffer) with each chunk of the file and then hash.digest() to compute a file hash. The documentation for
update()
indicates it can take a Buffer object but it does not appear to pay attention to or accept length (and possibly offset) parameter values.Even with one-chunk files, the result mismatches a hash of the same file computed elsewhere. This renders the file integrity check pretty useless, eliminating the security benefits one could get from the check. This occurs even without the programming-error-in-the-name-of-efficiency of reusing a buffer instead of initializing a new zero-filled buffer with Buffer.alloc(). This use case is how the issue was initially found.
1) Confidential Information Processing
In this use case, the server application supports the processing of confidential information for independent users. For this purpose, “processing” might mean something as simple as writing a file to a place where the user can access it later. The server processes a file which fills most of the buffer for Doug and then, reusing the same Buffer, processes a shorter file for Eve. Eve gains access to Doug’s confidential data, except for some initial portion obscured by data Eve provided. Eve could make this data intentionally very short to maximize exploit value, but even with zero technical knowledge or intent to gain unauthorized access, Eve is still gaining unauthorized access to some of Doug’s data.
In this scenario, Doug may have followed all modern best security practices in establishing the existence of his confidential information on the target system when becoming a user, perhaps years earlier (e.g. if the buffer was populated by some occasional or periodically run server-initiated process, example here) and Eve’s compromise of the data did not require any interaction from Doug. One could argue that in certain configurations of the unintentional compromise case, this doesn’t require any interaction from the attacker either.
2) Administrator Executable Privilege
In this use case, the server application allows an administrative user to upload executable code or commands which is/are later executed on the server. Attacker Alice, who has only the minimum level of public or user permissions required to make use of the buffer, fills most or all of the buffer or at least some piece at the end of the buffer with malicious code. Administrator Bob then fills the beginning of the buffer with a short executable that is saved to disk and/or subsequently executed. If the buffer is processed without the end limit at the write/execute stage, Alice’s unauthorized code executes after Bob’s authorized code.
Note that in this scenario, Alice does not strictly have to be intentionally attacking. However, if Alice is not intentionally attacking, the remaining contents of the buffer are likely to be junk that does not execute something interesting but rather throws an error. If the execution is done in the Node server’s process, it could cause the whole Node server process to exit, taking down the site and acting as an effective DoS for legitimate users. This can also be done if the data constitutes commands or input to commands for Node to execute, if the functions being executed make assumptions about when or how they will be called or assumptions about input which are violated by the junk or malicious data, and where the error causes the server process to exit. This seems much more common than the scenario where an application allows an administrator to upload near-arbitrary executable code saved and subsequently run on the system. (This DoS strategy has also been observed in a production application with less-than-stellar code quality.)
3) Software repository
In this use case, the server application allows an administrative user to upload executable code which is made available for user/public download (e.g. sites hosting useful software tools for government officials, medical/human subjects researchers, business users, distribution into the software supply chain etc.). The attack setup is similar to nodejs/node#2, but now the compromised code is running on users’ machines.
Regarding privileges needed to exploit this, note that Web applications could be configured to make publicly available functionality that alters the contents of a reused server-side Buffer. This was observed, for example, in an incorrect usage of the ‘express-fileupload’ module that accessed an underlying ArrayBuffer.
Most relevant first-page results in multiple searches suggested implementations using the whole buffer without any limit based on bytes read, or at least neglected to mention anything about the security practices that Node.js relies on developers to be using. Examples are here.
On the first page of results in Tabnine's search for code snippets using fs.read() I saw just johnsonj561 demonstrating the correct usage. Usage of the entire buffer or with explicit reference to buffer.length seemed to be most common, though a couple examples just checked a subset of initial bytes as needed for their use case, and a couple were local aliases for readFileSync.
Search results also included the official fs documentation which discusses how the callback gets a
Buffer
object that appears identical in description to what other functions say they take as input. While the official documentation can of course be enhanced to call this out, there's a lot of unofficial documentation not in our control which doesn't, it's not intuitive to have to include that extra step, and if everybody should be using that, why isn't it included in the Node code to support better security by default? Reuse of the buffer would still be supported as seen in the demo (with commented-out line active), unless someone is callingread()
from within the callback function with an increasing manually-setlength
value or after an end-of-file (insecure case) - with only the callback-parameter version of the buffer instead of the one passed in toread
.Search results also included a fair number of examples with
readFile()
variants which are a higher level, but which can create some unpredictable memory-based limits on the sizes of files that can be handled. This would seem to mean that conversions from use ofreadFile()
toread()
(the easiest-to-find chunk-based alternative which doesn’t have the memory usage constraint) might tend to be made under time pressure for fixing a bug that's actively blocking a real use case.Code impact
This is likely to impact similar other functions such as the multiple versions of filehandle.read and related synchronous/promisified functions, but without having done an in-depth analysis of the source code, I hesitantly assume good coding practices would have that handled already by other affected functions wrapping the affected code.
This was previously confidentially reported as a security issue, observing that CVE-2021-22939 was also considered a security issue even though a developer using a Node.js API differently could have solved it. The response from the Node.js team noted that like many of Node.js's low-level APIs, the fs.read() APIs is modelled after a UNIX API and Unix APIs are not seen as vulnerabilities but rather good models. The buffer is not trimmed in case a user is misusing a pattern of passing it through for reuse again. The Node.js security team thinks that an update to official documentation is all that's needed here, with @mcollina offering to open a PR for doing that.
How often does it reproduce? Is there a required condition?
Always.
What is the expected behavior?
The buffer provided to the callback from read() and its associated functions is limited to what was read.
What do you see instead?
The buffer contains extra information, which may include confidential information of another user or malicious code.
Depending on how the buffer is then used, this could be quite problematic.
Additional information
I still disagree that an update to official documentation is all that's needed and think that this should be fixed in Node code, on a semver-major release if necessary, but the discussion directed that next steps should be public discussion here on the issue tracker. I include the potential impact points above to help inform that discussion.
The text was updated successfully, but these errors were encountered: