Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

fs.readFile[Sync]("/dev/stdin") does not always read the entire file. #7412

Closed
mbostock opened this issue Apr 6, 2014 · 17 comments
Closed

Comments

@mbostock
Copy link

mbostock commented Apr 6, 2014

Given the following complete program, test-readFile-utf8:

#!/usr/bin/env node

var fs = require("fs");

fs.readFile("/dev/stdin", "utf8", function(error, contents) {
  console.log(contents.length);
});

And given an input file test.txt of 100,000 bytes, the following behaves as expected:

$ ./test-readFile-utf8 < test.txt
100000

However, if the test file is piped using cat, the program fails to read the entire file:

$ cat test.txt | ./test-readFile-utf8
65536

A complete set of test cases are available here: http://bl.ocks.org/mbostock/52950e1e53c19a56fad3

This problem can be avoided by using process.stdin’s data and end events:

#!/usr/bin/env node

var chunks = [];

process.stdin
    .on("data", function(chunk) { chunks.push(chunk); })
    .on("end", function() { console.log(chunks.join("").length); })
    .setEncoding("utf8");

However, it’s often convenient to have a single code path for reading input either from a file or from stdin — and it’s surprising that fs.readFile("/dev/stdin") (and likewise fs.readFileSync("/dev/stdin")) work sometimes but not other times — so it would be nice to make these methods work all the time as expected.

@mbostock
Copy link
Author

mbostock commented Apr 6, 2014

Also: I am on Mac OS X 10.8.5 and Node v0.10.26.

@tjfontaine
Copy link

The way readFile works is to first to stat() the file and then read that size, when you're redirecting a real file < somefile.txt stat() is reporting the proper size, but when it's a pipe you're getting the buffer size for pipes, and it's only reading up to that point.

There's really not a good way to work around that interface, as readFile is meant to deal with traditional files. The best way to read stdin regardless of the type of file is indeed to use process.stdin though it's probably slightly easier to .pipe() it to some other consumer.

If you were just looking to read an entire stream in a single programmatic line I have a little module that will do that, under the hood it's not too dissimilar to what you came up with, but is a bit more generic for any stream:

require('readtoend').readToEnd(process.stdin, function(err, data) {
  console.log(data);
});

https://www.npmjs.org/package/readtoend

@mbostock
Copy link
Author

mbostock commented Apr 6, 2014

I want to read some input, and I don’t care whether it’s from stdin or from a file. I just want to read it (or write it), and it’s a pain to special-case these things when my operating system already provides a (mostly) transparent representation for /dev/stdin and /dev/stdout.

Also, it seems possible for Node to do the “right” thing for these methods, regardless of the underlying reasons why these files are special. To illustrate this point, I’ve created a rw module that provides implementations of readFile, readFileSync, writeFile and writeFileSync that behave the way I would expect for stdin and stdout.

For example, readFileSync is implemented as:

module.exports = function(filename, options) {
  if (filename === "/dev/stdin") {
    var decoder = decode(options);

    while (true) {
      try {
        var buffer = new Buffer(bufferSize),
            bytesRead = fs.readSync(process.stdin.fd, buffer, 0, bufferSize);
      } catch (e) {
        if (e.code === 'EOF') break;
        throw e;
      }
      if (bytesRead === 0) break;
      decoder.push(buffer.slice(0, bytesRead));
    }

    return decoder.value();
  } else {
    return fs.readFileSync(filename, options);
  }
};

(The decoder is just a simple thing for dealing with strings vs. buffers.)

It’s fine if you want to push this concern to user-land, but given the complexity of doing so (as shown above), it seems to me like it would be possible to make these behave in a more reasonable way in Node and that would be a boon for users.

@chjj
Copy link

chjj commented Apr 6, 2014

I'm really confused as to why this issue was closed. When I see a function called readFile, I expect it to read a file. Especially in unix where every second thing is a file, you'd think this function would account for that.

Personally, I've always used a = fs.readFileSync('/dev/stdin', 'utf8') as a quick and dirty equivalent of a=$(cat) (or the messier: a=$(dd 2> /dev/null) / read -d $'\0' a) when I'm writing a small general purpose script. I'm surprised I never noticed this bug before, and I'm only now starting to wonder how many mixups this may have caused that I didn't notice.

readFile is node's cat (despite not being able to take multiple args and concatenate files). People expect it to read a file to the end, regardless of whatever kind of funky unix file it is. The current behavior is incorrect.

@chjj
Copy link

chjj commented Apr 6, 2014

Also, presumably this interferes with the fork+open+exec+read model of reading FDs via readFile'ing a /dev/fd/X. This completely ruins any possibility of doing kickass commandline things:

$ node -e 'console.log(require("fs").readFileSync("/dev/fd/50", "utf8"))' 50<<< 'hello world'

Okay, maybe the above is not the best example for obvious reasons, but imagine if that hadn't been just a "file" with a known size:

$ node read.js <(ncat -l -p 3000)

read.js:

console.log(require('fs').readFileSync(process.argv[2], 'utf8'));
$ ncat localhost 3000
Without readFile behaving properly, you might not
see how many cool things like this you can do.
^D

@mbostock
Copy link
Author

mbostock commented Apr 6, 2014

A good example of this confusion in the real-world is in UglifyJS, the popular JavaScript minifier:

@tjfontaine
Copy link

readFile is designed to be a very lo-fi interface, it could check in the stat() to see if it is a regular file and if not behave differently, I would review a pull request for that.

Regardless if many environments have /dev/stdin or not, it is certainly not a portable mechanism, whereas process.stdin is, though I would not consider accessing the .fd member to be good practice here. As demonstrated, it's reasonably trivial to implement "reading a whole file" from anything that implements the stream interface in an async fashion regardless of the type.

If what you really wanted was a synchronous mechanism for doing so the fs module exports all the functionality you need to implement that, but the usefulness of that mechanism in Node proper is suspect, and is arguably an anti-pattern.

@trevnorris
Copy link

You should try using fs.{read,write}{Sync}(). Not
fs.{read,write}File{Sync}(). The later operations technically depend on the
ability to seek() the fd. Which you can't do on a pipe.

@mbostock
Copy link
Author

mbostock commented Apr 7, 2014

Thank you continuing to entertain this discussion. It has been enlightening. :)

I do not consider reading synchronously to be an anti-pattern in this context: a command-line tool that takes a file as input and that needs to read the entire input before processing. There’s nothing else for the process to do while that file is loading, so forcing an asynchronous read merely adds complexity. I always assumed Node’s synchronous methods exist precisely for this type of convenience.

Command-line tools invariably need to read and write to files or stdin or stdout, so an API that can handle stdin & stdout interchangeably with regular files is preferred. It needn’t be as transparent as /dev/stdin and /dev/stdout, but the lack of a (supported) synchronous API to read from stdin essentially forces any command-line tool that wants to be a good citizen to read input asynchronously, despite the availability of fs.readFileSync.

If you really believe reading synchronously from stdin should not be supported, I would at least ask you make this more explicit in the documentation; otherwise fs.readFileSync("/dev/stdin") will lie in wait to bite those unaware. On the other hand if you think reading stdin synchronously is as reasonable as reading a file synchronously, then please advise on how best to implement this.

@sugang
Copy link

sugang commented Jun 11, 2014

I run into a similar issue as well. Eventually I used the byline module - I am trying to read a freshly uploaded file, even though it's marked done, sometimes not all lines were read - it's a pain that the API and the function of fs.readFile are not consistent

@timruffles
Copy link

Also got bitten by this. The docs give a false expectation that this should be possible:

I think the docs should be updated, or the function should validate its input.

@davidmccabe
Copy link

This issue is PHP-like in the absurd brokenness of the implementation, the difficulty of doing a simple thing correctly, and the recalcitrance of the maintainers in fixing or even documenting the problem. It's eerily reminiscent in its use of this three-pronged justification:

  1. The problem is acceptable because it stems from the implementation being a thin (or "lo-fi") wrapper around the underlying system calls.
  2. Your program can behave correctly: you must simply magically know about this problem and write a dozen lines of extra code to work around it.
  3. Programs written in this powerful, general-purpose language aren't generally thought of as doing this kind of thing; therefore it needn't be done correctly.

While fingers are stuck in ears and "la la la" is sung, more and more programs written in node are going to be flaky and buggy due to the unwillingness to fix bugs in node. You want a pull request? Fine. Acknowledge that this is an actual problem.

The culture of weirdly half-baked, brittle, burr-covered node modules starts here. Set a better example.

In particular, the use of stat to ascertain the size of a non-regular file is simply an implementation bug. See the POSIX standard here: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html

Quote:

off_t     st_size    For regular files, the file size in bytes. 
                     ...
                     For other file types, the use of this field is 
                     unspecified. 

There are several correct, well-documented ways to read a file in Unix systems, and you are not using one of them.

@stuartpb
Copy link

@davidmccabe 👍 and lucky thing we ended up getting nodejs/node#1074 out of it.

@mbostock
Copy link
Author

👍

@halfmatthalfcat
Copy link

I also just got blocked by this.

@appersonj
Copy link

The absence of a reliable file slurp (regardless of stdin or file source) weakens node.js for common systems-programming tasks. (Compare this to perl or bash, which make the operation trivial).

@stuartpb
Copy link

@appersonj What absence? You're reading a bug from the 0.x series; in modern mainline Node, this was fixed about a year ago with 5ecdc03.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants