Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

fs.readdir callback is an array which is inefficient to process for large directories #388

Closed
esprehn opened this issue Oct 31, 2010 · 14 comments

Comments

@esprehn
Copy link

esprehn commented Oct 31, 2010

fs.readdir is very problematic for listing directories if you don't know how large they are in advance. If you used a for loop to iterate over the array to output a directory listing (ex. directory index) and you list a directory with 1000 entries the whole process is blocked.

A DirectoryIterator should be provided instead that returns the directory list sorted and emits events like when reading lines from a file. It's easy to DOS a node HTTP server that implements directory listings by just requesting the listing of a large directory over and over again right now.

@ry
Copy link

ry commented Nov 1, 2010

I agree.

@sh1mmer
Copy link

sh1mmer commented Oct 20, 2011

Any thoughts about how we do this without breaking the existing API?

@Mithgol
Copy link

Mithgol commented Nov 27, 2011

No one says you have to iterate over the array in a for loop.

Whenever you anticipate a large array, use a closure function to call in .nextTick(…), such as the following:

var fs = require('fs');
var clog = console.log;
var dir2read = '.';

fs.readdir(dir2read, function(err, flist){
   if (err) {
      clog('Error reading directory ' + dir2read);
      clog(err);
      return;
   }
   var elemNum = 0;
   var ProcessEntry = function(entry){
      clog('Processing a directory entry: ' + entry);
   }
   var DirIterator = function(){
      ProcessEntry(flist[elemNum]);
      elemNum++;
      if (elemNum < flist.length) process.nextTick(DirIterator);
   }
   if (elemNum < flist.length) process.nextTick(DirIterator);
   clog('.readdir() callback is finished, event loop continues happily...');
});

That is not an fs.readdir problem anyway. It happens with any sufficiently large array. If you need to fix it, do not fix fs.readdir: what you need is rather an async array-processing module.

@Mithgol
Copy link

Mithgol commented Nov 27, 2011

An async array-processing module is as simple as the following function:

var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
   var elemNum = 0;
   var arrLen = inArray.length;
   var ArrayIterator = function(){
      inEntryProcessingFunction(inArray[elemNum]);
      elemNum++;
      if (elemNum < arrLen) process.nextTick(ArrayIterator);
   }
   if (elemNum < arrLen) process.nextTick(ArrayIterator);
}

With this function, the above example of iterating over an array of directory entries takes the following form:

var fs = require('fs');
var clog = console.log;
var dir2read = '.';

var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
   var elemNum = 0;
   var arrLen = inArray.length;
   var ArrayIterator = function(){
      inEntryProcessingFunction(inArray[elemNum]);
      elemNum++;
      if (elemNum < arrLen) process.nextTick(ArrayIterator);
   }
   if (elemNum < arrLen) process.nextTick(ArrayIterator);
}

fs.readdir(dir2read, function(err, flist){
   if (err) {
      clog('Error reading directory ' + dir2read);
      clog(err);
      return;
   }
   var ProcessDirectoryEntry = function(entry){
      // This may be as complex as you may fit in a single event loop
      clog('Processing a directory entry: ' + entry);
   }
   AsyncArrayProcessor(flist, ProcessDirectoryEntry);
   clog('.readdir() callback is finished, event loop continues...');
});

As you may see now, there is nothing wrong with fs.readdir. It's just a matter of array processing.

I suggest this issue to be closed.

@indutny
Copy link
Member

indutny commented Nov 27, 2011

@sh1mmer I think fs.readdir('dirname') (w/o callback) may return eventemitter, that'll emit directory and end event

@cbiffle
Copy link

cbiffle commented Aug 3, 2012

@Mithgol, I think you may have misinterpreted the problem.

It isn't that it's an array --- it's that it's returned all at once instead of streamed.

Compare to the function this is named after, POSIX readdir(3). To list the contents of a directory in POSIX, it takes a sequence of calls:

  1. opendir with the desired path.
  2. Repeated calls to readdir to step through the directory.
  3. closedir

This means memory contains only one directory entry at a time.

With node's current fs.readdir API, the entire directory listing gets inflated into memory.

@indutny, @sh1mmer - an event emitter makes sense, but an internal iterator would also work. Ignore the specific names, here's the pattern:

fs.eachChildOf("/my/dir/path", function (childPath) { ... });

FWIW, I've seen this bug become an actual problem in two cases: traversing a large git object store and reading a maildir (in @clee's crankshaft mail client).

@clee
Copy link

clee commented Aug 3, 2012

+1 to @cbiffle's reasoning.

@Mithgol
Copy link

Mithgol commented Aug 7, 2012

Try defining fs.eachChildOf as the following:

var fs = require('fs');

var AsyncArrayProcessor = function (inArray, inEntryProcessingFunction) {
   var elemNum = 0;
   var arrLen = inArray.length;
   var ArrayIterator = function(){
      inEntryProcessingFunction(null, inArray[elemNum]);
      elemNum++;
      if (elemNum < arrLen) setTimeout(ArrayIterator, 0);
   }
   if (elemNum < arrLen) setTimeout(ArrayIterator, 0);
}

fs.eachChildOf = function (dirPath, entryFunction) {
   /*
   entryFunction(err, directoryEntry)

   called as entryFunction(err) form is there's an error

   otherwise called as entryFunction(null, directoryEntry)
   once for each of the directory entries
   asynchronously
   */
   fs.readdir(dirPath, function(err, flist){
      if (err) {
         entryFunction(err);
         return;
      }
      AsyncArrayProcessor(flist, entryFunction);
   }
}

Should work as desired… unless fs.readdir() itself takes too much time.

@clee
Copy link

clee commented Aug 7, 2012

@Mithgol: That's exactly the problem. When you have a few thousand directory entries, fs.readdir() takes too long. (And also spikes your memory usage.)

@tjfontaine
Copy link

see also joyent/libuv#1521

@jasnell
Copy link
Member

jasnell commented May 19, 2015

@indutny ... did this ever land anywhere?

@Fishrock123
Copy link

I don't think so, but there is nodejs/node#583, which is probably the better route. (Could be fed into a generator for iteration.)

@jasnell
Copy link
Member

jasnell commented May 19, 2015

Agree... just not sure if this is something we should address in v0.12+ or push to the converged stream. Regardless, this looks like a feature-request as opposed to a bug. Will leave it open to track.

@jasnell
Copy link
Member

jasnell commented Jun 22, 2015

Actually, on second thought. Let's close this. It's not likely something that would land here and the discussion going on in nodejs/node#583 is a better direction. Closing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants