
stream.unshift - TypeError: Argument must be a buffer #27192

Closed
marcosc90 opened this issue Apr 11, 2019 · 5 comments · Fixed by #27194
Labels: stream (Issues and PRs related to the stream subsystem)

Comments

marcosc90 (Contributor) commented Apr 11, 2019

  • Version: v10.15.3
  • Platform: Ubuntu 18.04
  • Subsystem: stream

The stream.unshift documentation states:

chunk <Buffer> | <Uint8Array> | <string> | <any> Chunk of data to unshift onto the read queue. For streams not operating in object mode, chunk must be a string, Buffer or Uint8Array. For object mode streams, chunk may be any JavaScript value other than null.

The issue occurs when the stream's encoding is neither passed as an option nor set via setEncoding. In that case state.decoder is not set, and the fromList function may use state.buffer.concat (when state.buffer.length > 1), which internally uses copyBuffer, which requires both the source and the target to be a Buffer or Uint8Array.

When stream.unshift is called with a string (which the documentation states is a valid argument) and the internal buffer holds more than one chunk, state.buffer.concat is triggered and an error is thrown:

buffer.js:636
    return _copy(this, target, targetStart, sourceStart, sourceEnd);
           ^

TypeError: argument must be a buffer
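For context, a minimal sketch of the failing sequence (my own reduction, assuming a paused stream with no encoding set, on v10.15.3; the full repro follows below):

const { Readable } = require('stream');

// Minimal sketch (not the original repro): no encoding is set,
// so state.decoder is never created.
const r = new Readable({ read() {} });
r.push(Buffer.from('a')); // one Buffer chunk in the internal list
r.unshift('b');           // the string is stored unconverted

// With more than one buffered chunk, read() goes through
// state.buffer.concat(), which chokes on the string entry.
r.read(); // TypeError: argument must be a buffer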

Here's the script to reproduce the error. It's the parseHeader example from the documentation, but instead of converting remaining to a Buffer, I pass it directly to .unshift:

const { Readable } = require('stream');

class ArrayReader extends Readable {
  constructor(opt) {
    super(opt);
    const numbers = new Array(16384 * 2).fill(0).map(String);
    this.buffer = ['header', '\n\n', ...numbers];
  }
  _read(size) {

    while (this.buffer.length) {
      const chunk = this.buffer.shift();
      if (!this.buffer.length) {
        this.push(chunk);
        this.push(null);
        return true;
      }
      if (!this.push(chunk))
        return;
    }
  }
}

// Pull off a header delimited by \n\n
// use unshift() if we get too much
// Call the callback with (error, header, stream)
const { StringDecoder } = require('string_decoder');
function parseHeader(stream, callback) {
  stream.on('error', callback);
  stream.on('readable', onReadable);
  const decoder = new StringDecoder('utf8');
  let header = '';
  function onReadable() {
    let chunk;
    while (null !== (chunk = stream.read())) {
      const str = decoder.write(chunk);
      if (str.match(/\n\n/)) {
        const split = str.split(/\n\n/);
        header += split.shift();
        const remaining = split.join('\n\n');
        // const buf = Buffer.from(remaining, 'utf8');
        stream.removeListener('error', callback);
        stream.removeListener('readable', onReadable);
        if (remaining.length) 
          stream.unshift(remaining);

        // Now the body of the message can be read from the stream.
        callback(null, header, stream);
      } else {
        // still reading the header.
        header += str;
      }
    }
  }
}
parseHeader(new ArrayReader(), (err, header, stream) => {
  stream.once('data', chunk => {
    // console.log(stream._readableState.buffer);
  });
});

This can be fixed either by converting the chunk to a Buffer, similar to how it's done in .push (perhaps adding an encoding argument too), or by changing the documentation to state that a string can only be passed if the stream encoding is set.

Readable.prototype.unshift = function(chunk, encoding) {

  const state = this._readableState;
  let skipChunkCheck = false;
  // To avoid repetition between `.push` and `.unshift`, this could be checked inside readableAddChunk.
  if (!state.objectMode) {
    if (typeof chunk === 'string') {
      encoding = encoding || state.defaultEncoding;
      if (encoding !== state.encoding) {
        chunk = Buffer.from(chunk, encoding);
        encoding = '';
      }
      skipChunkCheck = true;
    }
  } else {
    skipChunkCheck = true;
  }

  return readableAddChunk(this, chunk, encoding, true, skipChunkCheck);
};
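With the proposed signature, the caller can state the string's encoding explicitly. A hypothetical usage sketch, reusing remaining from the parseHeader repro above:

// Hypothetical usage of the proposed second argument: the string is
// converted with the given encoding before being placed back on the
// read queue, instead of being coerced to utf8.
if (remaining.length)
  stream.unshift(remaining, 'hex');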
addaleax (Member) commented:

@marcosc90 Would you be interested in opening a PR, since it seems that you already have a solution available?

addaleax added the stream label Apr 11, 2019
marcosc90 (Contributor, Author) commented:

Yes, I have the solution ready, I can submit the PR in a couple of hours once I finish working.

Just to confirm: a new argument must be added to unshift; should the documentation change go in the same commit?

addaleax (Member) commented:

@marcosc90 I’m not sure about the new argument – if the expectation is that the data was previously read from the stream, the encoding should match the encoding used for reading data from the stream (i.e. state.encoding, rather than state.defaultEncoding – the naming clash is unfortunate here).

marcosc90 (Contributor, Author) commented Apr 11, 2019

state.encoding will be null in all cases where this bug occurs. I can do Buffer.from(chunk, state.encoding), but with a null encoding that always defaults to utf8. So unless you set the encoding beforehand (in which case the error does not occur), a string passed to .unshift will always be treated as utf8.

Is this the behaviour you want? If .unshift needs to support strings, I believe an encoding argument is warranted.
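To illustrate the fallback (a minimal sketch; the byte values are just an example):

// With state.encoding === null, Buffer.from falls back to utf8:
Buffer.from('00');        // <Buffer 30 30> (the characters '0', '0')
Buffer.from('00', 'hex'); // <Buffer 00>    (the byte the caller meant)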

In any case, I can submit the PR with whatever functionality you prefer.

I'm working on a few snippets to see whether not adding encoding is going to be problematic. (I know that in most cases when working with strings, utf8 will be the encoding.)

marcosc90 (Contributor, Author) commented Apr 11, 2019

Take the following example, using hex as the encoding instead of utf8:

// (Uses the same ArrayReader and StringDecoder require as the first script.)
function parseHeader(stream, callback) {
  stream.on('error', callback);
  stream.on('readable', onReadable);
  const decoder = new StringDecoder('hex');
  let header = '';
  function onReadable() {
    let chunk;
    while (null !== (chunk = stream.read())) {
      const str = decoder.write(chunk);
      if (str.match(/\n\n/)) {
        const split = str.split(/0a0a/);
        header += split.shift();
        const remaining = split.join('0a0a');
        stream.removeListener('error', callback);
        stream.removeListener('readable', onReadable);
        if (remaining.length) 
          stream.unshift(remaining);

        // Now the body of the message can be read from the stream.
        callback(null, header, stream);
      } else {
        // still reading the header.
        header += str;
      }
    }
  }
}
parseHeader(new ArrayReader(), (err, header, stream) => {
  stream.once('data', chunk => {
     console.log(chunk.toString('utf8'))
  });
});

That will output: 3030303030303030303030303030303030303030

Instead of: 00000000000000000000, since the unshifted chunk is interpreted with the wrong encoding and there is no way to set it.

With the fix I'm proposing, stream.unshift(remaining, 'hex') gives the correct output: 00000000000000000000

So right now, every unshifted string is coerced to utf8.

targos pushed a commit that referenced this issue Jun 3, 2019
`readable.unshift` can take a string as an argument, but that
string wasn't being converted to a Buffer, which caused a
<TypeError: Argument must be a buffer> in some cases. Also if a
string was passed, that string was coerced to utf8 encoding.

A second optional argument `encoding` was added to `unshift` to
fix the encoding issue.

Fixes: #27192
PR-URL: #27194
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Rich Trott <[email protected]>