Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder: make write after end to reset #16594

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions doc/api/string_decoder.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,16 @@ substitution characters appropriate for the character encoding.
If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.

### stringDecoder.reset([encoding])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is an implementation detail users don't have to know about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is, there is no way for the users to reset the string decoder, right? They have to create a new instance if needed. This will enable reusability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new way to reset the state each time was not requested as far as I read the issue. So I also prefer not to expose the reset function. If I am correct the following should work.

const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');

decoder.write(Buffer.from([0xE2, 0x82])); // => ''
decoder.end(); // => '�'
decoder.write(Buffer.of(0x61)); // => 'a'

<!-- YAML
added: REPLACEME
-->

* `encoding` {string} The character encoding the `StringDecoder` will use.
Defaults to the current `encoding` of the decoder.

Flushes all the internal data and the decoder will behave like a new object.

### stringDecoder.write(buffer)
<!-- YAML
added: v0.1.99
Expand All @@ -82,3 +92,4 @@ Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer` are omitted from the returned string and stored in an
internal buffer for the next call to `stringDecoder.write()` or
`stringDecoder.end()`.

27 changes: 22 additions & 5 deletions lib/string_decoder.js
Original file line number Diff line number Diff line change
Expand Up @@ -41,36 +41,52 @@ function normalizeEncoding(enc) {
// characters.
exports.StringDecoder = StringDecoder;
function StringDecoder(encoding) {
if (encoding === this.encoding && encoding !== undefined) {
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(this._nb || 0);
this._closed = false;
return;
}

this.encoding = normalizeEncoding(encoding);
var nb;
switch (this.encoding) {
case 'utf16le':
this.text = utf16Text;
this.end = utf16End;
nb = 4;
this._nb = 4;
break;
case 'utf8':
this.fillLast = utf8FillLast;
nb = 4;
this._nb = 4;
break;
case 'base64':
this.text = base64Text;
this.end = base64End;
nb = 3;
this._nb = 3;
break;
default:
this.write = simpleWrite;
this.end = simpleEnd;
this._nb = 0;
this._closed = false;
return;
}
this.lastNeed = 0;
this.lastTotal = 0;
this.lastChar = Buffer.allocUnsafe(nb);
this.lastChar = Buffer.allocUnsafe(this._nb);
this._closed = false;
}

StringDecoder.prototype.reset = StringDecoder;

StringDecoder.prototype.write = function(buf) {
if (this._closed === true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=== true is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we explicitly compare strictly against the value, the performance would be better right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with TurboFan, no. I asked about this in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, I think this is the previous discussion in question: #16397 (comment)

this.reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic be in StringDecoder.prototype.end instead? Make the per-encoding this.end switching in the constructor actually set to an internal symbol-named property, and use a shared StringDecoder.prototype.end that calls that internal encoding-specific function and clean up after itself. Would probably help with write performance too if StringDecoders of different encodings (e.g. UTF-8 which does not have an own property end and UTF-16LE that does) are used simultaneously.


if (buf.length === 0)
return '';

var r;
var i;
if (this.lastNeed) {
Expand Down Expand Up @@ -210,6 +226,7 @@ function utf8Text(buf, i) {
// character.
function utf8End(buf) {
const r = (buf && buf.length ? this.write(buf) : '');
this._closed = true;
if (this.lastNeed)
return r + '\ufffd';
return r;
Expand Down
30 changes: 30 additions & 0 deletions test/parallel/test-string-decoder-end.js
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,33 @@ function testBuf(encoding, buf) {
assert.strictEqual(res1, res3, 'one byte at a time should match toString');
assert.strictEqual(res2, res3, 'all bytes at once should match toString');
}

{
// test to check if the write after end doesn't accumulate the data
const decoder = new SD('utf8');
const euroPart1 = Buffer.from([0xE2]);
const euroPart2 = Buffer.from([0x82, 0xAC]);
decoder.end(euroPart1);
const result = decoder.write(euroPart2);
assert.notStrictEqual(result, '€');
}

{
// test to check if write after end reopens the decoder
const decoder = new SD();
assert.strictEqual(decoder._closed, false);
decoder.end();
assert.strictEqual(decoder._closed, true);
decoder.write(Buffer.from([0xE2]));
assert.strictEqual(decoder._closed, false);
}

{
// test to check if reset after end reopens the decoder
const decoder = new SD();
assert.strictEqual(decoder._closed, false);
decoder.end();
assert.strictEqual(decoder._closed, true);
decoder.reset();
assert.strictEqual(decoder._closed, false);
}