doc: document switchover from measured bytes to chars after setEncoding #13442

jalafel · 2017-06-03T20:15:46Z

Checklist

This is a documentation PR that notes how the measurement against the highWaterMark changes in non-object streams after .setEncoding() is called: the buffer size is measured in characters instead of bytes.
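A minimal sketch of the behavior being documented, assuming a Node.js version that exposes `readable.readableLength`. The helper name `pushEuroSigns` and the `highWaterMark` of 12 are made up for illustration and are not part of the PR:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: push one 12-byte chunk into a fresh stream and report
// how the stream accounts for it.
function pushEuroSigns(encoding) {
  // read() is a no-op here because data is pushed manually below.
  const readable = new Readable({ highWaterMark: 12, read() {} });
  if (encoding) readable.setEncoding(encoding);

  // U+20AC (the euro sign) takes 3 bytes in UTF-8, so this Buffer holds
  // 12 bytes but only 4 characters.
  const chunk = Buffer.from('\u20AC'.repeat(4), 'utf8');
  const wantsMore = readable.push(chunk);

  return { buffered: readable.readableLength, wantsMore };
}

const raw = pushEuroSigns();           // no encoding: counted in bytes
const decoded = pushEuroSigns('utf8'); // after setEncoding(): counted in characters

console.log(raw);
console.log(decoded);
```

Without an encoding, the 12 buffered bytes should already reach the `highWaterMark`, so `push()` signals that no more data is wanted. After `setEncoding('utf8')` the same chunk should count as only 4, so the stream keeps asking for more.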

documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

doc

vsemozhetbyt · 2017-06-03T20:25:28Z

doc/api/stream.md

@@ -66,7 +66,7 @@ buffer that can be retrieved using `writable._writableState.getBuffer()` or

 The amount of data potentially buffered depends on the `highWaterMark` option
 passed into the streams constructor. For normal streams, the `highWaterMark`
-option specifies a total number of bytes. For streams operating in object mode,
+option specifies a [total number of bytes][hwm-gotcha]. For streams operating in object mode,


It seems this line exceeds the 80-character limit.

vsemozhetbyt · 2017-06-03T20:26:49Z

doc/api/stream.md

@@ -101,6 +101,9 @@ Because data may be written to the socket at a faster or slower rate than data
 is received, it is important for each side to operate (and buffer) independently
 of the other.

+
+
+


Maybe these empty lines are unintentional?

mscdex · 2017-06-03T21:37:48Z

doc/api/stream.md

@@ -1965,6 +1968,19 @@ has an interesting side effect. Because it *is* a call to
 However, because the argument is an empty string, no data is added to the
 readable buffer so there is nothing for a user to consume.

+### `highWaterMark` discrepency after calling `readable.setEncoding()`
+
+The use of `readable.setEncoding()` will change the behaviour of how the


I think we are unofficially targeting US English, so s/behaviour/behavior.

mscdex · 2017-06-03T21:37:58Z

doc/api/stream.md

+comparison function will begin to measure the buffer's size in characters.
+
+This is not a problem in common cases with `utf8` or `ascii`. But it is
+advised to be mindful about this behaviour when working with long strings 


Ditto here.

mscdex · 2017-06-03T21:41:47Z

doc/api/stream.md

+The use of `readable.setEncoding()` will change the behaviour of how the
+`highWaterMark` operates in non-object mode.
+
+Atypically, the size of the current buffer is measured against the 


I think 'atypically' here should be 'typically'?

mscdex · 2017-06-03T21:44:06Z

Also, the first line of the commit message is too long.

tniessen · 2017-08-16T10:55:55Z

@jessicaquynh This needs to be rebased, sorry for the delay.
@mscdex @vsemozhetbyt Could you PTAL?

BridgeAR · 2017-08-28T00:01:11Z

Ping @jessicaquynh: this needs a rebase.

jalafel · 2017-08-28T04:26:30Z

@BridgeAR Sorry! Didn't realize that was on me. Thank you!

mscdex · 2017-08-28T04:32:30Z

doc/api/stream.md

-    the internal buffer before ceasing to read from the underlying
-    resource. Defaults to `16384` (16kb), or `16` for `objectMode` streams
-  * `encoding` {string} If specified, then buffers will be decoded to
+  * `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store 


I think the casing for the parameter type should stay the same.

mscdex · 2017-08-28T04:32:36Z

doc/api/stream.md

+  * `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store 
+    in the internal buffer before ceasing to read from the underlying resource. 
+    Defaults to `16384` (16kb), or `16` for `objectMode` streams
+  * `encoding` {String} If specified, then buffers will be decoded to


mscdex · 2017-08-28T04:33:48Z

doc/api/stream.md

+
+Typically, the size of the current buffer is measured against the 
+`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
+comparison function will begin to measure the buffer's size in characters.


Perhaps italicize the word 'characters' here as well since it is done for the word 'bytes' in the previous sentence.

mscdex · 2017-08-28T04:36:30Z

doc/api/stream.md

+`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
+comparison function will begin to measure the buffer's size in characters.
+
+This is not a problem in common cases with `utf8` or `ascii`. But it is


This sentence seems a little confusing to me since we're comparing a multi-byte encoding and a single-byte encoding here. The way the sentence reads to me is that 'ascii' and 'utf8' are common cases and I do not need to worry about them, which is not true for multi-byte utf8 characters. If we're going to name some example encodings, you might replace 'utf8' with 'latin1' instead.
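To illustrate the distinction drawn here, a small sketch comparing byte counts for the same string under a single-byte and a multi-byte encoding. The sample string and the `sizes` object are only illustrative:

```javascript
// "café" written with an escape so the source file encoding does not matter.
const text = 'caf\u00E9';

const sizes = {
  characters: text.length,                        // JavaScript string length
  latin1Bytes: Buffer.byteLength(text, 'latin1'), // one byte per character
  utf8Bytes: Buffer.byteLength(text, 'utf8'),     // U+00E9 needs two bytes
};

console.log(sizes);
```

With `latin1` the byte count and the character count agree, so switching the unit makes no practical difference. With `utf8` they diverge as soon as one non-ASCII character appears.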

mscdex · 2017-08-28T04:37:28Z

doc/api/stream.md

+
+This is not a problem in common cases with `utf8` or `ascii`. But it is
+advised to be mindful about this behavior when working with long strings 
+with special characters.


I think it might be better to be more specific about what 'special' means, which is 'multi-byte'.

Also, the length of the string has no bearing on this particular problem. For example, a single character can span 4 bytes in utf8.
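A quick sketch of that point, using one emoji as the sample character (the `measurements` name is illustrative). It assumes the stream counts decoded data by JavaScript string length, which is in UTF-16 code units rather than user-visible characters:

```javascript
// U+1F600 is a single code point outside the Basic Multilingual Plane.
const single = '\u{1F600}';

const measurements = {
  utf8Bytes: Buffer.byteLength(single, 'utf8'), // bytes on the wire
  stringLength: single.length,                  // UTF-16 code units
  codePoints: [...single].length,               // iterating yields code points
};

console.log(measurements);
```

So even a one-character string can show the discrepancy: four bytes before decoding, and a string length of two afterwards.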

BridgeAR

LGTM with the comment addressed.

BridgeAR · 2017-08-30T19:08:13Z

doc/api/stream.md

+
+This is not a problem in common cases with `latin1` or `ascii`. But it is
+advised to be mindful about this behavior when working strings that contain
+multi-byte characters.


The last sentence does not seem to be complete. I guess there is a `with` missing after "working". I would also add a `could` after "that", since not every string is likely to be an issue.

This commit documents an edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue nodejs#6798 that after calling .setEncoding() on the stream, it will thereafter begin to measure the buffer's length in characters.

BridgeAR · 2017-09-08T02:23:35Z

Landed in 89f2074

@jessicaquynh thanks a lot for your contribution

This commit documents an edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue #6798 that after calling .setEncoding() on the stream, it will thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell
Reviewed-By: Vse Mozhet Byt
Reviewed-By: Ruben Bridgewater


nodejs-github-bot added the doc and stream labels Jun 3, 2017

vsemozhetbyt reviewed Jun 3, 2017

mscdex reviewed Jun 3, 2017

jalafel force-pushed the doc-#6798 branch 3 times, most recently from 0231823 to 9205f3f June 4, 2017 19:24

jasnell approved these changes Jun 4, 2017

vsemozhetbyt approved these changes Aug 16, 2017

jalafel force-pushed the doc-#6798 branch from d43c809 to 111f17a August 28, 2017 04:24

mscdex reviewed Aug 28, 2017

jalafel force-pushed the doc-#6798 branch 2 times, most recently from 52246a0 to d0094d9 August 29, 2017 15:10

BridgeAR approved these changes Aug 30, 2017

jalafel force-pushed the doc-#6798 branch from d0094d9 to 6e8ecc2 August 31, 2017 00:31

BridgeAR approved these changes Sep 8, 2017

BridgeAR closed this Sep 8, 2017

MylesBorins mentioned this pull request Sep 10, 2017

v8.5.0 proposal #15308 (Merged)

MylesBorins added the land-on-v6.x label Sep 20, 2017

MylesBorins mentioned this pull request Sep 20, 2017

v6.11.4 proposal #15506 (Merged)

gireeshpunathil mentioned this pull request Apr 9, 2018

streams: Readable highWaterMark is measured in bytes *or* characters #6798 (Closed)
