Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: document switchover from measured bytes to chars after setEncoding #13442

Closed
wants to merge 1 commit into from

Conversation

jalafel
Copy link
Contributor

@jalafel jalafel commented Jun 3, 2017

Checklist

Ref: #6798

This is documentation PR to denote the change of measurement against the highWaterMark in non-object streams after .setEncoding() is called. It changes from bytes to character.

Affected core subsystem(s)

doc

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem. labels Jun 3, 2017
@@ -66,7 +66,7 @@ buffer that can be retrieved using `writable._writableState.getBuffer()` or

The amount of data potentially buffered depends on the `highWaterMark` option
passed into the streams constructor. For normal streams, the `highWaterMark`
option specifies a total number of bytes. For streams operating in object mode,
option specifies a [total number of bytes][hwm-gotcha]. For streams operating in object mode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this line exceeds 80 characters limit.

@@ -101,6 +101,9 @@ Because data may be written to the socket at a faster or slower rate than data
is received, it is important for each side to operate (and buffer) independently
of the other.




Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these empty lines are unintentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -1965,6 +1968,19 @@ has an interesting side effect. Because it *is* a call to
However, because the argument is an empty string, no data is added to the
readable buffer so there is nothing for a user to consume.

### `highWaterMark` discrepency after calling `readable.setEncoding()`

The use of `readable.setEncoding()` will change the behaviour of how the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are unofficially targeting US English, so s/behaviour/behavior.

comparison function will begin to measure the buffer's size in characters.

This is not a problem in common cases with `utf8` or `ascii`. But it is
advised to be mindful about this behaviour when working with long strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

The use of `readable.setEncoding()` will change the behaviour of how the
`highWaterMark` operates in non-object mode.

Atypically, the size of the current buffer is measured against the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'atypically' here should be 'typically'?

@mscdex
Copy link
Contributor

mscdex commented Jun 3, 2017

Also, the first line of the commit message is too long.

@jalafel jalafel force-pushed the doc-#6798 branch 3 times, most recently from 0231823 to 9205f3f Compare June 4, 2017 19:24
@tniessen
Copy link
Member

@jessicaquynh This needs to be rebased, sorry for the delay.
@mscdex @vsemozhetbyt Could you PTAL?

@BridgeAR
Copy link
Member

Ping @jessicaquynh this needs a rebase

@jalafel
Copy link
Contributor Author

jalafel commented Aug 28, 2017

@BridgeAR Sorry! Didn't realize that was on me. Thank you!

the internal buffer before ceasing to read from the underlying
resource. Defaults to `16384` (16kb), or `16` for `objectMode` streams
* `encoding` {string} If specified, then buffers will be decoded to
* `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the casing for the parameter type should stay the same.

* `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store
in the internal buffer before ceasing to read from the underlying resource.
Defaults to `16384` (16kb), or `16` for `objectMode` streams
* `encoding` {String} If specified, then buffers will be decoded to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.


Typically, the size of the current buffer is measured against the
`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
comparison function will begin to measure the buffer's size in characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps italicize the word 'characters' here as well since it is done for the word 'bytes' in the previous sentence.

`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
comparison function will begin to measure the buffer's size in characters.

This is not a problem in common cases with `utf8` or `ascii`. But it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence seems a little confusing to me since we're comparing a multi-byte encoding and a single-byte encoding here. The way the sentence reads to me is that 'ascii' and 'utf8' are common cases and I do not need to worry about them, which is not true for multi-byte utf8 characters. If we're going to name some example encodings, you might replace 'utf8' with 'latin1' instead.


This is not a problem in common cases with `utf8` or `ascii`. But it is
advised to be mindful about this behavior when working with long strings
with special characters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to be more specific about what 'special' means, which is 'multi-byte'.

Also, the length of the string has no bearing on this particular problem. You could have a single character that spans 4 bytes for utf8 for example.

@jalafel jalafel force-pushed the doc-#6798 branch 2 times, most recently from 52246a0 to d0094d9 Compare August 29, 2017 15:10
Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the comment addressed.


This is not a problem in common cases with `latin1` or `ascii`. But it is
advised to be mindful about this behavior when working strings that contain
multi-byte characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence does not seem to be complete. I guess there is a with missing after "working". I would also add a could after "that" as not every string would likely be an issue.

This commit documents and edge-case behavior in readable streams. It is
expected that non-object streams are measured in bytes against the
highWaterMark. However, it was discovered in issue nodejs#6798 that after calling
.setEncoding() on the stream, it will thereafter begin to measure the buffer's
length in characters.
@BridgeAR
Copy link
Member

BridgeAR commented Sep 8, 2017

Landed in 89f2074

@jessicaquynh thanks a lot for your contribution

@BridgeAR BridgeAR closed this Sep 8, 2017
BridgeAR pushed a commit that referenced this pull request Sep 8, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
MylesBorins pushed a commit that referenced this pull request Sep 10, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Sep 10, 2017
MylesBorins pushed a commit that referenced this pull request Sep 11, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
MylesBorins pushed a commit that referenced this pull request Sep 12, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
addaleax pushed a commit to addaleax/node that referenced this pull request Sep 13, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: nodejs#13442
Refs: nodejs#6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
MylesBorins pushed a commit that referenced this pull request Sep 20, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Sep 20, 2017
MylesBorins pushed a commit that referenced this pull request Sep 21, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
MylesBorins pushed a commit that referenced this pull request Sep 26, 2017
This commit documents and edge-case behavior in readable streams.
It is expected that non-object streams are measured in bytes
against the highWaterMark. However, it was discovered in issue
thereafter begin to measure the buffer's length in characters.

PR-URL: #13442
Refs: #6798
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Vse Mozhet Byt <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants