stream: mutable highwatermark #33346

0807Jpatel · 2020-05-10T23:01:19Z

added methods to update highWaterMark and objectMode values of Readable and Writable streams.
Refs: #32731
Can also be used for following: #30107

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

mscdex · 2020-05-10T23:18:39Z

As far as the original issue goes, perhaps it would be enough to instead pass in highWaterMark values in an options object passed to http.createServer()?

0807Jpatel · 2020-05-10T23:23:06Z

yep, I was intending to do that. But just seemed like a hack than a solution. And it was suggested that I just make them mutable.

mscdex · 2020-05-11T00:52:32Z

I see it as the opposite. Being able to dynamically change the watermark after stream creation doesn't make much sense and I've never seen a real use case for it. Specifying the highWaterMark up front like we can with http/https/tls's request() makes more sense and would be more consistent if we added them to the server counterparts.

0807Jpatel · 2020-05-11T02:25:38Z

I am not sure why the checks failed. I don't see any error message.

mscdex

making it official for the reasons previously stated. additionally because i think we should strive to not "pollute" streams objects as much as possible, even if the chance of a property clashing is unlikely.

addaleax

Soo … I agree that it seems odd to be able to change the objectMode flag on a stream after creation – conceptually, whether a stream is in object mode or not is part of its type and should always be known by the time its constructor is called.

However, for the highWaterMark, I see no reason not to make it adjustable, and just because @mscdex has “never seen a real use case for it” doesn’t mean that it isn’t a sensible thing to add. The HWM is mostly relevant for the producer/consumer of a stream, which is often different from the creator of the stream and thus not necessarily in control over constructor arguments.

I don’t think property clashing is a realistic argument here.

@nodejs/streams @nodejs/tsc

doc/api/stream.md

mcollina

I'm -1 in changing objectMode. I'm ok in adding a way to change the highWaterMark.

However, instead of adding a new method, we should just add a set accessor to

node/lib/_stream_readable.js

Lines 1087 to 1092 in 8a6fab0

    
           readableHighWaterMark: { 
        
             enumerable: false, 
        
             get: function() { 
        
               return this._readableState.highWaterMark; 
        
             } 
        
           },

.

lib/_stream_readable.js

0807Jpatel · 2020-05-11T10:44:47Z

@mcollina why are you against objectMode? because I can see use case of wanting to use objectMode over the network.

mcollina · 2020-05-11T10:48:28Z

Can you please articulate why you'd want to change objectMode? Changing it means that downstream or upstream should be aware of the change. Morever, it means that it will possible to change this for streams that are binary-only by default.

It also changes how encoding works etc.

0807Jpatel · 2020-05-11T10:59:08Z

wouldn't the client receive chunks of object when requesting (http.request) if objectMode is not set to true?
This could happen when Socket decides to write only half of the "object" in a transfer and other half in next transfer.

0807Jpatel · 2020-05-11T12:21:34Z

var http = require('http');
const { Readable, Writable } = require('stream')

let i = 0;

const myReadable = new Readable({
  objectMode: true,
  read(size) {
  if(i < 3){
    this.push(Buffer.alloc(65536));
  } else {
    this.push(null);
  }
  i++;
  }
});

const myWritable = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    console.log("write length: ", chunk.length);
    callback();
  }
});


var server = http.createServer(function (req, res) {
  myReadable.pipe(res);
});

server.listen(8000, '127.0.0.1');

http.get('http://localhost:8000',  res => {
  res.pipe(myWritable);
});

Output:

write length:  65426
write length:  110
write length:  32761
write length:  32775
write length:  32752
write length:  32784

The idea behind making objectMode mutable was to "fix" this issue. Where each objects are broken down into chunks even tho both myReadable and myWritable set objectMode to true.
@ronag @mcollina

ronag · 2020-05-11T12:29:03Z

@0807Jpatel Sorry, I'm confused here. What is it we're trying to solve? The changing of objectMode doesn't really make sense to me.

jasnell · 2020-05-11T12:34:53Z

In your example @0807Jpatel, the code is not really broken. In objectMode, the Writable will accept any JavaScript value and it's up to the implementation to decide if it's the right kind of value it expects. In this case, you're passing in Buffer instances that are objects, so the downstream Writable accepts them with no problem. These aren't partial writes or broken down into anything, it's dealing with the Buffer objects given to it on each write.

I agree with the others, making objectMode mutable does not make sense. Making highWaterMark mutable is fine.

0807Jpatel · 2020-05-11T13:09:06Z

@jasnell @ronag, Sorry. My fault it seems like I understood the objectMode wrong.
Do we not support objectMode for outgoing messages? When I try to push anything other than Buffer or String, from myReadable it throws error The "chunk" argument must be of type string or an instance of Buffer or Uint8Array Its coming from _http_outgoing.js

jasnell · 2020-05-11T13:12:49Z

Correct, most streams operate in byte-mode (objectMode = false) which means they treat strings and Buffer objects differently (as byte streams rather than objects). This is the default mode for all Writable and Readable objects. It is not possible to pipe from a Readable with objectMode = true to a Writable with objectMode = false.

jasnell · 2020-05-11T13:14:05Z

@mcollina and @ronag : It might actually make sense for both pipe and pipeline to actually verify that a Readable that is being piped to a Writable are actually operating in the same mode rather than waiting for the write attempt.

0807Jpatel · 2020-05-11T13:31:32Z

@jasnell could you please explain why we don't support objectMode for outgoing message, because to me it makes sense that user would want to send objects through SSE.

mcollina · 2020-05-11T13:43:14Z

It might actually make sense for both pipe and pipeline to actually verify that a Readable that is being piped to a Writable are actually operating in the same mode rather than waiting for the write attempt.

Not true. A trick I use all the time it to emit strings or buffer in object mode, and then pipe that into a Writable in binary.

addaleax · 2020-05-11T17:54:44Z

Yeah, we do have tests that check that piping between object-mode/non-object-mode streams works so long as there’s only Buffers/strings involved :)

benjamingr · 2020-05-11T17:56:35Z

@0807Jpatel I think you might be mixing up a few things here:

could you please explain why we don't support objectMode for outgoing message, because to me it makes sense that user would want to send objects through SSE.

You have an outgoing HTTP message.
At the end of the day bytes get written on a socket via a system call.
Streams let you decide how often to write bytes to the buffer and watermarks let you avoid frequent pulls and avoid unneeded I/O. (So you buffer until you reach the high watermark and stop).
Streams are by default sequences of bytes. That's what platforms typically understand.

In order to send an object over HTTP - you must first specify how Node.js should translate it into bytes. This is done by serializing it into a data exchange format like protocol buffers or JSON.

Now your use case isn't unreasonable. Some frameworks (asp.net WebAPI comes to mind) perform "content negotiation" - based on the incoming message and serialize the stream in an output format they think the client understands (still in bytes).

Streams solve a very real case for Node and most Node programs build on them (file streams, network streams etc). They serve a fundamental and crucial role in the platform.

Now - what about object mode? If I understand correctly it was a happy accident (@Raynos asked for them?). Object mode is certainly useful but it doesn't serve the same "building block" IMO and doesn't belong in the same layer.

An object mode "stream" for HTTP would need to perform said "content negotiation" with the client and switching in the middle of a request would be confusing. Mostly since the stream would need to know how to serialize its entities in order to do so correctly (and not all objects are serializable and certainly not in every representation).

As for this PR, I agree with @addaleax's comment #33346 (review) and I echo her use case of streams whose creation users don't control. There are plenty of examples, here is one related to a use case I was discussing with Matteo:

I have a child process I created with fork that takes an array of stdio in order to offload work to another process (in the real case logging).
I pass 'pipe' to stdio to pipe the data from the parent to the child.
I get a stream back on subprocess.stdin whose watermarks I can't control.

I could pass a stream in instead of a string (that's how I worked around it), but I can definitely see the use case here and the solution I ended up owith was very unergonomic.

benjamingr · 2020-05-11T17:59:48Z

doc/api/stream.md

@@ -568,6 +568,24 @@ added: v12.3.0

 Getter for the property `objectMode` of a given `Writable` stream.

+##### `writable.updateWritableHighwaterMark(highwaterMark)`


I can actually see a case for this part and updating the watermarks. I even needed to use the lowWatermark at some point for a specific use case at my old job and was sad to find out we don't have one anymore :]

I think this sort of change would need to come with a concrete use case of why I'd update the watermarks.

doc/api/stream.md

Raynos · 2020-05-12T09:21:45Z

Now - what about object mode? If I understand correctly it was a happy accident (@Raynos asked for them?). Object mode is certainly useful but it doesn't serve the same "building block" IMO and doesn't belong in the same layer.

The original use case for objectMode is a userland package like https://www.npmjs.com/package/JSONStream

This is a Writable stream of bytes ( or utf8 strings ) and a Readable stream of javascript objects

I don't believe node core has a use case for objectMode: true anywhere but there are many packages in npm that use require('stream').Readable and require('stream').Writable interfaces without writing buffers or strings or anything else that can meaningfully be "buffered" or "concatenated"

I believe objectMode: true is a hack and a bad idea in hindsight but we must carry it forward for backwards compatibility.

I can't imagine any use case for changing it after the creation of a stream.

If you want to know another fun story about why objectMode exists look at the sin that is https://github.com/Raynos/events-stream ; At some point in the past I thought that stream.Readable was a great abstraction for piping DOM events around in pure browser js code.

Raynos · 2020-05-12T09:24:18Z

For changing highWaterMark after creation I am 👍 if it's not too challenging.

benjamingr · 2020-05-15T14:34:53Z

@ronag I think the major issue is that none of are are completely sure about this :D

I have an example use case here at the bottom #33346 (comment)

To be fair at my last job (building stuff for p2p CDNs) this sort of "games" with watermarks were extremely common and I ran into a lot of issues with watermarks I worked around - so my point of view is probably very biased towards one use case where really accurate control of the amount of buffering is important. For example - I ran into the lack of low watermarks since Node.js 0.9 and maintained only to be the second person to ask for it if someone ever did - but no one did while I was working on the p2p stuff so I never asked for it :]

ronag · 2020-05-15T14:57:17Z

I have an example use case here at the bottom #33346 (comment)

You can already raise the Readable watermark by doing a manual read(newHigherWatermark).

Which makes me wonder why we need https://github.com/nodejs/node/pull/33346/files#diff-ba6a0df0f5212f5cba5ca5179e209a17R1096 (which is my primary concern).

Likewise, I'm uncertain why exactly we need the corresponding condition in Writable.

this sort of "games" with watermarks were extremely common and I ran into a lot of issues with watermarks I worked around

Note that sometimes these kind of things are a chain of streams and you are just receiving the head ot the chain. Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

e.g.

function myApi(...) {
   return pipeline(readFromFile(...), someTransform())
}

const src = myApi()
src.highWaterMark = 1e9 // this will only change the last in the pipeline chain... if it doesn't kill it with the current PR :/

benjamingr · 2020-05-15T15:12:34Z

You can already raise the Readable watermark by doing a manual read(newHigherWatermark).

Technically that would work but that's not the same thing as just setting the high watermark. In my use case I don't actually want to read anything yet and I don't want to pull the data - I just want to set up the amount of buffering.

Asking users to .read(newHwm) data just to set the highWatermark even though in practice you might not even read that much data (from a consumer PoV) didn't really address my use case (neither in the example above nor in the "real" cdn streaming use case),

One can argue these use cases are so unique they don't justify complicating our code or API - I will concede to that if you feel strongly it doesn't justify the added complexity.

Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

Yes, that's actually the point David raised (of double buffering and watermarks) in the deno streams discussion I linked to above.

benjamingr

Dismissing my review in the meantime so this doesn't land by accident until I am convinced again this is a good idea and discussions have exhausted.

To be clear - the actual code and changes LGTM and I am still +1 on this changes, but there are ongoing discussions and I want to make sure this doesn't get merged (with the 2 LGTMs) while they are happening.

ronag · 2020-05-15T15:18:35Z

I think configuration of watermarks should be done during creation to make it consistent and work without edge cases. So with @benjamingr fork example #33346 (review). I think the watermark is an option you pass to fork (does not exists at the moment) which would ensure that the users expectation is fulfilled regardless of double buffering etc...

benjamingr · 2020-05-15T15:22:17Z

I think configuration of watermarks should be done during creation to make it consistent and work without edge cases

The problem I see is even if configuration of watermarks happens during creation that doesn't ensure no double buffering occurs or make any guarantees mutable watermarks don't have.

Maybe @0807Jpatel can share what use case they had that these changes address? I want more voices and use cases to be heard other than mine here :]

ronag · 2020-05-15T15:37:56Z

the actual code and changes LGTM and I am still +1 on this changes

I'm -0 with the errorOrDestroy conditions. Otherwise I'm unconvinced but neutral.

0807Jpatel · 2020-05-15T15:48:49Z

I agree on @ronag that previous solution made more sense, but it would definitely lead to some bad practice. AFAICT there should not be any reason to change highWaterMark while reading/writing.

Our use case is actually very simple. we need to reduce the highwatermark value of response.

var server = http.createServer(function (request, response) {
  response.writableHighWaterMark = 5 * 1024;
  myReadable.pipe(response);
});

ronag · 2020-05-15T15:53:38Z

AFAICT there should not be any reason to change highWaterMark while reading/writing.

Neither do I. But write can call things indirectly which can cause the user to unexpectedly be in the write scope. It's kind of an unlikely edge case... but it's another rule the user has to keep track of and I'd prefer fewer of those.

0807Jpatel · 2020-05-15T16:09:40Z

The whole idea behind this is to change highWaterMark before you start doing read/write that also includes pipe.
https://github.com/nodejs/node/pull/33346/files#diff-b88f51f89100f68ef3f96cfa3a07929eR22
This would be the ideal use case, but I def understand what you are saying.
Do you know any arguments against previous implementation? (where we defer update till next read/write) besides the complexity.

ronag · 2020-05-15T16:11:24Z

Do you know any arguments against previous implementation? (where we defer update till next read/write) besides the complexity.

#33346 (comment)

Thinking that changing the watermark will do what you expected is going to be unfortunate and confusing.

ronag · 2020-05-15T16:12:00Z

Our use case is actually very simple. we need to reduce the highwatermark value of response.

Why do you want to reduce it? Reduce memory usage?

Can we think of some other way to solve this? e.g. a watermark option to http.createServer?

Another consideration would be to add it as an option to pipe? Would reduce API surface and potential misuse.

0807Jpatel · 2020-05-15T16:16:57Z

@ronag we have gone full circle here. This thread started with that exact suggestion. When I asked for the feature request, it was to add highwatermark to http.createServer. But we have seen enough use case to make highwatermark mutable. #33346 (review)

IMO we should make highwatermark mutable, but lets see what others have to say about this.

benjamingr · 2020-05-16T12:41:42Z

@0807Jpatel hey, just wanted to apologize for the (probably) frustrating process of being asked to do multiple contradicting things :]

I hope it's clear that this is genuinely because we are honestly not sure what the right thing here (as a project) is. I think there was a misunderstanding (in our user base and in core) regarding what highWaterMark does. I also want to mention that in terms of technical levels of the PRs, willingness to collaborate and actual code you are doing well.

Here are some things we may want to do to move this forward:

Have a "watermarks in streams" session in the summit next month where we get everyone interested in one room (you invited obviously), show the use cases and decide what solution makes the most sense (mutable highWatermark) vs. setting it in places like http.createServer.
Have the same session but not as a part of the OpenJS summit (just a bunch of people).
Escalate the issue to the technical steering committee and ask for guidance about the process.
Get buy in from all collaborators who have made interesting contributions to the discussion here (Brian, Anna, Matteo, Robert etc) in the issue to move forward - which looks kind of hard at this point.

I'll happily help facilitate either of these 4 options (or an alternative I haven't considered yet). You are also welcome to send me an email (all our emails are in the README). Intuitively I recommend the first one (so more people have a chance to attend).

benjamingr · 2020-05-19T17:27:53Z

@0807Jpatel any particular preference to either of the options?

0807Jpatel · 2020-05-21T01:35:07Z

@benjamingr I think making highwaterMark is what I would prefer. Someone has done a patch work for http.request to take in highwatermark as options. The issue with that route is we will eventually see a use case that can only be solve by making highwater mark mutable, and all these patch works will still need to be part of the code.
I agree with you guys that more opinions are def needed.

benjamingr · 2020-05-23T17:46:43Z

@nodejs/tsc can you please provide guidance about how to proceed here?

This isn't "stalled" and there are no fundamental disagreements, we need some way to:

Get interested parties to discuss setting the watermark (ronag, jpatel, mcollina, mscdex, myself, etc).
Figure out where's the right place to add this capability if we want to support it (to https, to the stream itself or elsewhere).
Follow up on that.

Right now we have a first-time-contributor who has opened several PRs attempting to resolve this and we have given them confusing/conflicting comments in several places. I'm a bit frustrated that this has been their experience here. (It's a tricky change that's fair)

ronag · 2020-05-23T18:11:23Z

I'm more than happy to further discuss this. Though I think what OP is actually trying to solve here has gotten a bit lost (see refs).

I believe there are a few different things here:

More backpressure on server incoming message (#30107).

Which lead to (#30107):

Setting the highwatermark on a http server response (i.e. IncomingMessage)
- Is it per instance or globally?

Which lead to this PR:

Changing the highWatermark on an already constructed stream
- Unclear if it's a good idea. Are we actually just trying to solve the above?

@benjamingr FYI the OP's referenced original issue #32731 has been resolved/closed #32731 (comment)

0807Jpatel · 2020-05-23T18:20:37Z

@ronag sorry that close was premature, transform did not solve the issue we are facing. We are just trying to reduce highWaterMark of response in createServer. There are two ways of solving this

add ability to change highWaterMark.
update createServer to take in highWaterMark as part of option. We see something similar to this in Feature Request: Option to configure highWaterMark of HTTP response #30107. The reason I brought this up was to show we can do something similar to this. But if in future there is a use case to make highwaterMark mutable, all the "patch" work we do here will still need to be supported.

So I think making highwaterMark mutable is better, as we have seen couple of use cases already and no significant draw back. I think most of us in this thread agree that it is okay to add this, we are just trying to figure out whats the best way to implement this.

ronag · 2020-05-23T18:28:37Z

transform did not solve the issue we are facing

What is the issue you are facing and why did transform not resolve it? From what I understood you just wanted to create more backpressure?

add ability to change highWaterMark.

I'm not blocking this but I'm skeptical of its usefulness and worried about complexity and user confusion.

Since the highwatermark is kind of a hint... writing code that actually depends on the observable behavior of the highwatermark is something I would discourage (unless in objectMode).

update createServer to take in highWaterMark as part of option

I'm not sure I followed the reason for not going down this track?

changed my mind.

mcollina · 2020-05-24T16:09:34Z

update createServer to take in highWaterMark as part of option.

I would actually recommend this (independently of this patch).

Thinking about this a bit more, It would be really hard to support this for http.OutgoingMessage, if that is the end goal.

not up to sync on this and don't want to block by accident

bl-ue · 2021-06-05T10:26:48Z

I'm guessing that this PR, being quite old, should be closed.

feat(stream): added mutable highwatermark and object mode

cf72487

nodejs-github-bot added the stream Issues and PRs related to the stream subsystem. label May 10, 2020

mscdex suggested changes May 11, 2020

View reviewed changes

addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label May 11, 2020

addaleax reviewed May 11, 2020

View reviewed changes

doc/api/stream.md Outdated Show resolved Hide resolved

mcollina requested changes May 11, 2020

View reviewed changes

lib/_stream_readable.js Outdated Show resolved Hide resolved

fix(readme): added REPLACEME in place of version

134c204

fix: remove debug statements from writable stream

8ab362b

benjamingr reviewed May 11, 2020

View reviewed changes

doc/api/stream.md Outdated Show resolved Hide resolved

chore: removed mutable objectMode and highWaterMark uses setter

5ca4f8c

benjamingr previously requested changes May 15, 2020

View reviewed changes

benjamingr mentioned this pull request May 23, 2020

HTTPS highwatermark not working #33262

Closed

This comment has been minimized.

Sign in to view

BridgeAR force-pushed the master branch 2 times, most recently from 8ae28ff to 2935f72 Compare May 31, 2020 12:18

guybedford force-pushed the master branch from dc5a5da to 8e46568 Compare March 29, 2021 21:33

mcollina closed this Jun 5, 2021

	readableHighWaterMark: {
	enumerable: false,
	get: function() {
	return this._readableState.highWaterMark;
	}
	},

		@@ -568,6 +568,24 @@ added: v12.3.0

		Getter for the property `objectMode` of a given `Writable` stream.

		##### `writable.updateWritableHighwaterMark(highwaterMark)`

stream: mutable highwatermark #33346

stream: mutable highwatermark #33346

Conversation

0807Jpatel commented May 10, 2020

Checklist

mscdex commented May 10, 2020

0807Jpatel commented May 10, 2020

mscdex commented May 11, 2020

0807Jpatel commented May 11, 2020

mscdex left a comment

Choose a reason for hiding this comment

addaleax left a comment

Choose a reason for hiding this comment

mcollina left a comment

Choose a reason for hiding this comment

0807Jpatel commented May 11, 2020

mcollina commented May 11, 2020

0807Jpatel commented May 11, 2020 • edited Loading

0807Jpatel commented May 11, 2020 • edited Loading

ronag commented May 11, 2020 • edited Loading

jasnell commented May 11, 2020

0807Jpatel commented May 11, 2020

jasnell commented May 11, 2020

jasnell commented May 11, 2020

0807Jpatel commented May 11, 2020

mcollina commented May 11, 2020

addaleax commented May 11, 2020

benjamingr commented May 11, 2020 • edited Loading

benjamingr May 11, 2020

Choose a reason for hiding this comment

Raynos commented May 12, 2020

Raynos commented May 12, 2020

benjamingr commented May 15, 2020

ronag commented May 15, 2020 • edited Loading

benjamingr commented May 15, 2020

benjamingr left a comment

Choose a reason for hiding this comment

ronag commented May 15, 2020 • edited Loading

benjamingr commented May 15, 2020

ronag commented May 15, 2020 • edited Loading

0807Jpatel commented May 15, 2020

ronag commented May 15, 2020 • edited Loading

0807Jpatel commented May 15, 2020

ronag commented May 15, 2020 • edited Loading

ronag commented May 15, 2020 • edited Loading

0807Jpatel commented May 15, 2020

benjamingr commented May 16, 2020

benjamingr commented May 19, 2020

0807Jpatel commented May 21, 2020

benjamingr commented May 23, 2020 • edited Loading

ronag commented May 23, 2020 • edited Loading

0807Jpatel commented May 23, 2020

This comment has been minimized.

ronag commented May 23, 2020 • edited Loading

mcollina commented May 24, 2020

bl-ue commented Jun 5, 2021 • edited Loading

0807Jpatel commented May 11, 2020 •

edited

Loading

0807Jpatel commented May 11, 2020 •

edited

Loading

ronag commented May 11, 2020 •

edited

Loading

benjamingr commented May 11, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

ronag commented May 15, 2020 •

edited

Loading

benjamingr commented May 23, 2020 •

edited

Loading

ronag commented May 23, 2020 •

edited

Loading

ronag commented May 23, 2020 •

edited

Loading

bl-ue commented Jun 5, 2021 •

edited

Loading