Non-single-logical-flow (multiple pulls) #30

Open
Fishrock123 opened this issue Jun 6, 2019 · 9 comments

@Fishrock123
Owner

Fishrock123 commented Jun 6, 2019

Moving out from #23 (comment)

It seems that newer network protocols like QUIC want multiple chunks of data to be in flight at once (besides also having to consider re-sending).

This probably violates these two core design ideas:

  • One-to-one: The protocol assumes a one-to-one relationship between producer and consumer.
  • In-line errors and EOF: Errors, data, and EOF ("end") should flow through the same call path.

It may also unleash zalgo? lol.

Anyways, I think it is possible to still keep things simple and "pretend" that things are multiplexed, by doing slightly more waiting at the network sink end. I'm not really sure that perf would be considerably impacted in most cases?

Edit: See #30 (comment) for updated thoughts.

@jasnell
Contributor

jasnell commented Jun 6, 2019

I don't actually think QUIC's design violates these ideas. The data flow is still one-to-one between producer and consumer, and in-line errors, data, and EOF are still in the same call path. The only difference that QUIC introduces is this idea that a chunk of data that I've already seen might need to be called again. At most, we may need to differentiate terminal states such that we separate There-is-no-more-data-to-give-you-go-away vs. There-is-no-additional-data-to-give-you-but-you-can-re-request-data-you-asked-for-before.

@Fishrock123
Owner Author

Ok so I just had a wild thought... what if, instead of having multiple pulls, implementations requiring this just use multiple streams?

Maybe this isn't easy if there's no limit to the multiplexing but hear me out...

The fact that currently only one request/response for data can be in flight in the stream at any one time is a big part of what reduces the need for almost any state, especially for error handling. This makes the whole thing a lot cheaper. So, if you can, say, share a file descriptor... you could open N streams to it, which would be able to do the work similarly concurrently while being much simpler logically, and without much extra overhead. From my past musings, I am quite certain that split/join transforms would also be pretty easy to make logically correct, which would allow such a system to talk to a single stream endpoint if necessary. Additionally... maybe that kind of thing could be threaded more easily in C++?

Idk, lmk what you think. I can prototype out a split/join.
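
A rough sketch of one way to read the "N streams over a shared resource" idea above. Everything here is hypothetical: `readAt(position, length)` stands in for an async positional read on a shared file descriptor, and `runStream` / `splitJoin` are illustrative names, not part of BOB. Each stream stays strictly single-flight (one request, one response), and the join step reassembles chunks by offset.

```javascript
// Hypothetical sketch, not the BOB API.
// One "stream": never has more than one read in flight.
async function runStream(readAt, start, end, chunkSize, onChunk) {
  for (let pos = start; pos < end; pos += chunkSize) {
    const len = Math.min(chunkSize, end - pos);
    const chunk = await readAt(pos, len); // wait before asking again
    onChunk(pos, chunk);
  }
}

// Split a byte range across n single-flight streams, then join by offset.
async function splitJoin(readAt, totalLength, n, chunkSize) {
  const out = Buffer.alloc(totalLength);
  const span = Math.ceil(totalLength / n);
  const streams = [];
  for (let i = 0; i < n; i++) {
    const start = i * span;
    const end = Math.min(start + span, totalLength);
    // The "join": each chunk is copied to its own offset, so the
    // order in which the streams complete does not matter.
    streams.push(
      runStream(readAt, start, end, chunkSize, (pos, chunk) => chunk.copy(out, pos))
    );
  }
  await Promise.all(streams);
  return out;
}
```

The concurrency comes from running several simple flows side by side, so no single flow needs state for more than one outstanding request.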

@jasnell
Contributor

jasnell commented Jun 7, 2019

Hmm that could work. Let me stew on it.

@Fishrock123
Owner Author

Fishrock123 commented Jun 11, 2019

Should receiving additional pull() and/or next()s from components which an error has not yet bubbled to... matter?

i.e. component has errored, error passed along, but data is still flowing around at the same time. It seems to me that preventing post-mortem flow would require a decent amount of extra work.

How does QUIC handle this situation?

@Fishrock123
Owner Author

Fishrock123 commented Jul 22, 2019

So I've had returning thoughts to this, mostly from two places:


  1. @mcollina's request for multi-buffer support

I can't really think of a pleasant way to fit MB support into the existing API. It never really makes sense to me: within the whole model, multiple buffers, I think, should be separate responses.

However, adding multi-pull support could alleviate this, possibly? If the sink can accept multiple chunks, it could pull multiple times and then get responses accordingly?

(One note on that: we may have to add an additional status.ended to deal with pulls that come after end? Extra complexity.)


  2. My experience writing crc-transform

I was comparing against my built-in crc32 CLI command (which runs TCL and Perl in some combination talking to zlib's native C crc calculator) and wasn't happy with the numbers.

Getting it "close" to native numbers was much harder than expected and the best I could do still took 50% longer (and much more cpu). While thinking on potential optimizations I realized that it would be ideal to be making another async filesystem request at the same time as you are currently processing one, necessitating something like multiple pulls via setImmediate()s. It could be tricky to 'get right' but the payoff could be pretty big? Again, more complexity...
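
A minimal sketch of the overlap described above: keep one filesystem request in flight while the previous chunk is being processed. It uses promises rather than setImmediate() and is not tied to the BOB API. `readChunk` is a hypothetical async function that resolves to a Buffer, or to null at EOF, and `processChunk` stands in for the CPU-bound work (e.g. a CRC update).

```javascript
// Hypothetical sketch: read-ahead by one chunk.
async function processWithReadAhead(readChunk, processChunk) {
  let pending = readChunk();      // first read is in flight
  for (;;) {
    const chunk = await pending;
    if (chunk === null) break;    // EOF
    pending = readChunk();        // start the next read...
    processChunk(chunk);          // ...then do the CPU-bound work
  }
}
```

Note that with a real file each in-flight read needs its own buffer, so this pattern implies at least two buffers in rotation. Error handling is also left out: if `processChunk` throws, the read already in flight is abandoned.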

This was referenced Jul 23, 2019
@Raynos
Collaborator

Raynos commented Aug 6, 2019

My understanding of the sink API is that it is not very friendly for writing to: it needs a source, and its backpressure mechanism is to pull() when it wants more data.

For multiple buffer support you can implement nextv() api which just writes multiple buffers to the sink, it can then.

One piece of contention in the current design of the sink API is who allocates the buffer. If I have data already in buffers that needs to be written to a socket or disk, it doesn't make sense for the sink to allocate a write buffer and tell me to copy values into it.

@Raynos
Collaborator

Raynos commented Aug 6, 2019

Getting it "close" to native numbers was much harder than expected and the best I could do still took 50% longer (and much more cpu). While thinking on potential optimizations I realized that it would be ideal to be making another async filesystem request at the same time as you are currently processing one, necessitating something like multiple pulls via setImmediate()s. It could be tricky to 'get right' but the payoff could be pretty big? Again, more complexity...

I think it's fine for the implementation of a sink to pre-emptively call pull() immediately once next() is called and then to start processing CPU bound stuff.

It will need a boolean field to guard against re-entry if pull() calls next() synchronously, and it will need a queue of pending buffers to process, etc.

Actually it needs to keep a counter of pending pull operations in case pull calls next synchronously which calls pull synchronously etc, causing the entire source to be pulled before any processing at all.
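
A sketch of the eager-pull sink described above, under a deliberately simplified and hypothetical protocol (not the real BOB signatures): `source.pull()` asks for one chunk and `sink.next(status, chunk)` answers, possibly synchronously. `ArraySource`, `EagerSink`, and `maxAhead` are illustrative names. The boolean guards against re-entry, the queue holds pending chunks, and the counter caps how far ahead the sink pulls.

```javascript
// Hypothetical, simplified protocol; names are illustrative only.
const CONTINUE = 'continue';
const END = 'end';

class ArraySource {
  constructor(chunks) { this.chunks = chunks.slice(); this.sink = null; }
  bindSink(sink) { this.sink = sink; }
  pull() {
    // Answers synchronously: the worst case for re-entrancy.
    if (this.chunks.length === 0) this.sink.next(END, null);
    else this.sink.next(CONTINUE, this.chunks.shift());
  }
}

class EagerSink {
  constructor(source, process, maxAhead = 2) {
    this.source = source;
    this.process = process;   // CPU-bound work per chunk
    this.maxAhead = maxAhead; // cap on pulled-but-unprocessed chunks
    this.outstanding = 0;     // pulls issued whose chunk is not yet processed
    this.queue = [];          // chunks waiting to be processed
    this.draining = false;    // re-entry guard
    this.ended = false;
    this.maxQueued = 0;       // instrumentation only
    source.bindSink(this);
  }
  start() { this.maybePull(); }
  maybePull() {
    if (this.ended || this.outstanding >= this.maxAhead) return;
    this.outstanding++;
    this.source.pull();       // may call next() before returning
  }
  next(status, chunk) {
    if (status === END) { this.ended = true; this.outstanding--; return; }
    this.queue.push(chunk);
    this.maxQueued = Math.max(this.maxQueued, this.queue.length);
    this.maybePull();         // pull ahead before doing any CPU work
    this.drain();
  }
  drain() {
    if (this.draining) return; // already processing further up the stack
    this.draining = true;
    while (this.queue.length > 0) {
      this.process(this.queue.shift());
      this.outstanding--;
      this.maybePull();
    }
    this.draining = false;
  }
}
```

Without the `outstanding` cap, a synchronous source would be drained entirely through nested pull()/next() calls before the first chunk was processed.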

@Fishrock123 Fishrock123 changed the title Non-single-logical-flow (multiple pull requests) Non-single-logical-flow (multiple pulls) Sep 23, 2019
@dominictarr

I tried allowing multiple calls in pull-stream, but decided it added too much complexity. Is this something that needs to be supported along the entire pipeline, or something that can just be an internal detail of the QUIC implementation? On the write side it's easy: just accept multiple writes at a time. On the read side, since BOB has the reader pass in the buffer, it's not gonna work. (My gut feeling is that that's too complicated anyway; an object pool for buffers would have the same advantages but would decouple it from streams, allowing object streams, which make streams more useful.)
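
For illustration, a buffer pool decoupled from the stream protocol could be as small as the sketch below. `BufferPool`, `acquire`, and `release` are hypothetical names, not an existing API.

```javascript
// Hypothetical sketch of a fixed-size buffer pool.
class BufferPool {
  constructor(size, max = 8) {
    this.size = size; // byte length of every pooled buffer
    this.max = max;   // how many idle buffers to keep around
    this.free = [];
  }
  acquire() {
    // Reuse an idle buffer if there is one; otherwise allocate.
    // allocUnsafe does not zero memory, so callers must overwrite it.
    return this.free.pop() || Buffer.allocUnsafe(this.size);
  }
  release(buf) {
    // Keep only buffers of the right size, and only up to `max`.
    if (buf.length === this.size && this.free.length < this.max) {
      this.free.push(buf);
    }
  }
}
```

A producer would acquire() a buffer, fill it, and hand it downstream; whoever finishes with it calls release(). Nothing in the pool depends on how the stream itself passes data.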

@jasnell
Copy link
Contributor

jasnell commented Dec 12, 2019

The more I think about it, the more I think we won't need the multiple reads for QUIC. So I think we can completely avoid the complexity in that case. The basic protocol just works.
