Following an usual non-blocking pattern implementation adds extra time. #195

lemunozm · 2021-04-13T07:14:52Z

Hi, first of all, thank you for your amazing work in tungstenite-rs!

I am using your library with non-blocking sockets, and all is working fine. Nevertheless, when I compare times among reading 1-byte in blocking and non-blocking ways I noticed that the non-blocking was around twice as slow: 8us for reading 1-byte in a blocking schema and around 15us reading in a non-blocking schema.

Investigating about it, this double increment comes from in non-blocking schema I'm "force" to call WebSocket::read_message() twice. Once to receive the byte I sent, and other call to receive the WouldBlock that notify me that there are no more messages:

loop {
    match websocket.read_message() {
         Ok(message) => (), // 1-byte received.
         Err(Error::Io(ref err)) if err.kind() == ErrorKind::WouldBlock => break,
         //...
    }
}

Currently, I fixed this to avoid the second call to read_message() by checking by my self if there is data in the stream or the socket would block:

loop {
    match websocket.read_message() {
        Ok(message) => {
            // 1-byte received
            if let Err(err) = web_socket.get_ref().peek(&mut [0; 0]) {
                if err.kind() == ErrorKind::WouldBlock {
                    break;
                }
            }
        }
        Err(Error::Io(ref err)) if err.kind() == ErrorKind::WouldBlock => break,
        //...
    }
}

With the above code, I correctly read a 1-byte message in 8us instead of 15us, but is far to be obvious for a non-blocking user, that is used to perform this kind of reading pattern until getting WouldBlock.

Taking a look inside read_message() I saw that there is a lot of things done inside. Maybe these things should be done only in the case read_message() has data to read and if not, perform an early exit with WouldBlock.

The text was updated successfully, but these errors were encountered:

daniel-abramov · 2021-04-13T20:05:02Z

That's roughly what we do. Though we can't remove WouldBlock unless we got it from the underlying reader. But if you have such extreme optimisation needs (microseconds), I would suggest to look into the tokio-tungstenite. Tokio has built the right and optimal way to work with non-blocking I/O for you, so that the read_message() and co are not called when they are not ready.

lemunozm · 2021-04-14T13:10:10Z

Thanks for the answer! I've taken a look, and until I could see, this "last" call to the underlying read_message() to retrieve the WouldBlock still happens.

I know that I'm moving at the microseconds level. For a big message, it has no impact. But if the message is small, the overhead the library produces in reading it is almost double. In a use case where a client sends a lot of several small messages, the receiver of those messages will be considerably impacted.

Maybe an easy solution for this could consist of starting reading from the network inside read_message() instead of stating writing pending messages. If there is some data there, the function performs its pipeline as usual, if that reading had an error (WouldBlock), the sending part of the read_message is not performed.

Is there any reason to do the writing part before the reading one inside the read_message()?

daniel-abramov · 2021-04-16T17:02:10Z

I know that I'm moving at the microseconds level. For a big message, it has no impact. But if the message is small, the overhead the library produces in reading it is almost double. In a use case where a client sends a lot of several small messages, the receiver of those messages will be considerably impacted.

Your considerations make sense, but if you're using the sockets in a non-blocking mode the proper approach would be to not call read_message() until you're sure that the socket is ready to read/write and that's exactly where Tokio would help you. I would argue that inefficiency from trying to read from/write to the socket unconditionally is the one that causes the non-blocking version be less efficient.

Maybe an easy solution for this could consist of starting reading from the network inside read_message() instead of stating writing pending messages. If there is some data there, the function performs its pipeline as usual, if that reading had an error (WouldBlock), the sending part of the read_message is not performed.

You mean trying to read from the socket and in case it returns WouldBlock return immediately? I'm afraid that would break the invariants that the library tries to provide which currently ensure that we e.g. reply to pings with pongs (according to the RFC). I.e. we must make sure that even if the one calls read_message() only (without explicitly sending anything), we adhere to the RFC.

lemunozm · 2021-04-16T20:17:41Z

but if you're using the sockets in a non-blocking mode the proper approach would be to not call read_message() until you're sure that the socket is ready to read/write

Of course, the problem is not knowing when to call read_message() (just when the OS poll wakes your socket) but know when to not call read_message(). To illustrate:

The Poll wakes your socket for reading data.
You call read_mesage() and receive a message.
You do not know if you had only one message or the socket contains more data that must be read, belonging to another message. As you don't know, you need to call again to read_message(). This process is repeated until read_message() gives the WouldBlock.
EDIT: I want to highlight that when the OS poll wakes a socket for reading, it must read all its content. You can not read a message and wait for another poll event without being sure that all data has been read, because the remaining data can be lost depending on the OS (for example, Windows do not notify your socket again if you left pending data without reading). Sadly, you are forced to read until WouldBlock is returned.

So minimum, in order to read only one message, you need two calls to read_message().

You mean trying to read from the socket and in case it returns WouldBlock return immediately?

Yes. When you receive a WouldBlock when reading, means that the socket was previously waked from the Poll because it had data to read. So, it means that just before this WouldBlock you already perform a call to read_message(), and there you did all the internal stuff (as ping-pong messages). So, probably you continue working according to the RFC. From my point of view, calling read_message() that returns a WouldBlock means that the action was not performed, so technically failed reading a message and should not adhere to the RFC as "reading a message".

Of course, this change wouldn't apply to blocking sockets since a read call never returns a WouldBlock in that mode and you still working according to the RFC without any side effect.

daniel-abramov · 2021-04-16T20:54:40Z

Of course, the problem is not knowing when to call read_message() (just when the OS poll wakes your socket) but know when to not call read_message(). To illustrate:
The Poll wakes your socket for reading data.
You call read_mesage() and receive a message.
You do not know if you had only one message or the socket contains more data that must be read, belonging to another message. As you don't know, you need to call again to read_message(). This process is repeated until read_message() gives the WouldBlock.
EDIT: I want to highlight that when the OS poll wakes a socket for reading, it must read all its content. You can not read a message and wait for another poll event without being sure that all data has been read, because the remaining data can be lost depending on the OS (for example, Windows do not notify your socket again if you left pending data without reading). Sadly, you are forced to read until WouldBlock is returned.

Your reasoning is correct, I have nothing to add to it. In fact we, at Snapview, initially designed our own scheduler based on mio and did everything manually to ensure that we read/write data to the socket when expected (based on epoll() etc). But Tokio does this heavy-lifting for the user.

Yes. When you receive a WouldBlock when reading, means that the socket was previously waked from the Poll because it had data to read.

We do return WouldBlock when we receive the WouldBlock while reading, but we first try to write the pending data before trying reading the data. If we call write after trying to read, we would return WouldBlock without trying to write the data that we might have in the buffer.

If I got your point right, you basically would like to have the possibility to call read and write separately, so that read only reads and write only writes, i.e. going one level lower (in this case it would be up to the user to ensure adhering to the RFC).

Currently the pseudocode is:

loop {
    write_pending();
    if let Some(m) = read()? {
        return Ok(m)
    }
}

Was your suggestion about having read() only? Or about moving write_pending() to a different place? (to which one?)

lemunozm · 2021-04-17T15:15:04Z

My suggestion is moving write_pending() to a different place inside of read_message().

It's true what you say about reading before writing. As a first view, if you read before write the pending data and receive WouldBlock, you could exit from the read_message() without sending that data. But what I want to expose is that conceptually (because of how non-blocking works) you will never have pending data to write if you have received a WouldBlock from the read. Why? Because if you obtain a WouldBlock means that you have previously called to read_message() where the pending data was sent. To summarize the steps:

Socket waked for reading from the poll (it means that there is some data to read).
read_message() the message was read and pending data was sent (no would block is returned here).
another call to read_message() immediately after to know if there is more data.
- If there is more data, write pending data and return the message (if there is enough data).
- If there is no data (WouldBlock) received while reading. The write pending data phase is not performed.

In pseudocode, the read_message() could be:

loop {
    let maybe_message = read()?;
    write_pending();
    if let Some(message) = maybe_message {
        return Ok(m);
    }
}

In this way, you save from call write_pending() in the case that you call read_message() and the socket has no data. If the poll wakes your socket for reading, this situation is not possible, since it will contain some data to be read, and then the write_pending() phase will be done.

daniel-abramov · 2021-04-20T17:19:18Z

But what I want to expose is that conceptually (because of how non-blocking works) you will never have pending data to write if you have received a WouldBlock from the read. Why? Because if you obtain a WouldBlock means that you have previously called to read_message() where the pending data was sent.

I think I did not quite get the point. Being ready to read and to write are 2 types of readiness. But maybe it's all about the steps that you summarized, see below.

another call to read_message() immediately after to know if there is more data.
If there is more data, write pending data and return the message (if there is enough data).
If there is no data (WouldBlock) received while reading. The write pending data phase is not performed.

Ok, I think I got what you mean. The thing is that if the socket is woken up and we know that we can read does not imply that read_message() will return a message. The fact that we can read from the socket does not mean that we are going to be able to read enough to "assemble" a fragment(s) that the message consists of.

In this way, you save from call write_pending() in the case that you call read_message() and the socket has no data. If the poll wakes your socket for reading, this situation is not possible, ...

It is possible, see above.

lemunozm · 2021-04-21T07:26:34Z

Being ready to read and to write are 2 types of readiness

Of course, but you only call to read_message() when you wake up for reading. So for this problem, we only handle read type of readiness. I understand that if you wake up for write you call write_message().

The fact that we can read from the socket does not mean that we are going to be able to read enough to "assemble" a fragment(s) that the message consists of.

I agree with it, and I support the idea of calling write_pending() when you read for data and it's not enough to assemble a message. What I'm saying is that I think the write_pending() can be avoided when your first call to stream.read returns WouldBlock. When calling read_message() different things could happen:

There is data, and there is enough data to know there is a message. Internally the method call write_pending() (and I agree). Also, you don't know if there is more data/messages to read, so you must call again to read_message() (see 3.).
There is data but is not enough to know if there is a message. Internally the method call write_pending() (and I agree). The method returns a WouldBlock to the user, so they do not call again to read_message(), the socket can go to sleep.
There is no data in the first call to stream.read. Internally the method call write_pending() (this is where I do not agree). The method returns a WouldBlock.

Why this distinction among 2. and 3.? because when you successfully read a message (from 1.) you are forced to call again read_message() to know if there is more data or not. This second call takes time doing things like the write_pending() when you in fact do not need it since you sent the pending message just before, in the first call to read_message(). You only need to know if the socket should go to sleep. Statistically, since the protocol is packet-oriented. Any successful call to read_message() that returns a message will need another call to read_message() to know what to do next. To center on the problem, this second call that reads nothing takes almost the same time that a successful call to read_message() with a small message, so reading a small message consumes almost twice the time.

Also, note that the socket never wakes up for read if in the first call to stream.read it returns a WouldBlock. So, the 3. situation only happens when you previously call 1 (and the pending data was already sent).

Sorry if my English is not the best and there is some lost information on it 😝

daniel-abramov · 2021-04-21T17:25:43Z

There is no data in the first call to stream.read. Internally the method call write_pending() (this is where I do not agree). The method returns a WouldBlock.

What do you mean by "there is no data in the first call to stream.read"? That we read 0 bytes (connection closed) or that we read not enough bytes to compose the message? The thing is that the (3) can only happen if you call read_message() unconditionally. If the socket is ready for reading and you call read_message(), then you will at least read something useful. If you called the read_message() without being informed about readiness of the socket, then it means that you either called it because you're ready to write (and in this case ignoring write_pending() would not be correct) or that you're calling the function without any conditions in a loop.

If we remove the write_pending() before read_message(), then we'll never call write_pending() unless we were able to compose a complete message within the read attempt. That's why we can't simply do read()?; write_pending(), but do it the other way around.

lemunozm · 2021-04-21T18:04:28Z

What do you mean by "there is no data in the first call to stream.read"? That we read 0 bytes (connection closed) or that we read not enough bytes to compose the message?

Neither, I mean that we get a WouldBlock without reading any bytes. I think that the confusion could come because there are two ways of reading here. One, the read coming from the stream, and the other the read_message_frame. In (3) I am referring to the stream read.

The thing is that the (3) can only happen if you call read_message() unconditionally.

Suppose the following:

Socker ready for reading.
Read a message calling read_message(). After that, the socket has no more data. But I do not know. The only way I have to know about it is by calling read_message() again.
Call read_message(). no bytes has been read, WouldBlock returned.

What I mean is that a socket ready for reading implies 0 or more calls to read_message(). I'm forced to read from it when I wake up until all data is read. This implies 0 or more messages, the entire data it contains.

If the socket is ready for reading and you call read_message(), then you will at least read something useful.

Yes, the first time I call it. There is a potential last call that will not have any data to read and a WouldBlock is returned. EDIT: This last call is performed almost always you call read_message() and you receive a message successfully. In this case, you compute the entire read_message() pipeline (write_pending, prepare input buffer,...) to only know that the socket needs to sleep again.

If we remove the write_pending() before read_message(), then we'll never call write_pending() unless we were able to compose a complete message within the read attempt.

I think that there is a third option. Only when no bytes had been read and WouldBlock is returned. Only in this case, you could avoid write_pending().

lemunozm · 2021-04-21T18:11:03Z

I think that code change would more visual, I will try to write a draft and share it with you.

lemunozm · 2021-04-22T07:05:41Z

As a draft:

pub fn read_message<Stream>(&mut self, stream: &mut Stream) -> Result<Message>
where
    Stream: Read + Write,
{
    // Do not read from already closed connections.
    self.state.check_active()?;

    loop {
        match self.read_message_frame(stream) {
            Ok(maybe_message) => {
                // Since we may get ping or close,
                // we need to reply to the messages even during read.
                // Thus we call write_pending() but ignore its blocking.
                self.write_pending(stream).no_block()?;

                // Thus if read blocks, just let it return WouldBlock.
                if let Some(message) = maybe_message {
                    trace!("Received message {}", message);
                    return Ok(message);
                }
            }
            Err(Error::Io(err))
                if err.kind() == IoErrorKind::WouldBlock && self.frame.is_empty() =>
            {
                // If there was nothing read and a WouldBlock was returned, avoid write.
                // This means the user call this function but the socket was not prepared to
                // read.
                return Err(Error::Io(err));
            }
            Err(e) => {
                self.write_pending(stream).no_block()?;
                return Err(e);
            }
        }
    }
}

frame.is_empty() would check if the in_buffer of the frame has any byte. If there are no bytes there (as I understood) and the stream.read returns a WouldBlock there is no need to call write (since it was a not necessary call to read_message(), maybe the intention was only to know if there were more messages).

daniel-abramov · 2021-04-22T17:56:13Z

Suppose the following:

Socker ready for reading.

Read a message calling read_message(). After that, the socket has no more data. But I do not know. The only way I have to know about it is by calling read_message() again.

Call read_message(). no bytes has been read, WouldBlock returned.

Ok, I see. And your concern is that in such case calling write_pending() in the last call is a waste of resources, right? However this does not mean that at the moment you call read_message() for the 2nd time the write_pending() won't do anything useful as your previous read attempt might have returned a message (breaking the loop that calls to write_pending()), so when you call the read_message() one more time to figure out if you have any new messages or not, write_pending() may write something to the socket at this point.

As a draft:

Ok, thanks for the code, that gives a bit more context. So the problem with the approach that you suggested is that we don't even have access to the frame at this point. frame is rather a low-level thing and we don't have access to it from the higher level module. But even if we did, the self.frame.is_empty() is not a good indicator of whether or not we read something, because the frame may contain some data, yet we can do multiple reads and still end up not assembling a single frame (even when some data is read from the socket), meaning that we may end up going inside the if err.kind() == IoErrorKind::WouldBlock && self.frame.is_empty() block even if we read nothing. Doing so would mean that until we assemble the whole message, we don't even try to write something to the socket which would be a problem if the user of the library does not write anything to the socket (i.e. does not call write_message() / write_pending() explicitly) but only reads from the socket since in such case we'll be waiting until the full message is assembled before even trying to reply to e.g. pings. That would unfortunately violate the invariants that we try to uphold.

lemunozm · 2021-04-22T18:30:53Z

And your concern is that in such case calling write_pending() in the last call is a waste of resources, right?

Exactly, this is the point :)

your previous read attempt might have returned a message (breaking the loop that calls to write_pending()), so when you call the read_message() one more time to figure out if you have any new messages or not, write_pending() may write something to the socket at this point.

If this is the case, the example I shared should manage this correctly, because when you read a message the method will do the call to write_pending() before returning the message.

even if we did, the self.frame.is_empty() is not a good indicator of whether or not we read something

Maybe I do not explain well what the hypothetical is_empty represents in this context. It represents that the buffer does not contain any bytes. Not that it does not contain a message. So when you say: "the frame may contain some data, yet we can do multiple reads and still end up not assembling a single frame", it will imply that self.frame.is_empty() will be false, and the behaviour will be as always, writing the pending data. The idea is that the condition if err.kind() == IoErrorKind::WouldBlock && self.frame.is_empty() could be true only if you do not read any byte and you return a Wouldblock.

Sorry if this question is taken so many messages 😅 . Only a last doubt or concern. Why did you decide a design for read_message() where reading from the stream itself is the last thing of the pipeline? I mean, Why don't you do "write_pending, prepare input buffer for read, read from stream" instead of "read from the stream, prepare input buffer for a next possible message, write_pending"?

daniel-abramov · 2021-04-23T10:00:58Z

Exactly, this is the point :)

Ok, so let's check if this is really the case i.e. if there is a safe way to drop this without unexpected side effect, see below.

If this is the case, the example I shared should manage this correctly, because when you read a message the method will do the call to write_pending() before returning the message.

Yes, but there is no guarantee that write_pending() will be able to fully write it to the socket from one attempt. It may write only the portion of data and return the WouldBlock, so we must make sure that upon next call to the read_message() the write_pending() is called.

Maybe I do not explain well what the hypothetical is_empty represents in this context. It represents that the buffer does not contain any bytes. Not that it does not contain a message. So when you say: "the frame may contain some data, yet we can do multiple reads and still end up not assembling a single frame", it will imply that self.frame.is_empty() will be false, and the behaviour will be as always, writing the pending data.

But the buffer will contain bytes even if we did not read any data (the InputBuffer will contain the data that we were able to gather so far). To implement this behavior we would need to expose some internals of the InputBuffer to the FrameCodec and create a function for that. While doing that would be possible, it will result in a code that suffers from one problem noted above with write_pending().

Only a last doubt or concern. Why did you decide a design for read_message() where reading from the stream itself is the last thing of the pipeline? I mean, Why don't you do "write_pending, prepare input buffer for read, read from stream" instead of "read from the stream, prepare input buffer for a next possible message, write_pending"?

Well for the sake of simplicity I would merge steps "prepare input buffer" and "read from stream" in one, because these are actually implementation details, i.e. we need to prepare some buffer to be able to read into it from the stream.

So the actual question is why do we do write_pending() -> read_message_frame()instead ofread_message_frame() -> write_pending()`? One of the reasons is that it allows us to concisely write that we first want to "flush" everything that is pending to be sent and then read and respond to the new data (would make more sense if we spoke about scheduling fairness). Another reason is that writing it the other way around would look really ugly in code and complicated in code, something like:

 // we can't return here on WouldBlock as it would skip us writing the pending messages
let read_res = read_message()

// now we write pending messages and could not return here on block, because that would mean skipping the `maybe_message` that could have been read completely
let write_res = write_pending();

// if there was a message, return it, if there was an error 
if let Some(mess) = read_res? {
    return Ok(mess);
}

write_res.no_block()?;

(and this is likely to work wrong as well, looks like a very brittle code for me)

lemunozm · 2021-04-26T07:02:21Z

Yes, but there is no guarantee that write_pending() will be able to fully write it to the socket from one attempt. It may write only the portion of data and return the WouldBlock, so we must make sure that upon next call to the read_message() the write_pending() is called.

This problem could happen either, you put the write_pending part first or last inside the read_message(). It is related to waking up for writing. In the current implementation:

read_message() {
    loop {
        write_pending().no_block() //not enough bytes, the pong is not sending yet.
        if Some(message) = read_message_frame()? {  //Return a successful message without send the previous pong.
            return Ok(message)
        }
    }
}

Anyway, I understand the complexities arising from reading and writing inside `read_message(), and maybe I can not expect this performance increment in the non-blocking case.

Thanks for your answers and time!

daniel-abramov · 2021-04-26T11:26:14Z

You're welcome! If the questions have been cleared, feel free to close the issue ;)

lemunozm mentioned this issue Apr 13, 2021

Reduced websocket latency lemunozm/message-io#73

Merged

daniel-abramov added the question label Apr 16, 2021

lemunozm closed this as completed Apr 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Following an usual non-blocking pattern implementation adds extra time. #195

Following an usual non-blocking pattern implementation adds extra time. #195

lemunozm commented Apr 13, 2021 •

edited

Loading

daniel-abramov commented Apr 13, 2021

lemunozm commented Apr 14, 2021

daniel-abramov commented Apr 16, 2021 •

edited

Loading

lemunozm commented Apr 16, 2021 •

edited

Loading

daniel-abramov commented Apr 16, 2021 •

edited

Loading

lemunozm commented Apr 17, 2021 •

edited

Loading

daniel-abramov commented Apr 20, 2021

lemunozm commented Apr 21, 2021

daniel-abramov commented Apr 21, 2021

lemunozm commented Apr 21, 2021 •

edited

Loading

lemunozm commented Apr 21, 2021

lemunozm commented Apr 22, 2021 •

edited

Loading

daniel-abramov commented Apr 22, 2021 •

edited

Loading

lemunozm commented Apr 22, 2021

daniel-abramov commented Apr 23, 2021 •

edited

Loading

lemunozm commented Apr 26, 2021 •

edited

Loading

daniel-abramov commented Apr 26, 2021

Following an usual non-blocking pattern implementation adds extra time. #195

Following an usual non-blocking pattern implementation adds extra time. #195

Comments

lemunozm commented Apr 13, 2021 • edited Loading

daniel-abramov commented Apr 13, 2021

lemunozm commented Apr 14, 2021

daniel-abramov commented Apr 16, 2021 • edited Loading

lemunozm commented Apr 16, 2021 • edited Loading

daniel-abramov commented Apr 16, 2021 • edited Loading

lemunozm commented Apr 17, 2021 • edited Loading

daniel-abramov commented Apr 20, 2021

lemunozm commented Apr 21, 2021

daniel-abramov commented Apr 21, 2021

lemunozm commented Apr 21, 2021 • edited Loading

lemunozm commented Apr 21, 2021

lemunozm commented Apr 22, 2021 • edited Loading

daniel-abramov commented Apr 22, 2021 • edited Loading

lemunozm commented Apr 22, 2021

daniel-abramov commented Apr 23, 2021 • edited Loading

lemunozm commented Apr 26, 2021 • edited Loading

daniel-abramov commented Apr 26, 2021

lemunozm commented Apr 13, 2021 •

edited

Loading

daniel-abramov commented Apr 16, 2021 •

edited

Loading

lemunozm commented Apr 16, 2021 •

edited

Loading

daniel-abramov commented Apr 16, 2021 •

edited

Loading

lemunozm commented Apr 17, 2021 •

edited

Loading

lemunozm commented Apr 21, 2021 •

edited

Loading

lemunozm commented Apr 22, 2021 •

edited

Loading

daniel-abramov commented Apr 22, 2021 •

edited

Loading

daniel-abramov commented Apr 23, 2021 •

edited

Loading

lemunozm commented Apr 26, 2021 •

edited

Loading