Add spec for `Socket#read_nonblock` with a provided buffer by casperisfine · Pull Request #1145 · ruby/spec

casperisfine · 2024-03-22T13:58:34Z

When provided a buffer, MRI preserves the buffer's original encoding, but TruffleRuby sets the encoding to `Encoding::BINARY`.

Discovered as part of redis-rb/redis-client#184
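A minimal standalone sketch of the behavior described above, for trying it outside the spec suite. It assumes a platform where `UNIXSocket.pair` is available; the helper name `read_into_buffer` is hypothetical and not part of the PR.

```ruby
require "socket"

# Hypothetical helper (not from the PR): reads up to 5 bytes from a connected
# socket pair into a caller-supplied buffer and returns that buffer.
def read_into_buffer(initial_encoding)
  r, w = UNIXSocket.pair
  w.write("aaa")

  # The previous contents ("foo") are discarded by the read.
  buffer = String.new("foo", encoding: initial_encoding)

  IO.select([r], nil, nil, 2) # wait up to 2 seconds for the data to arrive
  r.read_nonblock(5, buffer)
  buffer
ensure
  r&.close
  w&.close
end

buffer = read_into_buffer(Encoding::ISO_8859_1)
puts buffer          # the bytes that were read
puts buffer.encoding # the initial encoding on MRI, per the description above
```

On an implementation that resets the buffer to binary, the second line printed would differ, which is the discrepancy this PR specifies.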

library/socket/basicsocket/read_nonblock_spec.rb

eregon · 2024-03-22T20:42:20Z

library/socket/basicsocket/read_nonblock_spec.rb

+      IO.select([@r], nil, nil, 2)
+      @r.read_nonblock(5, buffer)
+      buffer.should == "aaa"
+      buffer.encoding.should == initial_encoding


Since the read replaces the buffer's contents and ignores whatever was there before, it's a good question what the encoding should be: the encoding of the data read, or the original string's encoding?

IMO using the original string encoding is wrong: the data read might not be valid in that encoding.
OTOH it clearly seems to be CRuby's behavior.
Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?

I could easily see the CRuby behavior causing bugs, e.g. appending binary data to a buffer declared as `+""`, which is UTF-8, and so causing needless extra computation, e.g. to compute the coderange.

cc @ioquatix WDYT?

Just to be clear: thank you for the PR and I'll merge this, but it seems good to have a bit of discussion on whether these semantics are desired or maybe accidental.

ioquatix · 2024-03-23

I don't have an immediate strong opinion, except that using binary everywhere is the most predictable approach.

I wonder if the encoding should be set to whatever the encoding of the IO is. That would make sense to me, since it would be consistent:

    buffer1 = io.read_nonblock(1024)
    buffer2 = String.new
    io.read_nonblock(1024, buffer2)
    buffer1.encoding == buffer2.encoding # ?
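To explore the consistency question above on a given interpreter, something like the following sketch could be used. It uses `IO.pipe` rather than a socket for simplicity, and the helper name `encoding_after_read` is hypothetical. It only reports what the running implementation does and makes no claim about what the semantics should be.

```ruby
# Hypothetical helper: writes `payload` to a fresh pipe and reads it back,
# optionally into `buffer`. Returns the encoding of the resulting string.
def encoding_after_read(payload, buffer = nil)
  r, w = IO.pipe
  w.write(payload)
  result = buffer ? r.read_nonblock(1024, buffer) : r.read_nonblock(1024)
  result.encoding
ensure
  r&.close
  w&.close
end

puts encoding_after_read("abc")                                        # no buffer
puts encoding_after_read("abc", String.new)                            # binary buffer
puts encoding_after_read("abc", String.new(encoding: Encoding::UTF_8)) # UTF-8 buffer
```

Note that `String.new` without arguments is already binary, so the snippet in the comment above would not by itself expose a difference; a buffer created with another encoding is the interesting case.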

casperisfine · 2024-03-23

> Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?

AFAIK yes.

Personally I find this behavior desirable. If I know I'm using a protocol with a particular encoding, I don't want to have to force the encoding on every read.

As for the result being potentially broken: yes, but that's on me to handle with a `valid_encoding?` check, and if needed I can use that as a signal that I need to wait for more data.
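A rough sketch of that pattern, assuming the buffer keeps its encoding across reads as MRI does. The helper `append_available` and the 16-byte read size are illustrative choices, not taken from redis-client. A two-byte UTF-8 character is deliberately split across two writes.

```ruby
# Hypothetical helper: appends whatever is currently readable from `io` to
# `acc` and reports whether `acc` is now valid in its own encoding.
def append_available(io, acc, chunk)
  io.read_nonblock(16, chunk) # replaces the contents of `chunk`
  acc << chunk
  acc.valid_encoding?
end

r, w = IO.pipe
acc   = String.new(encoding: Encoding::UTF_8)
chunk = String.new(encoding: Encoding::UTF_8)

w.write("h\xC3".b) # "h" plus the first byte of a two-byte character
complete_after_first = append_available(r, acc, chunk)

w.write("\xA9".b)  # the second byte of that character
complete_after_second = append_available(r, acc, chunk)

puts complete_after_first  # whether the first partial read was valid UTF-8
puts complete_after_second # whether the accumulated data is valid UTF-8

r.close
w.close
```

Here an invalid result after the first read is treated as "incomplete, wait for more data" rather than as an error, which is the signal described above. A real protocol parser would still need to distinguish truncated input from genuinely malformed input.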

casperisfine · 2024-03-23T08:27:23Z

I augmented the spec a bit.

casperisfine · 2024-03-23T08:36:10Z

I also added a similar spec for `#read` and, amusingly, TruffleRuby's behavior matches MRI's on `read`, just not on `read_nonblock`.

The behavior of `IO#read_nonblock` differs slightly. See: ruby/spec#1145

eregon

Thank you for the new specs!

eregon · 2024-03-25T11:05:18Z

@casperisfine The CI fails on Windows for the new specs. Could you exclude the problematic specs on Windows (`platform_is_not :windows`)?

eregon · 2024-04-09T12:33:13Z

It seems that whether the encoding is changed depends on the arguments passed to `read`, see https://bugs.ruby-lang.org/issues/20416

casperisfine · 2024-04-09T13:13:30Z

Ref: ruby/ruby@0ca7036

io.c (read_all): should associate default external encoding.

io.c (io_read): should NOT associate default external encoding.
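The two changelog entries above suggest that `IO#read` takes different paths depending on whether a length is given. The sketch below is one way to observe what a given interpreter does in each case. The helper name `buffer_encoding_after_read` is hypothetical, and no particular output is asserted here, since the linked issue discusses exactly this behavior.

```ruby
# Hypothetical helper: returns the buffer's encoding after IO#read is called
# with the given length argument (an Integer, or nil to read until EOF).
def buffer_encoding_after_read(length)
  r, w = IO.pipe
  w.write("abc")
  w.close # lets read reach EOF instead of blocking
  buffer = String.new(encoding: Encoding::ISO_8859_1)
  r.read(length, buffer)
  buffer.encoding
ensure
  r&.close
  w&.close
end

puts "read(10, buffer):  #{buffer_encoding_after_read(10)}"
puts "read(nil, buffer): #{buffer_encoding_after_read(nil)}"
```

Comparing the two printed lines across Ruby implementations and versions shows whether the buffer's encoding survives each call form.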

casperisfine mentioned this pull request Mar 22, 2024

Various Ruby driver optimizations redis-rb/redis-client#184

Merged

eregon reviewed Mar 22, 2024

casperisfine force-pushed the socket-read-partial-encoding branch from 484e95c to c78a7e9 Compare March 23, 2024 08:26

casperisfine force-pushed the socket-read-partial-encoding branch from c78a7e9 to 23cd936 Compare March 23, 2024 08:35

casperisfine pushed a commit to redis-rb/redis-client that referenced this pull request Mar 23, 2024

Fix compatibility with TruffleRuby

a82593c

The behavior of `IO#read_nonblock` differs slightly. See: ruby/spec#1145

eregon approved these changes Mar 25, 2024

Add spec for Socket#read_nonblock with a provided buffer

d7213cb

When provided a buffer MRI preserves the original encoding but TruffleRuby sets the encoding to `Encoding::BINARY`

casperisfine force-pushed the socket-read-partial-encoding branch from 23cd936 to d7213cb Compare March 25, 2024 11:22

eregon merged commit c8ab292 into ruby:master Mar 25, 2024

eregon mentioned this pull request Mar 25, 2024

The buffer encoding should remain unchanged after read_nonblock(N, buffer) truffleruby/truffleruby#3506

Closed

byroot mentioned this pull request Apr 12, 2024

Fix BufferedIO to search with byteindex redis-rb/redis-client#189

Merged
