Add spec for Socket#read_nonblock with a provided buffer #1145
eregon merged 1 commit into ruby:master
IO.select([@r], nil, nil, 2)
@r.read_nonblock(5, buffer)
buffer.should == "aaa"
buffer.encoding.should == initial_encoding
Since the buffer is replaced by the contents of the read, ignoring the previous contents, it's a good question what the encoding should be: the encoding of the read, or the original string's encoding?
IMO using the original string encoding is wrong, the data returned there might not be valid in that encoding.
OTOH it clearly seems to be CRuby's behavior.
Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?
I could easily see the CRuby behavior causing bugs, e.g. appending binary data to a buffer declared as +"" (which is UTF-8) and so triggering needless extra work, such as computing the coderange.
cc @ioquatix WDYT?
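One way to check the question above empirically is a small probe that passes a buffer with a non-default encoding to several buffer-taking read methods and reports the encoding afterwards. This is only a sketch: the helper name `probe_buffer_encoding`, the choice of ISO-8859-1 as the initial encoding, and the list of methods are assumptions for illustration, and the encodings are printed rather than asserted because they may differ between implementations and versions.

```ruby
# Hypothetical helper: reads 3 bytes into a caller-supplied buffer using the
# given IO method and returns the buffer's bytes and resulting encoding.
def probe_buffer_encoding(method_name)
  r, w = IO.pipe
  w.write("aaa") # the write end of IO.pipe is in sync mode, so data is available at once
  buffer = "initial".dup.force_encoding(Encoding::ISO_8859_1)
  r.public_send(method_name, 3, buffer)
  [buffer.bytes, buffer.encoding]
ensure
  r&.close
  w&.close
end

# A fresh pipe is used per method so buffered and unbuffered reads never mix.
RESULTS = %i[read readpartial read_nonblock sysread].to_h do |m|
  [m, probe_buffer_encoding(m)]
end

RESULTS.each do |m, (bytes, enc)|
  puts "#{m}: #{bytes.pack('C*').inspect} encoding=#{enc}"
end
```

Running the same script under CRuby and TruffleRuby would show whether the buffer keeps its original encoding for each method.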
Just to be clear: thank you for the PR and I'll merge this, but it seems good to have a bit of discussion on whether these semantics are intended or accidental.
I don't have an immediate strong opinion, except that using binary everywhere is the most predictable approach.
I wonder if the encoding should be set to whatever the encoding of the IO is. That would make sense to me, i.e. it would make sense to be consistent:
buffer1 = io.read_nonblock(1024)
buffer2 = String.new
io.read_nonblock(1024, buffer2)
buffer1.encoding == buffer2.encoding # ?
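As a rough illustration of the "binary everywhere" approach, the sketch below always reads into a binary buffer and tags the result with the protocol encoding explicitly, so the outcome does not depend on what the implementation does to the buffer's encoding. The helper name `read_binary_then_tag` and its parameters are hypothetical, not part of any library.

```ruby
# Hypothetical helper: read into a binary buffer, then label the bytes with
# the encoding the caller expects from the protocol.
def read_binary_then_tag(io, maxlen, protocol_encoding)
  buffer = String.new(capacity: maxlen) # String.new without a source string is ASCII-8BIT
  io.read_nonblock(maxlen, buffer)      # raises IO::WaitReadable if no data is ready
  buffer.force_encoding(protocol_encoding)
end

r, w = IO.pipe
w.write("héllo")
TAGGED = read_binary_then_tag(r, 1024, Encoding::UTF_8)
puts "#{TAGGED.inspect} #{TAGGED.encoding}"
r.close
w.close
```

The cost is one explicit `force_encoding` per read, which is exactly the step the preserved-encoding behavior lets callers skip.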
> Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?
AFAIK yes.
Personally I find this behavior desirable. If I know I'm using a protocol with a particular encoding, I don't want to have to force the encoding on every read.
As for the result being potentially broken: yes, but that's on me to handle with a `valid_encoding?` check, and if needed I can use that as a signal that I need to wait for more data.
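The `valid_encoding?` idea can be sketched without any IO by simulating a multibyte character that arrives split across two reads. The `feed` helper is hypothetical, and it accumulates bytes in a binary string and checks a re-tagged copy, which is an assumption made to keep the example independent of how a given implementation treats the buffer's encoding.

```ruby
# Hypothetical helper: append each chunk as it would arrive from successive
# reads and record whether the accumulated bytes are valid in the protocol
# encoding after each one.
def feed(chunks, encoding: Encoding::UTF_8)
  pending = String.new # binary accumulator
  chunks.map do |chunk|
    pending << chunk.b
    pending.dup.force_encoding(encoding).valid_encoding?
  end
end

message = "é" # two bytes in UTF-8: 0xC3 0xA9
STATES = feed([message.byteslice(0, 1), message.byteslice(1, 1)])
p STATES # => [false, true]
```

An invalid result only says the bytes do not currently form valid text; it cannot by itself distinguish a truncated character from corrupt data, so a real reader would likely also need a limit on how long it waits.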
Force-pushed: 484e95c to c78a7e9
I augmented the spec a bit.
Force-pushed: c78a7e9 to 23cd936
I also added a similar spec for
The behavior of `IO#read_nonblock` differs slightly. See: ruby/spec#1145
@casperisfine The CI fails on Windows for the new specs. Could you exclude the problematic specs on Windows (
When provided a buffer, MRI preserves the original encoding but TruffleRuby sets the encoding to `Encoding::BINARY`.
Force-pushed: 23cd936 to d7213cb
It seems whether the encoding is changed depends on arguments for
Ref: ruby/ruby@0ca7036
When provided a buffer, MRI preserves the original encoding but TruffleRuby sets the encoding to `Encoding::BINARY`.
cc @eregon
Discovered as part of redis-rb/redis-client#184