
mem: implement ReadAll() for more efficient io.Reader consumption #7653

Open · wants to merge 5 commits into master

Conversation

ash2k (Contributor) commented Sep 20, 2024

I moved my project to gRPC 1.66.2 and saw a good reduction in RAM consumption. The hot spot is now decompress(), where io.Copy() allocates a temporary buffer, reads from the reader into it, and then copies the read data into another buffer obtained from the pool. This means an unnecessary allocation, an unnecessary copy, and underutilized pool buffers.

This PR adds mem.ReadAll() (like io.ReadAll()) to efficiently consume a reader into buffers from the pool.

[Screenshot, 2024-09-19: profile of the decompress() hot spot]

I found #7631 while working on this code (I have similar code in my project, but decided to contribute it upstream and replace the io.Copy call here with it).
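For illustration, here is a minimal sketch of how a decompression path could consume a reader with the proposed API. It assumes the signature mem.ReadAll(r io.Reader, pool mem.BufferPool) (mem.BufferSlice, error) introduced in this PR; the surrounding names are mine, not the PR diff:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"

	"google.golang.org/grpc/mem"
)

func main() {
	// Build a small gzip-compressed payload to stand in for a compressed message.
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	zw.Write([]byte("hello, pooled buffers"))
	zw.Close()

	zr, err := gzip.NewReader(&compressed)
	if err != nil {
		panic(err)
	}

	// mem.ReadAll consumes the reader directly into buffers taken from the
	// pool, avoiding io.Copy's intermediate scratch buffer and extra copy.
	data, err := mem.ReadAll(zr, mem.DefaultBufferPool())
	if err != nil {
		panic(err)
	}
	defer data.Free() // return the buffers to the pool when done

	fmt.Println(string(data.Materialize())) // hello, pooled buffers
}
```

Compared to an io.Copy into a bytes.Buffer, the decompressed bytes land directly in pooled buffers and can be handed back to the pool with Free().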

RELEASE NOTES:

  • mem: implement a ReadAll() method for more efficient io.Reader consumption

codecov bot commented Sep 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.89%. Comparing base (a3a8657) to head (999f3d9).

Additional details and impacted files
```
@@            Coverage Diff             @@
##           master    #7653      +/-   ##
==========================================
+ Coverage   81.87%   81.89%   +0.01%     
==========================================
  Files         373      373              
  Lines       37822    37857      +35     
==========================================
+ Hits        30967    31003      +36     
- Misses       5563     5564       +1     
+ Partials       1292     1290       -2
```
Files with missing lines Coverage Δ
mem/buffer_slice.go 96.26% <100.00%> (+1.37%) ⬆️
rpc_util.go 79.48% <100.00%> (-0.06%) ⬇️

... and 25 files with indirect coverage changes

purnesh42H added this to the 1.69 Release milestone Oct 16, 2024
aranjans added the Type: Performance label Oct 22, 2024
purnesh42H (Contributor) commented

@ash2k are you working on this actively?

ash2k (Contributor, Author) commented Oct 22, 2024

@purnesh42H No, I'm not. I think I've answered the questions and it's ready to be merged. Is something still off? Let me know what needs to change.

ash2k (Contributor, Author) commented Oct 28, 2024

@PapaCharlie PTAL

easwars (Contributor) commented Oct 29, 2024

There are still a whole lot of comments that are not wrapped at 80 columns. Could you please take care of that?

Also, please don't mark comments as resolved. It is the responsibility of the person making the comment to mark it as resolved when they think that the comment has been sufficiently addressed.

ash2k (Contributor, Author) commented Oct 29, 2024

@easwars

> There are still a whole lot of comments that are not wrapped at 80 columns. Could you please take care of that?

Wrapped. Let me know if I missed something.

> Also, please don't mark comments as resolved. It is the responsibility of the person making the comment to mark it as resolved when they think that the comment has been sufficiently addressed.

Ok, fair point. I used that as a way to track what I have addressed, but I see why that's not the best idea.

ash2k (Contributor, Author) commented Oct 29, 2024

Related question: I see quite a few calls with a nil pool, e.g. mem.NewBuffer(&someDataSlice, nil). Why not replace these with a type conversion, mem.SliceBuffer(someDataSlice)? No need to call a function.

easwars (Contributor) left a comment

Thanks for taking care of the comments.

easwars (Contributor) commented Oct 30, 2024

> Related question: I see quite a few calls with a nil pool, e.g. mem.NewBuffer(&someDataSlice, nil). Why not replace these with a type conversion, mem.SliceBuffer(someDataSlice)? No need to call a function.

IIRC, the SliceBuffer type was added a little later during the review process on the PR where all the buffering functionality was added, and there is a good chance some call sites were not updated. I don't see a reason why we should be opposed to that change and would happily review a PR with it. Thanks.
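For reference, a minimal sketch of the two forms being compared (someDataSlice is the placeholder name from the comment above; with a nil pool, NewBuffer wraps the slice without pooling semantics):

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/mem"
)

func main() {
	someDataSlice := []byte("hello")

	// Current form: a function call with a nil pool.
	b1 := mem.NewBuffer(&someDataSlice, nil)

	// Suggested form: a plain type conversion, no function call needed.
	b2 := mem.SliceBuffer(someDataSlice)

	fmt.Println(b1.Len(), b2.Len()) // 5 5
}
```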

easwars changed the title from "mem: ReadAll for more efficient io.Reader consumption" to "mem: implement ReadAll() for more efficient io.Reader consumption" Oct 30, 2024
Comment on lines 241 to 242

```go
wt, ok := r.(io.WriterTo)
if ok {
```
Member commented

Please combine into a compound if to limit the scope of wt, too.

ash2k (Contributor, Author) replied

Done

Comment on lines +237 to +238

```go
// A failed call returns a non-nil error and could return partially read
// buffers. It is the responsibility of the caller to free this buffer.
```
Member commented

This is surprising behavior. The one saving grace is that freeing buffers is optional: the GC will reclaim them if you forget. If not for that, I would say this is definitely not OK.

I highly doubt gRPC would ever want the partial data, and I'm curious why you want it, too.

ash2k (Contributor, Author) replied

This behavior matches io.ReadAll(). Reasons to do it this way:

  • It is a drop-in replacement for io.ReadAll() (behavior-wise).
  • Sometimes you need the read data regardless of whether there was an error.

Example of the last point: a proxy forwarding a response from an upstream server. It must send everything it got from the upstream and then return the error (where "return" might mean RST the connection or something else protocol-specific). Isn't a gRPC streaming-response client the same? It reads and delivers messages to the caller even if it has already received an error after those messages. This is also similar to how HTTP/1.0 and 1.1 Connection: close work: sometimes you don't want Transfer-Encoding: chunked and prefer the underlying connection to be closed on EOF instead of chunking.
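To illustrate that contract, a sketch of such a forwarding path, assuming the ReadAll signature proposed in this PR (the forward helper and its logic are hypothetical):

```go
package main

import (
	"fmt"
	"io"
	"strings"

	"google.golang.org/grpc/mem"
)

// forward sketches a proxy that must pass along whatever was read,
// even when the upstream read ultimately failed.
func forward(upstream io.Reader, downstream io.Writer) error {
	data, err := mem.ReadAll(upstream, mem.DefaultBufferPool())
	defer data.Free() // caller is responsible for freeing, error or not

	// Send the (possibly partial) data first, like io.ReadAll semantics allow...
	if _, werr := downstream.Write(data.Materialize()); werr != nil {
		return werr
	}
	// ...then surface the read error (protocol-specific handling elsewhere).
	return err
}

func main() {
	var sb strings.Builder
	err := forward(strings.NewReader("partial payload"), &sb)
	fmt.Println(sb.String(), err) // partial payload <nil>
}
```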

```go
}
if len(*buf) == usedCap {
result = append(result, NewBuffer(buf, pool))
break // grab a new buf from pool
```
Member commented

How about a label nextBuffer: on line 251 and continue nextBuffer here instead? Then you don't even need the comment, and it's easier to find where the control flow is going.

Or, would it work if the inner loop were `for len(*buf) < usedCap {` instead?

ash2k (Contributor, Author) replied

This reads better, thanks.

aranjans assigned dfawley and unassigned ash2k and easwars Nov 6, 2024
easwars (Contributor) commented Nov 8, 2024

@ash2k
Could we have some benchmarks like the ones on #7786?

ash2k (Contributor, Author) commented Nov 9, 2024

@easwars

This is current master vs this branch (just rebased on master to compare apples to apples), with the benchmark from #7786.

```
goos: darwin
goarch: arm64
pkg: google.golang.org/grpc
                                              │   ./old.txt   │               ./new.txt               │
                                              │    sec/op     │    sec/op     vs base                 │
RPCCompressor/comp=gzip,payloadSize=1024-10      170.9µ ± ∞ ¹   148.7µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10     205.0µ ± ∞ ¹   179.2µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10    1.515m ± ∞ ¹   1.538m ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10     102.57µ ± ∞ ¹   76.36µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10    111.73µ ± ∞ ¹   84.42µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10    431.3µ ± ∞ ¹   413.9µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                          253.0µ         218.7µ        -13.56%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                              │   ./old.txt    │               ./new.txt                │
                                              │      B/op      │     B/op       vs base                 │
RPCCompressor/comp=gzip,payloadSize=1024-10     146.96Ki ± ∞ ¹   25.98Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10    185.43Ki ± ∞ ¹   43.04Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10   1103.0Ki ± ∞ ¹   994.4Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10      78.25Ki ± ∞ ¹   14.00Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10     89.34Ki ± ∞ ¹   23.77Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10   1137.5Ki ± ∞ ¹   986.3Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                          249.1Ki         84.54Ki        -66.07%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                              │  ./old.txt  │              ./new.txt              │
                                              │  allocs/op  │  allocs/op   vs base                │
RPCCompressor/comp=gzip,payloadSize=1024-10     252.0 ± ∞ ¹   244.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10    253.0 ± ∞ ¹   244.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10   294.0 ± ∞ ¹   288.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10     231.0 ± ∞ ¹   223.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10    231.0 ± ∞ ¹   223.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10   294.0 ± ∞ ¹   279.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                                         257.9         248.9        -3.47%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

Labels
Type: Performance - Performance improvements (CPU, network, memory, etc)
6 participants