proposal: bytes: add Pool for buffer management #48535
Comments
see: #23199

They are different!

The problem described in #23199 is about a Pool for bytes.Buffer, which is usually bound to a live object such as a Connection; the bytes.Buffer is held for a long time by that object, which means it cannot be reused while the object is alive.

But in this issue we are talking about just a []byte, which would not, or should not, be bound to any object with a long life cycle. For example, in nbio, every body/buffer is held by a message: we get it from the Pool when we start parsing a new message, and we release it back to the Pool soon after we have handled the message. It is not bound to the Connection and is not held for long. Or, for protobuf, we usually Marshal a protobuf Message into a buffer, send the buffer, then put the buffer back into the Pool.

We get the []byte from the Pool and put it back soon, which raises the reuse rate of the buffers and keeps the number of buffers in the Pool small.

For an example of an std solution: with 1000k online connections and a not-too-large qps/tps of maybe 10k, there are two bytes.Buffer per connection (one for read, the other for write), which costs us 2000k bytes.Buffer. But for the same 1000k online connections and the same qps/tps of 10k, we need one []byte per message being processed or waiting in the processing queue, and that number should be smaller than 10k.
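To make that short life cycle concrete, here is a minimal sketch of the Get/Put-around-a-message pattern using only today's sync.Pool; the 4096 capacity and the function names are illustrative, not nbio's actual code:

```go
package msgpool

import "sync"

var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// handleMessage gets a buffer when it starts parsing a message and puts it
// back as soon as the message has been handled, so the number of live
// buffers tracks in-flight messages rather than open connections.
func handleMessage(payload []byte) {
	buf := bufPool.Get().([]byte)[:0]
	buf = append(buf, payload...) // stage/parse the message in buf
	process(buf)
	bufPool.Put(buf) // release immediately; buf is not bound to a connection
}

func process(b []byte) { /* dispatch the parsed message */ }
```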
Any pool for variable-sized buffers needs to keep track of utilization and somehow make use of that metric to ensure that the way the buffers are used matches how they are stored. The proposed implementation does not do that. Have you seen #27735 (comment)? It takes into consideration utilization, but doesn't provide a …
About the trust of the client: we usually set a read size limit, which also limits the buffer size we want to get from the Pool.

This sounds right, but how can we know whether the client has sent 1GB? We can't know that before allocating the buffer and reading the data, so we still need to allocate the buffer first. If we read the data into several small buffers first and then copy them into a 1GB buffer, that trades the allocation for a performance problem. The limit is necessary; otherwise, if the client really sends 1GB or even 1TB, we still OOM. Although I think we should allocate the buffer first, I don't think allocating 1GB is right for most scenarios; setting a read size limit or some other kind of limit is right most of the time. And for such a specific scenario, we should not use a common buffer pool but customize one.
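As an illustration of such a read limit, here is a hedged sketch for a length-prefixed protocol that rejects oversized frames before allocating; the 4-byte prefix and the 1 MiB cap are assumptions, not part of the proposal:

```go
package frame

import (
	"encoding/binary"
	"fmt"
	"io"
)

const maxFrameSize = 1 << 20 // 1 MiB; tune per deployment

// readFrame reads a 4-byte big-endian length prefix and checks it against
// maxFrameSize before allocating the body, so an untrusted peer cannot
// force a 1GB (or 1TB) allocation.
func readFrame(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > maxFrameSize {
		return nil, fmt.Errorf("frame of %d bytes exceeds limit of %d", n, maxFrameSize)
	}
	body := make([]byte, n) // or take it from a pool, capped at maxFrameSize
	if _, err := io.ReadFull(r, body); err != nil {
		return nil, err
	}
	return body, nil
}
```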
BTW, neither a Pool of variable-sized buffers nor a Pool backed by an array of sync.Pools that bucketize items by size can cover every scenario; we should not ask for, and cannot have, a Pool that works everywhere. But it would help in most scenarios if std provided this commonly used kind of Pool.
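For reference, one possible shape of the "array of sync.Pools bucketized by size" idea mentioned above; the power-of-two buckets and the drop-on-mismatch policy in Put are illustrative choices, not a proposed API:

```go
package bucketpool

import "sync"

const minShift = 6 // smallest bucket: 64 bytes

// pools[i] holds zero-length slices whose capacity is exactly 1<<(i+minShift).
var pools [16]sync.Pool // buckets from 64B up to 2MB

// bucket returns the index of the smallest bucket holding >= size bytes.
func bucket(size int) int {
	b := 0
	for s := 1 << minShift; s < size; s <<= 1 {
		b++
	}
	return b
}

// Get returns a slice of the requested length, reusing a pooled backing
// array when one is available.
func Get(size int) []byte {
	b := bucket(size)
	if b >= len(pools) {
		return make([]byte, size) // too big to pool
	}
	if v := pools[b].Get(); v != nil {
		return v.([]byte)[:size]
	}
	return make([]byte, size, 1<<(b+minShift))
}

// Put returns a slice to the bucket matching its capacity; slices whose
// capacity is not an exact bucket size are dropped, keeping utilization
// predictable at the cost of some misses.
func Put(buf []byte) {
	c := cap(buf)
	b := bucket(c)
	if b < len(pools) && c == 1<<(b+minShift) {
		pools[b].Put(buf[:0])
	}
}
```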
Quoting #23199:
Outside of your very specific use case, what benefit is there to adding a new pool to the stdlib?
@ericlagergren I have replied about #23199 here, please check the differences. My case is not very specific: lots of cases that use buffers with a short life cycle could benefit from a common buffer pool. Of course, we can customize more details, such as #23199 (comment). And at the least, a package could provide a config field that controls whether it uses the std's pool or its own.
That doesn’t really address my question, though.
This sort of pool (of objects without costly initialization) shouldn’t be necessary: it means there is a deficiency in the GC (for a particular use case, anyway).
Go’s compatibility promise means there is a high bar for adding new APIs to the stdlib. It’s not clear how this proposed API—which only works around a known, current deficiency in the GC—meets that bar.
For an std-based HTTP service (one connection, one goroutine), allocating buffers that are not too big is goroutine-stack-friendly most of the time, and the read buffer can be reused in a loop:

```go
var buffer []byte = make([]byte, 4096) // size illustrative
for {
	request, err := readRequest(conn, buffer)
	if err != nil {
		// handle or log the error
		break
	}
	handleRequest(request)
}
```

But for many other programs (pushing, games, IM services), buffers are passed between different goroutines, which makes them not goroutine-stack-friendly and not so easy to reuse. Then a pool becomes necessary.
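A minimal sketch of that cross-goroutine case, assuming a plain sync.Pool and hypothetical reader/worker functions: ownership of the buffer moves with the message, so only an explicit Put by the consumer lets it be reused.

```go
package push

import (
	"io"
	"sync"
)

var bufPool = sync.Pool{New: func() interface{} { return make([]byte, 4096) }}

// reader cannot reuse buf on the next iteration the way the loop above
// does: the message is still being processed on another goroutine.
func reader(conn io.Reader, work chan<- []byte) {
	defer close(work)
	for {
		buf := bufPool.Get().([]byte)
		n, err := conn.Read(buf)
		if err != nil {
			bufPool.Put(buf) // never handed off, safe to recycle here
			return
		}
		work <- buf[:n] // hand off; reader must not touch buf after this
	}
}

func worker(work <-chan []byte) {
	for msg := range work {
		handle(msg)
		bufPool.Put(msg[:cap(msg)]) // the consumer returns the buffer
	}
}

func handle(msg []byte) { /* process the message */ }
```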
Why does this need to be in the standard library? Why can't it be done in a third-party package?
This proposal has been added to the active column of the proposals project |
When a project depends on multiple libs whose authors each picked a different third-party pool, the project cannot share one buffer pool across them. But if the std provided a buffer pool, different authors would be more inclined to use the same one.
A global byte pool seems pretty dangerous anyway. It would probably be better to follow sync.Pool and have individual pools. At that point it doesn't matter whether different authors use different pools. We definitely don't want to encourage everyone to start using a global []byte pool instead of make and relying on the GC for cleanup. That will just lead to tons of dangling pointer errors. A pool is better to use in the few places where it really matters and you can be extra careful to get it right.
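To illustrate the dangling-pointer hazard described above, here is a hypothetical buggy snippet (the names and the 16-byte header are invented for the example); the fix is to copy the bytes out before calling Put:

```go
package unsafepool

import "sync"

var pool = sync.Pool{New: func() interface{} { return make([]byte, 4096) }}

// parseHeader shows the failure mode: header still aliases buf's backing
// array when buf is returned to the pool, so a later Get can overwrite
// the bytes header points at.
func parseHeader(read func([]byte) int) []byte {
	buf := pool.Get().([]byte)
	read(buf)          // assume at least 16 bytes were read
	header := buf[:16] // aliases buf; no copy is made
	pool.Put(buf)      // BUG: header now points into recycled memory
	return header      // callers may later observe another message's bytes
	// Correct version: copy before Put, e.g.
	//   header := append([]byte(nil), buf[:16]...)
}
```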
That's all right, I'll close this issue.
This proposal has been declined as retracted. |
Background
Unlike other object-type pools, which each repository builds for its own types, a []byte pool is a common need across many repositories. Many repositories or packages with this need still roll their own, or use buffers without a pool at all, such as:
https://github.com/golang/go/blob/master/src/crypto/tls/conn.go#L930
Or here, where using a pool might be better:
https://github.com/golang/protobuf/blob/master/proto/wire.go#L23
I am now working on a non-blocking lib, nbio, that aims to solve the 1000k-connections problem, which is still a difficult job for net/http. nbio is event-driven and allows users to plug in their own size-limited goroutine pool to handle messages from connections, instead of serving each connection with at least one goroutine; this saves lots of goroutines, reduces memory usage, and avoids STW.
nbio supports non-blocking TLS/HTTP 1.x/WebSocket and has mostly achieved these goals. Here is a load test compared with std: #15735 (comment).
There were some difficulties during the implementation of this non-blocking IO and its streaming parsers, especially the buffers' life-cycle management.
Unlike std-based frameworks, in which a buffer can easily be reused in a for loop on the same goroutine, the buffers in nbio are passed between different goroutines and different protocol layers (TCP/TLS/HTTP/WebSocket).
To reuse buffers and reduce memory usage as much as possible, I manage all of them with the same []byte pool. I have already achieved this, but it is not convenient. It would be much better if std provided such a []byte pool: many different packages could then share it, which would raise the reuse rate and further reduce memory usage and GC cost.
Possible Implementation
I prefer to add a []byte pool in bytes like this:
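The proposed code block does not survive in this copy of the issue, so here is a hedged sketch of what a pool of the shape discussed might look like; the Put doc comment is taken from the proposal as quoted in the thread, while MinSize's exact role and the Get semantics are assumptions:

```go
package bytes // sketch only; not an accepted API

import "sync"

// Pool is a reconstruction of the proposed []byte pool.
type Pool struct {
	// MinSize is the smallest slice capacity worth pooling (assumed).
	MinSize int

	pool sync.Pool
}

// Get returns a zero-length slice from the pool, or a newly allocated
// one with MinSize capacity if the pool is empty (assumed semantics).
func (p *Pool) Get() []byte {
	if v := p.pool.Get(); v != nil {
		return v.([]byte)[:0]
	}
	return make([]byte, 0, p.MinSize)
}

// Put puts a slice back to the pool.
// If the slice's cap is smaller than MinSize,
// it will not be put back to the pool but dropped.
func (p *Pool) Put(buf []byte) {
	if cap(buf) < p.MinSize {
		return
	}
	p.pool.Put(buf[:0])
}
```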