Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/lzw: add Resetter interface to allow encoders to be reused #26535

Closed
ericpauley opened this issue Jul 22, 2018 · 24 comments
Closed

compress/lzw: add Resetter interface to allow encoders to be reused #26535

ericpauley opened this issue Jul 22, 2018 · 24 comments
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Milestone

Comments

@ericpauley
Copy link
Contributor

ericpauley commented Jul 22, 2018

compress/lzw encoders maintain a rather large table (64k) mapping uncompressed bytes to their compressed bits. Because the encoder requires a Close per stream they cannot be reused for multiple streams. This means that lzw creates 64k of garbage per use, which is especially apparent when encoding animation with image/gif.

While we can't easily change the API in go1 to support reuse, we can greatly reduce garbage by pooling the table parameter of encoders. I prototyped this behavior and got the following benchmarks:

name           old time/op    new time/op    delta
Encoder/1e4-8     108µs ± 1%     109µs ± 2%   +1.25%  (p=0.008 n=9+10)
Encoder/1e5-8    1.06ms ± 2%    1.11ms ± 2%   +4.15%  (p=0.000 n=10+9)
Encoder/1e6-8    10.5ms ± 2%    11.0ms ± 2%   +4.37%  (p=0.000 n=10+9)

name           old speed      new speed      delta
Encoder/1e4-8  93.0MB/s ± 1%  91.8MB/s ± 2%   -1.23%  (p=0.008 n=9+10)
Encoder/1e5-8  94.1MB/s ± 2%  90.4MB/s ± 2%   -3.98%  (p=0.000 n=10+9)
Encoder/1e6-8  95.2MB/s ± 2%  91.3MB/s ± 2%   -4.18%  (p=0.000 n=10+9)

name           old alloc/op   new alloc/op   delta
Encoder/1e4-8    77.9kB ± 0%     4.4kB ± 0%  -94.35%  (p=0.000 n=10+9)
Encoder/1e5-8    77.9kB ± 0%     4.4kB ± 0%  -94.33%  (p=0.000 n=10+10)
Encoder/1e6-8    77.9kB ± 0%     5.0kB ± 0%  -93.60%  (p=0.000 n=10+10)

name           old allocs/op  new allocs/op  delta
Encoder/1e4-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)
Encoder/1e5-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)
Encoder/1e6-8      3.00 ± 0%      4.00 ± 0%  +33.33%  (p=0.000 n=10+10)

(Note that these are on top of my existing CL https://go-review.googlesource.com/c/go/+/123478 that improves time/op significantly)

I don't see much use of pools in the standard library, is this not generally a good approach? Even if it is, is the memory overhead worth it for the 4% performance increase?

@dsnet
Copy link
Member

dsnet commented Jul 23, 2018

Pools have subtle properties that affect their performance (see #23199 and #22950). While #23199 isn't an issue here since the object is fixed size, #22950 can have adverse affect in all usages of pools in certain workloads.

That being said, why do you say:

While we can't easily change the API in go1 to support reuse

Is that because the API returns io.ReadCloser and io.WriteCloser? A similar problem existed for compress/flate and it was resolved by defining the flate.Resetter interface.

@bcmills bcmills added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 23, 2018
@bcmills bcmills added this to the Go1.12 milestone Jul 23, 2018
@bcmills
Copy link
Contributor

bcmills commented Jul 23, 2018

A pool seems like a poor fit for this use-case, since the lzw's ReadCloser and WriteCloser implementations are not documented to be safe for concurrent use anyway: a pool would add a contended cache line for an API that otherwise needs no cross-core synchronization.

A Resetter interface seems preferable.

@ericpauley
Copy link
Contributor Author

Thanks for the insight, a Resetter interface definitely seems like the way to go.

There's also the question of compress/lzw's main consumer, image/gif. As intended, this would reduce the allocation of multi-frame gif encoding significantly. (With a few internal modifications to persist the writer between calls to writeImageBlock. This wouldn't have any impact on the single-frame case, and wouldn't completely remove allocation from gif.EncodeAll.

One option would be for image/gif to have an lzw encoder Pool. Since all encoder use is internal there wouldn't be the same interface concerns as within lzw. Since there's no persistent encoder object there doesn't seem to be a good way to implement a Resetter interface like with lzw.

Maybe 64k garbage is okay for an image encoder, in which case the lzw change may be enough. (Personally I'd prefer that image/gif be near 0 quiescent allocation but that's for a niche use case)

@agnivade
Copy link
Contributor

Stumbled on this while looking at some heap profiles generated from using https://github.com/hashicorp/memberlist. Adding the Resetter interface works for the general use-case of the library. Will look into sending a CL.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/273667 mentions this issue: compress/lzw: add Reset method to allow encoders to be reused

@agnivade agnivade changed the title compress/lzw: reduce garbage in encoder proposal: compress/lzw: add Resetter interface to allow encoders to be reused Nov 29, 2020
@agnivade agnivade modified the milestones: Unplanned, Proposal Nov 29, 2020
@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

We clearly did a bad job with all the compress APIs. We should not have returned interfaces from the constructors. For example we got zlib.NewWriter correct - it returns a *zlib.Writer. But we got zlib.Reader wrong - it returns an io.ReadCloser. When we wanted to add a Reset method to the underlying reader, we had no way to expose it, so we added the zlib.Resetter interface with a guarantee that the resulting io.ReadCloser also implemented zlib.Resetter.

That's being proposed here for lzw, but on the Writer side, not the Reader side. It bothers me a bit that flate.Resetter and zlib.Resetter are both Read-side Resetters while this would be a Write-side Resetter. What happens when we want to Reset the lzw reader as well?

An alternative to defining these new interfaces would be to define the actual concrete type structs - *lzw.Reader and *lzw.Writer - and then document that the results from lzw.NewReader and lzw.NewWriter are guaranteed to be type-assert-able to those actual concrete types. Then any new methods we need could be added there without any new interfaces, and in particular without a different interface for reading and writing and a different interface for each new method.

Any thoughts about that alternative (defining the Reader and Writer structs for the package and documenting that the interfaces can be type-asserted to them, and then adding Reset as struct methods)?

@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

/cc @nigeltao

@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@ericpauley
Copy link
Contributor Author

For completeness, I can imagine two other (potentially poor) approaches:

  • Create new constructors that return the concrete structs directly. This has the advantage of no implicit type guarantees to be documented, though there is the obvious downside of an additional method.
  • Allow direct instantiation of the concrete structs (similar to bytes.Buffer). This removes the need for an additional method or type cast, but there is no way to allow setting fields at construction without making them mutable, which could lead to nonsensical states.

@rsc
Copy link
Contributor

rsc commented Jan 20, 2021

It's true that if we have structs that can be declared and initialized (by the Reset method) they would stand alone.
It seems like we should probably still define that the functions also return those implementations.
So the proposal would be:

// guarantee to return a *Reader
func NewReader(r io.Reader, order Order, litWidth int) io.ReadCloser

// guarantee to return a *Writer
func NewWriter(w io.Writer, order Order, litWidth int) io.WriteCloser

type Reader struct { ... unexported ... }
func (r *Reader) Reset(input io.Reader, order Order, litWidth int) 
func (r *Reader) Read([]byte) (int, error)
func (r *Reader) Close() error

type Writer struct { ... unexported ... }
func (w *Writer) Reset(output io.Writer, order Order, litWidth int) 
func (w *Writer) Write([]byte) (int, error)
func (w *Writer) Close() error

Any objections to this approach?

@dsnet
Copy link
Member

dsnet commented Jan 20, 2021

The asymmetry with compress/flate is unfortunate, but it's better than defining WriterRestter and ReaderResetter interface types.

@andig
Copy link
Contributor

andig commented Jan 20, 2021

@rsc why not return a *Writer and guarantee that it always fulfills WriteCloser? That would be compatible and usable without type assertion?

@ericpauley
Copy link
Contributor Author

@andig Would this technically violate the compatibility guarantee, since a function variable/argument typed for the current signature would no longer accept the new signature? E.g.:

var f func(r io.Reader, order lzw.Order, litWidth int) io.ReadCloser = lzw.NewReader

would break under this proposed type signature.

@andig
Copy link
Contributor

andig commented Jan 20, 2021

Guess it would 😣

@rsc
Copy link
Contributor

rsc commented Jan 27, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@ericpauley
Copy link
Contributor Author

While playing around with this I noticed that NewReader/NewWriter currently return a separate struct (errWriteCloser) on errors, which returns an error later when written/read. This doesn't necessarily require a change to the above APIs, but the separate struct will need to be incorporated into the Reader/Writer struct as an err struct member or similar, so that Reset can set errors on the struct (since it can't return this pseudo-reader/writer).

@agnivade
Copy link
Contributor

@rsc - Just wanted to clarify a small detail on the Reset methods: do we want to pass the order and litWidth to Reset or just let them be whatever they were when the Reader/Writer was created?

It seems like there is some inconsistency amongst the various Reset APIs on this one.

@dsnet
Copy link
Member

dsnet commented Jan 31, 2021

In my opinion, it was a mistake for gzip and flate to not take in more options. There have been times where I needed to reset them with different settings and was unable to. I think we should avoid the same mistake when adding reset functionality to lzw.

@ericpauley
Copy link
Contributor Author

To @dsnet's point, removing these parameters from Reset would also prohibit direct instantiation (then Reseting) the structs (unless these values had some default value)

@nigeltao
Copy link
Contributor

nigeltao commented Feb 2, 2021

Any objections to this approach?

LGTM.

@rsc
Copy link
Contributor

rsc commented Feb 3, 2021

To answer @agnivade's comment, it sounds like @dsnet and @nigeltao agree to include all the options in Reset, as in the comment above.

@rsc
Copy link
Contributor

rsc commented Feb 3, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: compress/lzw: add Resetter interface to allow encoders to be reused compress/lzw: add Resetter interface to allow encoders to be reused Feb 3, 2021
@rsc rsc modified the milestones: Proposal, Backlog Feb 3, 2021
@agnivade
Copy link
Contributor

agnivade commented Feb 3, 2021

I will update my CL to accomodate the changes in the proposal.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/323273 mentions this issue: doc/go1.17: clarify that compress/lzw Reader and Writer types are new

gopherbot pushed a commit that referenced this issue May 27, 2021
For #26535
For #44513
For #46005

Change-Id: I70d3711ab6451a61b526abb3da8e91243f637656
Reviewed-on: https://go-review.googlesource.com/c/go/+/323273
Trust: Ian Lance Taylor <[email protected]>
Reviewed-by: Joe Tsai <[email protected]>
@rsc rsc mentioned this issue Jun 10, 2021
83 tasks
@golang golang locked and limited conversation to collaborators May 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Projects
None yet
Development

No branches or pull requests

9 participants