[Go][Parquet] Performance regression of writer since v14 #41541
Comments
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
May 13, 2024
zeroshade
added a commit
that referenced
this issue
May 15, 2024
### Rationale for this change

A performance regression was reported for the parquet writer since v14. Profiling revealed excessive allocations. This was due to us always adding the current offset to the current capacity when reserving, resulting in `Reserve` always performing a reallocate even when it didn't need to.

### What changes are included in this PR?

`PooledBufferWriter` should only pass `nbytes` to the `Reserve` call, not `byteoffset + nbytes`.

`BitWriter` should not be adding `b.offset` to the capacity when determining the new capacity.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No, only performance changes:

Before:

```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20      514     2127175 ns/op   1971.77 MB/s      5425676 B/op    239 allocs/op
BenchmarkWriteColumn/int32_nullable-20           31   467352621 ns/op      8.97 MB/s   2210271923 B/op   2350 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20      326     4132204 ns/op   2030.06 MB/s      5442976 B/op    265 allocs/op
BenchmarkWriteColumn/int64_nullable-20           33   432764687 ns/op     19.38 MB/s   2100068812 B/op   2384 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20    334     3540566 ns/op   1184.64 MB/s      5453079 B/op   1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20          6   492103646 ns/op      8.52 MB/s   2283305841 B/op   3371 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20    241     4783268 ns/op   1753.74 MB/s      5498759 B/op   1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20          4   369619096 ns/op     22.70 MB/s   1725354454 B/op   3401 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  40.862s
```

After:

```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20      500     2136823 ns/op   1962.87 MB/s    5410591 B/op    240 allocs/op
BenchmarkWriteColumn/int32_nullable-20           48    26604880 ns/op    157.65 MB/s   12053510 B/op    250 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20      340     3530509 ns/op   2376.03 MB/s    5439578 B/op    265 allocs/op
BenchmarkWriteColumn/int64_nullable-20           44    27387334 ns/op    306.30 MB/s   11870305 B/op    260 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20    316     3479312 ns/op   1205.50 MB/s    5456685 B/op   1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20         50    25910872 ns/op    161.87 MB/s   12054582 B/op   1271 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20    249     4769664 ns/op   1758.74 MB/s    5486020 B/op   1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20         51    25496256 ns/op    329.01 MB/s   12140753 B/op   1284 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  11.492s
```

All of the nullable column cases average around a 16x-17x performance improvement.

* GitHub Issue: #41541

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
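The rationale above can be illustrated with a small, self-contained model. This is only a sketch and not the Arrow code: `toyWriter`, `reserveWithOffset`, `reserveAdditional`, and `countReallocs` are hypothetical names, and the growth policy (doubling) is an assumption. It shows how starting the target capacity at "capacity + offset" forces a reallocation on every call, while checking `pos + nbytes` against the capacity does not.

```go
package main

import "fmt"

// toyWriter is a hypothetical stand-in for a pooled buffer writer;
// len(buf) plays the role of the buffer's capacity.
type toyWriter struct {
	buf      []byte
	pos      int
	reallocs int
}

// grow reallocates only when newCap exceeds the current capacity.
func (w *toyWriter) grow(newCap int) {
	if newCap <= len(w.buf) {
		return
	}
	next := make([]byte, newCap)
	copy(next, w.buf[:w.pos])
	w.buf = next
	w.reallocs++
}

// reserveWithOffset models the regression: the target starts at
// capacity + offset, which exceeds the capacity whenever pos > 0.
func (w *toyWriter) reserveWithOffset(nbytes int) {
	newCap := len(w.buf) + w.pos
	if newCap < w.pos+nbytes {
		newCap = w.pos + nbytes
	}
	w.grow(newCap)
}

// reserveAdditional models the fix: grow only when pos + nbytes does not fit.
// Doubling is an assumed growth policy for this sketch.
func (w *toyWriter) reserveAdditional(nbytes int) {
	need := w.pos + nbytes
	if need <= len(w.buf) {
		return
	}
	newCap := 2 * len(w.buf)
	if newCap < need {
		newCap = need
	}
	w.grow(newCap)
}

// countReallocs performs `writes` writes of `size` bytes each and
// returns how many reallocations the chosen reserve strategy caused.
func countReallocs(writes, size int, withOffset bool) int {
	w := &toyWriter{}
	chunk := make([]byte, size)
	for i := 0; i < writes; i++ {
		if withOffset {
			w.reserveWithOffset(size)
		} else {
			w.reserveAdditional(size)
		}
		copy(w.buf[w.pos:], chunk)
		w.pos += size
	}
	return w.reallocs
}

func main() {
	fmt.Println("with offset:", countReallocs(200, 8, true))
	fmt.Println("additional only:", countReallocs(200, 8, false))
}
```

In this model the offset-based variant reallocates once per write, whereas the corrected variant reallocates only a logarithmic number of times, which is consistent with the drop in `B/op` reported for the nullable cases.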
Issue resolved by pull request 41638
vibhatha
pushed a commit
to vibhatha/arrow
that referenced
this issue
May 25, 2024
zeroshade
pushed a commit
that referenced
this issue
Jul 10, 2024
…42003)

### Rationale for this change

This PR is complementary to #41638. The prior PR reduces reallocations in `PooledBufferWriter`. However, the problematic formula it addressed is still used in other functions. In addition to this, `(*PooledBufferWriter).Reserve()` simply doubles the capacity of buffers regardless of its argument `nbytes`. This may result in excessive allocations in some cases.

### What changes are included in this PR?

- Applied the fixed formula to `(*BufferWriter).Reserve()`.
- Updated the new capacity passed to `(*memory.Buffer).Reserve()`.
  - Now using `bitutil.NextPowerOf2(b.pos + nbytes)` to avoid reallocations when adding `nbytes`.
- Replaced `math.Max` with `utils.Max` in `(*bufferWriteSeeker).Reserve()` to avoid unnecessary type conversions.

### Are these changes tested?

Yes. The following commands pass.

```
$ export PARQUET_TEST_DATA=$PWD/cpp/submodules/parquet-testing/data
$ (cd go && go test ./...)
```

### Are there any user-facing changes?

No, but it may reduce the number of allocations and improve the throughput.

Before:

```
$ go test -test.run='^$' -test.bench='^BenchmarkWriteColumn$' -benchmem ./parquet/pqarrow/...
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
BenchmarkWriteColumn/int32_not_nullable-10     1190    1016705 ns/op   4125.39 MB/s    5443579 B/op    240 allocs/op
BenchmarkWriteColumn/int32_nullable-10           52   24780561 ns/op    169.26 MB/s   12048944 B/op    249 allocs/op
BenchmarkWriteColumn/int64_not_nullable-10      632    1717090 ns/op   4885.36 MB/s    5445954 B/op    265 allocs/op
BenchmarkWriteColumn/int64_nullable-10           51   22949770 ns/op    365.52 MB/s   12209860 B/op    262 allocs/op
BenchmarkWriteColumn/float32_not_nullable-10    519    2234718 ns/op   1876.88 MB/s    5452627 B/op   1263 allocs/op
BenchmarkWriteColumn/float32_nullable-10         56   23423793 ns/op    179.06 MB/s   12057540 B/op   1272 allocs/op
BenchmarkWriteColumn/float64_not_nullable-10    416    2761247 ns/op   3037.98 MB/s    5507068 B/op   1292 allocs/op
BenchmarkWriteColumn/float64_nullable-10         51   25767881 ns/op    325.55 MB/s   12059614 B/op   1285 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  10.592s
```

After:

```
$ go test -test.run='^$' -test.bench='^BenchmarkWriteColumn$' -benchmem ./parquet/pqarrow/...
goos: linux
goarch: arm64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
BenchmarkWriteColumn/int32_not_nullable-10     1196     959528 ns/op   4371.22 MB/s    5420349 B/op    238 allocs/op
BenchmarkWriteColumn/int32_nullable-10           51   23017598 ns/op    182.22 MB/s   14138480 B/op    248 allocs/op
BenchmarkWriteColumn/int64_not_nullable-10      690    1671710 ns/op   5017.98 MB/s    5419878 B/op    263 allocs/op
BenchmarkWriteColumn/int64_nullable-10           50   23196051 ns/op    361.64 MB/s   13728465 B/op    261 allocs/op
BenchmarkWriteColumn/float32_not_nullable-10    540    2185075 ns/op   1919.52 MB/s    5459392 B/op   1261 allocs/op
BenchmarkWriteColumn/float32_nullable-10         54   21796783 ns/op    192.43 MB/s   14150622 B/op   1271 allocs/op
BenchmarkWriteColumn/float64_not_nullable-10    418    2708292 ns/op   3097.38 MB/s    5455095 B/op   1290 allocs/op
BenchmarkWriteColumn/float64_nullable-10         51   22174952 ns/op    378.29 MB/s   14142791 B/op   1283 allocs/op
PASS
ok      github.com/apache/arrow/go/v17/parquet/pqarrow  10.210s
```

* GitHub Issue: #41541
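To make the sizing change above concrete, here is a hedged sketch of the difference between doubling the capacity once and sizing it from `pos + nbytes`. The helpers `nextPow2`, `doubledCap`, and `sizedCap` are hypothetical and written only for this illustration; in particular, `nextPow2` is a local stand-in whose rounding for exact powers of two is an assumption and may differ from `bitutil.NextPowerOf2`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPow2 returns the smallest power of two that is >= x.
// It is a local helper for this sketch, not the Arrow function.
func nextPow2(x int) int {
	if x <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(x-1))
}

// doubledCap models a policy that doubles the capacity and ignores nbytes.
func doubledCap(capacity int) int {
	return 2 * capacity
}

// sizedCap models a policy that keeps the current capacity when the write
// fits, and otherwise jumps straight to a power of two covering pos + nbytes.
func sizedCap(capacity, pos, nbytes int) int {
	need := pos + nbytes
	if need <= capacity {
		return capacity
	}
	return nextPow2(need)
}

func main() {
	capacity, pos, nbytes := 256, 200, 1000

	// Doubling once yields 512, which still cannot hold pos + nbytes = 1200,
	// so a further reallocation would be required.
	fmt.Println("doubled:", doubledCap(capacity), "fits:", doubledCap(capacity) >= pos+nbytes)

	// Sizing from the requirement covers the write in a single step.
	fmt.Println("sized:", sizedCap(capacity, pos, nbytes), "fits:", sizedCap(capacity, pos, nbytes) >= pos+nbytes)
}
```

The trade-off is visible in the benchmark numbers above: sizing to a power of two may allocate somewhat more bytes per operation in exchange for fewer reallocations.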
raulcd
pushed a commit
that referenced
this issue
Jul 11, 2024
Describe the bug, including details regarding any error messages, version, and platform.

It is not quite clear to me yet why this is the case, but through some trial and error I found that the following single line:

https://github.com/apache/arrow/blob/main/go/parquet/internal/utils/bit_writer.go#L94

introduced in v14, is causing as much as a 10x slowdown in some of my applications. I will try to reproduce the problem with a minimal working example. But in my experiments with real-world applications so far, it seems the use of `Reserve` here is incorrect:

- `PooledBufferWriter` already does the necessary reserve when it needs to.
- `Reserve` reserves additional bytes, not a new capacity, unlike the usual C++ STL semantics.

Component(s)

Go, Parquet
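The distinction drawn in the report, between reserving additional bytes and reserving a total capacity, also exists in Go's standard library, which makes it easy to demonstrate. The sketch below uses `bytes.Buffer.Grow`, whose argument is the number of additional bytes to make room for. It is only an analogy and not the Arrow API; `spareAfterGrow` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// spareAfterGrow writes `written` bytes into a fresh buffer, calls Grow(arg),
// and reports how many bytes can then be written without another allocation.
func spareAfterGrow(written, arg int) int {
	var buf bytes.Buffer
	buf.Write(make([]byte, written))
	buf.Grow(arg) // arg means "additional bytes", not "total capacity"
	return buf.Cap() - buf.Len()
}

func main() {
	written, next := 1000, 50

	// Correct use: ask only for the bytes about to be written.
	fmt.Println("spare after Grow(next):", spareAfterGrow(written, next))

	// Mistaken use: passing written + next treats the argument as a total
	// capacity, so the buffer is asked for far more room than is needed.
	fmt.Println("spare after Grow(written+next):", spareAfterGrow(written, written+next))
}
```

Passing a total where an additional byte count is expected over-reserves on every call, which is the kind of mismatch the report suspects at the linked line.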