Skip to content

Commit

Permalink
GH-41541: [Go][Parquet] Fix writer performance regression (#41638)
Browse files Browse the repository at this point in the history
### Rationale for this change
A performance regression was reported  for the parquet writer since v14. Profiling revealed excessive allocations. This was due to us always adding the current offset to the current capacity when reserving, resulting in Reserve always performing a reallocate even when it didn't need to.

### What changes are included in this PR?
`PooledBufferWriter` should only pass `nbytes` to the `Reserve` call, not `byteoffset + nbytes`. `BitWriter` should not be adding `b.offset` to the capacity when determining the new capacity.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No, only performance changes:

Before:
```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20         	     514	   2127175 ns/op	1971.77 MB/s	 5425676 B/op	     239 allocs/op
BenchmarkWriteColumn/int32_nullable-20             	      31	 467352621 ns/op	   8.97 MB/s	2210271923 B/op2350 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20         	     326	   4132204 ns/op	2030.06 MB/s	 5442976 B/op	     265 allocs/op
BenchmarkWriteColumn/int64_nullable-20             	      33	 432764687 ns/op	  19.38 MB/s	2100068812 B/op2384 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20       	     334	   3540566 ns/op	1184.64 MB/s	 5453079 B/op	    1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20           	       6	 492103646 ns/op	   8.52 MB/s	2283305841 B/op3371 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20       	     241	   4783268 ns/op	1753.74 MB/s	 5498759 B/op	    1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20           	       4	 369619096 ns/op	  22.70 MB/s	1725354454 B/op3401 allocs/op
PASS
ok  	github.com/apache/arrow/go/v17/parquet/pqarrow	40.862s
```

After:
```shell
goos: linux
goarch: amd64
pkg: github.com/apache/arrow/go/v17/parquet/pqarrow
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkWriteColumn/int32_not_nullable-20         	     500	   2136823 ns/op	1962.87 MB/s	 5410591 B/op	     240 allocs/op
BenchmarkWriteColumn/int32_nullable-20             	      48	  26604880 ns/op	 157.65 MB/s	12053510 B/op	     250 allocs/op
BenchmarkWriteColumn/int64_not_nullable-20         	     340	   3530509 ns/op	2376.03 MB/s	 5439578 B/op	     265 allocs/op
BenchmarkWriteColumn/int64_nullable-20             	      44	  27387334 ns/op	 306.30 MB/s	11870305 B/op	     260 allocs/op
BenchmarkWriteColumn/float32_not_nullable-20       	     316	   3479312 ns/op	1205.50 MB/s	 5456685 B/op	    1263 allocs/op
BenchmarkWriteColumn/float32_nullable-20           	      50	  25910872 ns/op	 161.87 MB/s	12054582 B/op	    1271 allocs/op
BenchmarkWriteColumn/float64_not_nullable-20       	     249	   4769664 ns/op	1758.74 MB/s	 5486020 B/op	    1292 allocs/op
BenchmarkWriteColumn/float64_nullable-20           	      51	  25496256 ns/op	 329.01 MB/s	12140753 B/op	    1284 allocs/op
PASS
ok  	github.com/apache/arrow/go/v17/parquet/pqarrow	11.492s
```

All of the nullable column cases average around a 16x-17x performance improvement.

* GitHub Issue: #41541

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
  • Loading branch information
zeroshade authored May 15, 2024
1 parent 63fddd7 commit e1de9c5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion go/parquet/internal/encoding/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ func (b *PooledBufferWriter) Reserve(nbytes int) {
b.buf = bufferPool.Get().(*memory.Buffer)
}

newCap := utils.Max(b.buf.Cap()+b.offset, 256)
newCap := utils.Max(b.buf.Cap(), 256)
for newCap < b.pos+nbytes {
newCap = bitutil.NextPowerOf2(newCap)
}
Expand Down

0 comments on commit e1de9c5

Please sign in to comment.