
Cache the validity mask, data pointers, and the chunk size #254

Merged (1 commit) on Jul 25, 2024


taniabogatsch (Collaborator):

Fixes #253.

Caching these values during scans significantly improves scan performance.
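In DuckDB's C API, a chunk's size, a vector's data pointer and its validity mask are obtained through calls such as `duckdb_data_chunk_get_size`, `duckdb_vector_get_data` and `duckdb_vector_get_validity`, which go-duckdb reaches via cgo. The sketch below is a pure-Go simulation of the caching idea, not go-duckdb's actual code: `chunk`, `cachedVector` and the counting accessors are hypothetical stand-ins, and the call counter is only a proxy for per-call overhead. It assumes DuckDB's validity layout (one bit per row in `uint64` words, bit set means valid, nil mask means all rows valid).

```go
package main

import "fmt"

// chunk is a HYPOTHETICAL stand-in for a DuckDB data chunk with one vector.
// Each accessor increments calls to model the cost of a cgo round trip.
type chunk struct {
	data     []int32
	validity []uint64 // bit row%64 of word row/64 set => row is valid; nil => all valid
	calls    int
}

func (c *chunk) getData() []int32      { c.calls++; return c.data }
func (c *chunk) getValidity() []uint64 { c.calls++; return c.validity }
func (c *chunk) getSize() int          { c.calls++; return len(c.data) }

// isValid checks a row against a DuckDB-style validity mask.
func isValid(mask []uint64, row int) bool {
	if mask == nil {
		return true
	}
	return mask[row/64]&(1<<(uint(row)%64)) != 0
}

// sumUncached re-fetches size, mask and data pointer on every row.
func sumUncached(c *chunk) int64 {
	var sum int64
	for row := 0; row < c.getSize(); row++ {
		if !isValid(c.getValidity(), row) {
			continue
		}
		sum += int64(c.getData()[row])
	}
	return sum
}

// cachedVector (HYPOTHETICAL) holds the three values, fetched once per chunk.
type cachedVector struct {
	data     []int32
	validity []uint64
	size     int
}

func newCachedVector(c *chunk) cachedVector {
	return cachedVector{data: c.getData(), validity: c.getValidity(), size: c.getSize()}
}

// sumCached fetches once per chunk, then reads rows from the cache.
func sumCached(c *chunk) int64 {
	v := newCachedVector(c)
	var sum int64
	for row := 0; row < v.size; row++ {
		if !isValid(v.validity, row) {
			continue
		}
		sum += int64(v.data[row])
	}
	return sum
}

func main() {
	// Rows 0, 1 and 3 are valid (mask 0b1011); row 2 is NULL.
	c := &chunk{data: []int32{1, 2, 3, 4}, validity: []uint64{0b1011}}
	fmt.Println(sumUncached(c), c.calls)
	c.calls = 0
	fmt.Println(sumCached(c), c.calls)
}
```

Running it prints `7 12` and then `7 3`: both scans return the same sum, but the uncached version's accessor calls grow with the row count while the cached version makes a fixed three calls per chunk.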

Scan benchmark setup:

  • CREATE TABLE tbl (col1 STRUCT(value INTEGER));
  • Fill the table with 1M mock values.
  • We see huge improvements both for scanning the nested data and for scanning the integer field directly.

Timings with this PR:

SELECT col1.value FROM stress2.main.table2, took 32.725833ms
SELECT col1 FROM stress2.main.table2, took 192.932708ms
SELECT * FROM stress2.main.table2, took 198.002875ms

Additionally, the included benchmark covering all types (BenchmarkTypes, in ns/op, lower is better) improved slightly:
BenchmarkTypes-10    	       5	 248644567 ns/op

Timings before this PR:

SELECT col1.value FROM stress2.main.table2, took 236.263375ms
SELECT col1 FROM stress2.main.table2, took 466.651625ms
SELECT * FROM stress2.main.table2, took 460.474834ms

BenchmarkTypes-10    	       4	 286125906 ns/op
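To compare the before and after numbers directly, a small Go sketch that parses the reported durations with `time.ParseDuration` and computes before/after ratios (a value above 1 means this PR is faster). `measurement` and `speedup` are hypothetical names; the inputs are exactly the figures quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// measurement pairs a "before" and "after" timing quoted in this PR.
type measurement struct {
	label         string
	before, after string
}

// speedup returns before/after; > 1 means the new code is faster.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	return float64(b) / float64(a), nil
}

func main() {
	ms := []measurement{
		{"SELECT col1.value", "236.263375ms", "32.725833ms"},
		{"SELECT col1", "466.651625ms", "192.932708ms"},
		{"SELECT *", "460.474834ms", "198.002875ms"},
		{"BenchmarkTypes", "286125906ns", "248644567ns"}, // ns/op
	}
	for _, m := range ms {
		s, err := speedup(m.before, m.after)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-18s %.2fx\n", m.label, s)
	}
}
```

This prints roughly 7.22x, 2.42x, 2.33x and 1.15x, so scanning the integer field directly gains the most, while the all-types benchmark improves by about 15%.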

taniabogatsch merged commit 2da1412 into marcboeker:main on Jul 25, 2024 (4 checks passed).
taniabogatsch deleted the caching branch on July 25, 2024 at 10:57.
Linked issue: Move caching during scans to improve performance