Reuse the Tokio Runtime #341

Merged
merged 1 commit into from
Apr 24, 2023

kylebrooks-8451
Contributor

Which issue does this PR close?

Closes #340

Rationale for this change

Currently, we frequently create a new Tokio runtime and its associated threads, which hurts performance. This PR uses a module-level attribute to create the runtime once and reuse it.

Are there any user-facing changes?

No.

@andygrove
Member

Thanks @kylebrooks-8451. Can you share any info on how much difference this makes to performance? I am wondering if we should delay the 23.0.0 release to get this merged in?

@kylebrooks-8451
Contributor Author

I wanted to get some numbers on this, but I didn't have a great way to benchmark it. I know it is significant for our use case, which is running an Arrow Flight server using DataFusion as the engine, but I don't have any hard numbers. Is there an easy way to benchmark the Python bindings? I see a benchmark suite for DataFusion proper.

@andygrove
Member

I'm mostly testing with TPC-H using code here:

https://github.com/sql-benchmarks/sqlbench-runners/tree/main/datafusion-python

I doubt it will impact this benchmark all that much though.

This PR is fixing an ugly hack, so I think we should go ahead and merge this.

cc @jdye64

@andygrove andygrove merged commit 545e93e into apache:main Apr 24, 2023
@kylebrooks-8451 kylebrooks-8451 deleted the hotfix/reuse-tokio-runtime branch April 25, 2023 12:14
@kylebrooks-8451
Contributor Author

@andygrove - I ran the benchmark you linked on my MacBook Pro (6-core i7, 2.6 GHz). Using the TPC-H Parquet data at a scale factor of 1.0 and the sqlbench-h SF=1 queries, I got a 2.45x speedup (245%) with the PR, using release wheel builds. I'm glad this made it into the 23 release!

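| Query | Before | After |
|:------|-------:|------:|
| setup | 246.6 | 26.7 |
| q1 | 639.2 | 322.3 |
| q2 | 616.3 | 198.2 |
| q3 | 539.7 | 150.7 |
| q4 | 408.1 | 107.5 |
| q5 | 702.5 | 198.6 |
| q6 | 125.3 | 50.7 |
| q7 | 897.8 | 413.3 |
| q8 | 868.4 | 237.5 |
| q9 | 1265.6 | 348.2 |
| q10 | 683.7 | 256.5 |
| q11 | 245.4 | 105.1 |
| q12 | 318.4 | 133 |
| q13 | 1390.9 | 591 |
| q14 | 195.8 | 88.7 |
| q15 | 296.9 | 133.4 |
| q16 | 269.4 | 106 |
| q17 | 3291.9 | 1555 |
| q18 | 2970.2 | 1060.1 |
| q19 | 262.4 | 156.1 |
| q20 | 668.9 | 370.5 |
| q21 | 1018 | 624.7 |
| total | 17674.9 | 7207.1 |

Speedup: 2.452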
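
The speedup figure is the ratio of the two reported totals. As a quick check, here is a small sketch that recomputes it and also expresses the same result as the fraction of run time saved. The helper names `speedup` and `time_saved_fraction` are hypothetical; the two totals are copied from the results above.

```python
def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


def time_saved_fraction(before: float, after: float) -> float:
    """Return the fraction of the original run time that was eliminated."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 1.0 - after / before


# Totals as reported in the benchmark results.
TOTAL_BEFORE = 17674.9
TOTAL_AFTER = 7207.1

ratio = speedup(TOTAL_BEFORE, TOTAL_AFTER)
saved = time_saved_fraction(TOTAL_BEFORE, TOTAL_AFTER)

print(f"speedup: {ratio:.3f}x")    # speedup: 2.452x
print(f"time saved: {saved:.1%}")  # time saved: 59.2%
```

A 2.45x speedup therefore means the queries finish in roughly 41% of the original time, which is a 59% reduction in run time.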

Development

Successfully merging this pull request may close these issues.

Reuse the Tokio Async Runtime