add multi-proc in to_csv
#2896
lhoestq
left a comment
Thanks! This is going in the right direction :)
I think it's better if we stick with the pandas CSV writer rather than Arrow's, for consistency with JSON but also because the pandas one may be more mature.
I think you can just add a test
Hi @lhoestq,
lhoestq
left a comment
Looks all good now, thanks @bhavitvyamalik @mariosasko :)
This PR extends the multi-proc method used in #2747 for `to_json` to `to_csv` as well.
Benchmarking on my machine with the `ascent_kb` dataset gives a ~45% improvement compared to `num_proc = 1`.
This is a WIP, as tests are still pending for this PR.
I'm also exploring this approach, for which I'm using `pyarrow-5.0.0`.
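For readers who want a feel for the general idea, here is a minimal sketch of batched, multi-process CSV writing. It is not the code from this PR: `batch_to_csv` and `write_csv` are hypothetical names, the rows are plain dicts rather than a `Dataset`, and the standard-library `csv` module stands in for the pandas writer discussed above. Worker processes turn each batch into CSV text, and the parent writes the chunks to the file in their original order.

```python
import csv
import io
from multiprocessing import Pool


def batch_to_csv(args):
    """Serialize one batch of rows to CSV text (hypothetical helper).

    args is a (fieldnames, rows, write_header) tuple so that it can be
    passed through Pool.imap as a single picklable argument.
    """
    fieldnames, rows, write_header = args
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_csv(rows, fieldnames, path, batch_size=1000, num_proc=None):
    """Write a list of dict rows to path and return the characters written.

    With num_proc > 1 the batches are serialized in worker processes;
    otherwise everything runs sequentially in the current process.
    """
    # Only the first batch carries the header line.
    tasks = [
        (fieldnames, rows[start:start + batch_size], start == 0)
        for start in range(0, len(rows), batch_size)
    ]
    written = 0
    with open(path, "w", newline="") as f:
        if num_proc is None or num_proc <= 1:
            for task in tasks:
                written += f.write(batch_to_csv(task))
        else:
            with Pool(num_proc) as pool:
                # imap yields results in submission order, so the file
                # keeps the original row order.
                for chunk in pool.imap(batch_to_csv, tasks):
                    written += f.write(chunk)
    return written
```

Calling `write_csv(rows, ["a", "b"], "out.csv", batch_size=10_000, num_proc=4)` would use four workers. On platforms that spawn processes rather than fork them, the call should sit under an `if __name__ == "__main__":` guard. Only the serialization step runs in parallel here; the file writes themselves stay sequential in the parent process.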