[Data] - TPCH Q1 Release Test - Expr #58331
Signed-off-by: Goutam <[email protected]>
Code Review
This pull request is a great improvement, replacing map_batches with with_column and Arrow expressions for the TPC-H Q1 benchmark. This change enhances both performance and readability, making the implementation much cleaner. I've identified a couple of opportunities to further improve the code by reusing newly created float columns, which will help avoid redundant computations and increase clarity.
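```python
from ray.data.expressions import col  # `to_f64` is a cast helper defined elsewhere in the PR

# Build float views + derived columns
ds = (
    ds.with_column("l_quantity_f", to_f64(col("column04")))
    # ... remaining chained calls are outside this diff excerpt
)
```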
You'll need to rename the columns first, since our dataset has bogus column names (`column00`, `column02`, and so on).
```python
ds = ds.rename_columns(
    {
        "column00": "l_orderkey",
        "column02": "l_suppkey",
        "column03": "l_linenumber",
        "column04": "l_quantity",
        "column05": "l_extendedprice",
        "column06": "l_discount",
        "column07": "l_tax",
        "column08": "l_returnflag",
        "column09": "l_linestatus",
        "column10": "l_shipdate",
        "column11": "l_commitdate",
        "column12": "l_receiptdate",
        "column13": "l_shipinstruct",
        "column14": "l_shipmode",
        "column15": "l_comment",
    }
)
```
This is gonna be broken https://anyscale1.atlassian.net/browse/DATA-1610
@goutamvenkat-anyscale how were you able to get around the empty partition?
Description
Replace `map_batches` and numpy invocations with `with_column` and Arrow kernels.

Release test: https://buildkite.com/ray-project/release/builds/66243#019a37da-4d9d-4f19-9180-e3f3dc3f8043