feat(rust,python): Add GPU support to the LazyFrame profiler #20693
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #20693      +/-  ##
==========================================
- Coverage   79.99%    79.98%   -0.02%
==========================================
  Files        1598      1598
  Lines      229199    229285      +86
  Branches     2620      2623       +3
==========================================
+ Hits       183352    183387      +35
- Misses      45248     45297      +49
- Partials      599       601       +2
) -> PyResult<(PyDataFrame, PyDataFrame)> {
    // if we don't allow threads and we have udfs trying to acquire the gil from different
    // threads we deadlock.
    let (df, time_df) = py.allow_threads(|| {
We should use `enter_polars`, which handles the `allow_threads`.
@@ -1706,6 +1708,30 @@ def profile(
│ group_by_partitioned(a) ┆ 5   ┆ 470  │
│ sort(a)                 ┆ 475 ┆ 1964 │
└─────────────────────────┴─────┴──────┘)
>>> lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort("a").profile( |
Nit: Can we make this query multiline? Something like:

(
    lf.group_by("a", maintain_order=True)
    .agg(pl.all().sum())
    .sort("a")
    .profile()
)
let ldf = self.ldf.clone();
if let Some(lambda) = lambda_post_opt {
    ldf._profile_post_opt(|root, lp_arena, expr_arena| {
        Python::with_gil(|py| {
I believe this code is exactly the same as in collect. Can we put it in a function?
@@ -14,6 +15,39 @@ pub(crate) struct PythonScanExec {
    pub(crate) predicate_serialized: Option<Vec<u8>>,
}

#[pyclass]
pub struct PyNodeTimer {
I don't think it is needed to leak those internals. See comment: crates/polars-mem-engine/src/executors/scan/python_scan.rs
) {
    let generator_init = if matches!(self.options.python_source, PythonScanSource::Cuda) {
        let py_node_timer = PyNodeTimer::new(state.node_timer.clone());
        let args = (
Instead of leaking our timer nodes, which I really don't like as CuDF would then depend on internals other than our DSL, the Python callable could accept an argument `profile: bool` and, if set, return a list of timing tuples: `[(operation: str, start: time, end: time)]`. We can then unpack those tuples here and update the `NodeTimer` accordingly.
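A minimal Python sketch of what such a callable protocol might look like. The function name, argument list, and the exact tuple shape are assumptions based on this suggestion, not an actual cuDF or polars API; a list of dicts stands in for the real DataFrame:

```python
import time

def gpu_scan_callable(with_columns, predicate, n_rows, profile: bool):
    """Hypothetical scan callable. When profile=True it also returns a
    list of (operation, start_ns, end_ns) timing tuples, as suggested."""
    timings = []

    start = time.monotonic_ns()
    # Stand-in for the actual GPU scan work.
    rows = [{"a": i} for i in range(n_rows or 3)]
    end = time.monotonic_ns()
    timings.append(("scan", start, end))

    if profile:
        return rows, timings
    return rows
```

On the Rust side, the engine would then unpack the second element of the returned tuple and feed each `(operation, start, end)` entry into the timer.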
Thanks for the review @ritchie46. There might be a casting problem with this approach. Is there a way to convert a `u64` to a `std::time::Instant`?
Instead of storing the `Instant`s, we could convert them to `u64` immediately in the `store` method.
Once I time the node in cuDF, I'll get two `u64` values. The `store` method of `NodeTimer` takes `Instant`s. I don't think there's a way to convert a `u64` to an `Instant`. Is that right?
We can bypass the `store` method for the GPU, and store `u64`s instead of `Instant`s.
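The real `NodeTimer` lives in Rust, but the idea of two storage paths can be sketched in Python. Everything here is illustrative: the class and method names mirror the discussion, not the actual crate internals:

```python
import time

class SketchNodeTimer:
    """Illustrative timer with two storage paths: `store` normalizes
    CPU-side timestamps against the query start, while `store_raw`
    accepts pre-measured integer nanoseconds (e.g. from the GPU engine)
    without touching any Instant-like value."""

    def __init__(self):
        # CPU path epoch; the raw path deliberately bypasses it.
        self.query_start_ns = time.monotonic_ns()
        self.data = []  # list of (name, start_ns, end_ns)

    def store(self, name, start_ns, end_ns):
        # CPU path: same monotonic clock, so shift to the query epoch.
        self.data.append(
            (name, start_ns - self.query_start_ns, end_ns - self.query_start_ns)
        )

    def store_raw(self, name, start_ns, end_ns):
        # GPU path: keep the raw integers; any normalization happens later.
        self.data.append((name, start_ns, end_ns))
```

The design choice being discussed is exactly this split: the raw path avoids ever needing to build an `Instant` from a `u64`, which `std::time` does not allow portably.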
I'm giving your suggestion a try @ritchie46.

- How would I unpack the return call? I have something like:

let args = (
    python_scan_function,
    with_columns.map(|x| x.into_iter().map(|x| x.to_string()).collect::<Vec<_>>()),
    predicate,
    n_rows,
    true,
);
// returns tuple[DataFrame, tuple[str, int, int]] in Python
callable.call1(args).map_err(to_compute_err)

- How would I bypass the `store` method, which relies upon `std::time::Instant`s?
You can use pyo3 to access the items of a tuple and then `extract` the proper types (see later in this file how we get the `DataFrame`).

> How would I bypass the store method which relies upon std::time::Instants?

Make a `store_raw` method.
Update: I'm able to extract the items from the `PyTuple` returned from the Python callback. I also added `store_raw` to `NodeTimer`. Here's what profiling with the GPU looks like so far:
In [1]: import polars as pl

In [2]: lf = pl.LazyFrame(
   ...:     {
   ...:         "a": ["a", "b", "a", "b", "b", "c"],
   ...:         "b": [1, 2, 3, 4, 5, 6],
   ...:         "c": [6, 5, 4, 3, 2, 1],
   ...:     }
   ...: )
   ...: lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort(
   ...:     "a"
   ...: ).profile(engine="gpu")
Out[2]:
(shape: (3, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a ┆ 4 ┆ 10 │
│ b ┆ 11 ┆ 10 │
│ c ┆ 6 ┆ 1 │
└─────┴─────┴─────┘,
shape: (2, 3)
┌──────────────┬──────────────────┬──────────────────┐
│ node ┆ start ┆ end │
│ --- ┆ --- ┆ --- │
│ str ┆ u64 ┆ u64 │
╞══════════════╪══════════════════╪══════════════════╡
│ optimization ┆ 0 ┆ 9591759057601683 │
│ sort ┆ 9591759057601683 ┆ 9591759094456985 │
└──────────────┴──────────────────┴──────────────────┘)
*The times are in nanoseconds, not microseconds.
There are a couple of problems with the result that I'm encountering.

- The `NodeTimer` keeps a `query_start` attribute (a `std::time::Instant`) which I cannot use to adjust all of the raw `u64` times. Do you have any ideas on capturing the start time and having it available in `NodeTimer::finalize`?
- If you look at the output above, only the final `sort` operation is being captured. There's probably a copy happening somewhere, so that I'm not updating the same `NodeTimer` object each time I call `store_raw`. I'm surprised, because the execution state is mutable. Do I need some sort of global `NodeTimer` to resolve this?
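The first problem, adjusting raw `u64` times without a usable `query_start`, can be worked around by normalizing against the earliest raw timestamp instead. A small sketch under that assumption (all timestamps from one monotonic clock; the earliest start stands in for the query-start `Instant` that `finalize` would normally use):

```python
def normalize_timings(raw):
    """Shift raw (name, start_ns, end_ns) tuples so the profile starts at 0.

    Hypothetical helper: uses the earliest start as the epoch, rather
    than a query_start Instant, so it works on raw integer timestamps.
    """
    if not raw:
        return []
    t0 = min(start for _, start, _ in raw)
    return [(name, start - t0, end - t0) for name, start, end in raw]
```

Applied to the output above, the huge absolute values (e.g. 9591759057601683) would collapse to offsets from the first recorded start.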
Tried an alternative approach in #21534.
Closing in favor of #21534. Thanks for your help on this @ritchie46!
Closes #20039