@plotor plotor commented Jul 21, 2025

Changes Made

We found that in some scenarios users need to obtain Daft's Runner type inside a UDF, but currently it can only be obtained through daft.context.get_context()._runner.name. The problem is that a UDF running on a Ray worker gets None when it calls daft.context.get_context()._runner, so this PR adds a daft.context.get_context().get_runner_type() method. The method works as follows:

  1. Use daft.context.get_context()._runner to determine the Runner type whenever it is set;

  2. If daft.context.get_context()._runner is None, call the detect_ray_state method to determine whether the code is currently running on Ray. If so, the Runner type is considered to be ray; otherwise it is native.
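
The two-step fallback above can be sketched in a few lines. This is only an illustration under stated assumptions: `infer_runner_type`, `configured_runner`, and `running_on_ray` are hypothetical names standing in for the context's `_runner` value and the `detect_ray_state` check. They are not Daft's actual API.

```python
from typing import Callable, Optional


def infer_runner_type(
    configured_runner: Optional[str],
    running_on_ray: Callable[[], bool],
) -> str:
    """Return the runner name, falling back to environment detection.

    Hypothetical stand-in for the logic described above, not Daft's API.
    """
    # Step 1: a runner that is already set on the context always wins.
    if configured_runner is not None:
        return configured_runner
    # Step 2: no runner on the context (for example inside a Ray worker),
    # so infer it from whether the process appears to be running on Ray.
    return "ray" if running_on_ray() else "native"


# A driver process with a configured runner reports that runner.
print(infer_runner_type("native", lambda: True))
# A worker with no runner on its context falls back to Ray detection.
print(infer_runner_type(None, lambda: True))
# A plain local process with no runner falls back to native.
print(infer_runner_type(None, lambda: False))
```

Passing the Ray check as a callable keeps the sketch runnable without Ray installed and means the check only runs when the first step finds nothing.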

In addition, I found that when the DAFT_RUNNER environment variable is inconsistent with set_runner_xxx, Daft prioritizes the set_runner_xxx setting, so I added warning logs to alert users.
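
A rough sketch of such a mismatch warning follows. The helper name `check_runner_override`, the logger name, and the message text are assumptions made for illustration; only the DAFT_RUNNER variable name comes from this PR.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("runner_config_sketch")  # hypothetical logger name


def check_runner_override(requested: str, env: Mapping[str, str]) -> Optional[str]:
    """Warn when DAFT_RUNNER disagrees with the explicitly requested runner.

    Returns the warning message, or None when there is nothing to report.
    Hypothetical helper, not part of Daft.
    """
    env_runner = env.get("DAFT_RUNNER")
    # Nothing to report if the variable is unset or agrees with the request.
    if env_runner is None or env_runner.strip().lower() == requested.lower():
        return None
    message = (
        f"DAFT_RUNNER is set to {env_runner!r}, but the runner was explicitly "
        f"set to {requested!r}; the explicit setting takes priority."
    )
    logger.warning(message)
    return message
```

In practice the mapping passed in would be `os.environ`; taking it as a parameter keeps the check easy to exercise without touching the real environment.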

Related Issues

No issue

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the fix label Jul 21, 2025

codecov bot commented Jul 21, 2025

Codecov Report

❌ Patch coverage is 50.94340% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.21%. Comparing base (296a129) to head (79721be).
⚠️ Report is 1 commits behind head on main.

Files with missing lines          Patch %   Lines
src/daft-context/src/python.rs    0.00%     12 Missing ⚠️
src/daft-context/src/lib.rs       66.66%    11 Missing ⚠️
src/daft-py-runners/src/lib.rs    0.00%     2 Missing ⚠️
daft/context.py                   75.00%    1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4810      +/-   ##
==========================================
+ Coverage   78.81%   79.21%   +0.40%     
==========================================
  Files         893      893              
  Lines      124507   124159     -348     
==========================================
+ Hits        98128    98357     +229     
+ Misses      26379    25802     -577     
Files with missing lines          Coverage Δ
daft/utils.py                     90.16% <100.00%> (ø)
daft/context.py                   81.57% <75.00%> (-0.37%) ⬇️
src/daft-py-runners/src/lib.rs    81.55% <0.00%> (ø)
src/daft-context/src/lib.rs       77.39% <66.66%> (-2.42%) ⬇️
src/daft-context/src/python.rs    62.26% <0.00%> (-7.95%) ⬇️

... and 28 files with indirect coverage changes

@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch 2 times, most recently from c8e282c to 5a176e7 Compare July 23, 2025 10:04
@plotor plotor changed the title from "fix: Getting the Runner type in UDF return None" to "feat: Support get runner type in context by get_runner_type" Jul 23, 2025
@github-actions github-actions bot added feat and removed fix labels Jul 23, 2025
@plotor plotor marked this pull request as ready for review July 23, 2025 10:05
@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch from 5a176e7 to aa012c3 Compare July 23, 2025 10:49
@plotor plotor changed the title from "feat: Support get runner type in context by get_runner_type" to "feat: Add get_runner_type method to support getting the currently used Runner type" Jul 23, 2025
@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch from aa012c3 to 713779f Compare July 24, 2025 02:02

@colin-ho colin-ho left a comment

So I'm a little worried that this API could confuse users, given that the result of get_runner_type can easily change depending on order of operations and execution environment. I think it would be more appropriate to call it get_or_infer_runner. Additionally, I'd prefer this to be a standalone function rather than a method on DaftContext, to keep it simple.

To know which runner is executing the UDF, though, it would be more foolproof to pass this information into the UDF itself. That would be larger in scope and needs some design.

Lastly, could you please add tests for this in tests/test_context.py?

@@ -45,6 +46,24 @@ impl PyDaftContext {
            }
        }
    }

    pub fn get_runner_type(&self, py: Python) -> PyResult<PyObject> {
        let runner_type = self.inner.runner().map_or_else(

Contributor

This should call get_runner_config_from_env to check the env as well.

Contributor Author

I don't quite understand why get_runner_config_from_env needs to be called here. The DAFT_RUNNER env setting may be inconsistent with set_runner_xxx, and the latter has higher priority, so the result obtained from get_runner_config_from_env may not be accurate.

Contributor

get_runner_type() can also be inaccurate if it is called both before and after set_runner_xxx. However, if set_runner_xxx is never called, it can also be inaccurate because of the DAFT_RUNNER env.

Contributor Author

Got it. Because native is Daft's default runner, if get_runner_config_from_env is not called, get_runner_type will always return native as long as set_runner_ray has not been called. Now the so-called "inaccurate" result will only occur when both of the following conditions are met:

  1. get_runner_type is called before set_runner_xxx.
  2. set_runner_xxx is inconsistent with DAFT_RUNNER env settings.

But now, when set_runner_xxx is inconsistent with the DAFT_RUNNER env setting, a warning log is printed to alert the user, so get_runner_type will not confuse the user.
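
The case where both conditions hold can be reproduced with a toy model. Everything here is hypothetical: `RunnerState` and its attributes only imitate the lookup discussed in this thread, and the relative order of the env check and the Ray detection is an assumption rather than something confirmed here.

```python
from typing import Optional


class RunnerState:
    """Toy model of the runner lookup discussed above (not Daft's API)."""

    def __init__(self, env_runner: Optional[str] = None, on_ray: bool = False):
        self.runner: Optional[str] = None  # set only by set_runner()
        self.env_runner = env_runner       # stands in for DAFT_RUNNER
        self.on_ray = on_ray               # stands in for Ray detection

    def set_runner(self, name: str) -> None:
        # An explicit call takes priority over the environment variable.
        self.runner = name

    def infer_runner_type(self) -> str:
        if self.runner is not None:
            return self.runner
        # Assumed order: environment variable first, then Ray detection.
        if self.env_runner is not None:
            return self.env_runner
        return "ray" if self.on_ray else "native"


# Condition 2: the env says ray, but the user will explicitly pick native.
state = RunnerState(env_runner="ray")
# Condition 1: the lookup happens before the explicit set_runner call.
before = state.infer_runner_type()
state.set_runner("native")
after = state.infer_runner_type()
print(before, after)
```

The two lookups disagree only because the first one ran before the explicit setting existed, which is the situation the warning log is meant to flag.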

Good suggestions, thanks.

@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch 3 times, most recently from ea9988b to 3823999 Compare July 28, 2025 09:52

plotor commented Jul 28, 2025

> So I'm a little worried that this API could confuse users, given that the result of get_runner_type can easily change depending on order of operations and execution environment. I think it would be more appropriate to call it get_or_infer_runner. Additionally, I'd prefer this to be a standalone function rather than a method on DaftContext, to keep it simple.
>
> To know which runner is executing the UDF, though, it would be more foolproof to pass this information into the UDF itself. That would be larger in scope and needs some design.
>
> Lastly, could you please add tests for this in tests/test_context.py?

Thanks for taking the time to review. The tests have been added to test_context.py, please review it again.

@colin-ho colin-ho left a comment

One last thing; otherwise it looks good.

daft/context.py Outdated
@@ -47,6 +47,21 @@ def __init__(self, ctx: PyDaftContext | None = None):
        else:
            self._ctx = PyDaftContext()

    def get_runner_type(self) -> str:

Contributor

Rename to get_or_infer_runner_type to make it clearer that this will not create the runner.

Contributor Author

Makes sense; renamed.

@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch from 3823999 to ec7ce13 Compare July 30, 2025 02:13
@plotor plotor changed the title from "feat: Add get_runner_type method to support getting the currently used Runner type" to "feat: Add get_or_infer_runner_type to support getting runner type from context" Jul 30, 2025
@plotor plotor force-pushed the zhenchao-context-runner-20250721 branch from ec7ce13 to 79721be Compare July 30, 2025 02:33
@colin-ho colin-ho merged commit f4b0f15 into Eventual-Inc:main Jul 30, 2025
45 of 46 checks passed