-
Notifications
You must be signed in to change notification settings - Fork 294
feat: Add get_or_infer_runner_type
to support getting runner type from context
#4810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add get_or_infer_runner_type
to support getting runner type from context
#4810
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4810 +/- ##
==========================================
+ Coverage 78.81% 79.21% +0.40%
==========================================
Files 893 893
Lines 124507 124159 -348
==========================================
+ Hits 98128 98357 +229
+ Misses 26379 25802 -577
🚀 New features to boost your workflow:
|
c8e282c
to
5a176e7
Compare
None
get_runner_type
5a176e7
to
aa012c3
Compare
get_runner_type
get_runner_type
method to support getting the currently used Runner type
aa012c3
to
713779f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So i'm a little worried that this API can cause confusion to users, given that the results of get_runner_type
can easily be changed based on order of operations and execution environment. I think it would be more appropriate for this to be called get_or_infer_runner
. Additionally, I'd also prefer if this was a standalone function instead of a method on DaftContext, to keep it simple.
In order to know which runner is executing the UDF though, it will be more foolproof to pass this information into the UDF itself. This would be of larger scope and needs some design.
Lastly, could you please add tests for this in tests/test_context.py
.
src/daft-context/src/python.rs
Outdated
@@ -45,6 +46,24 @@ impl PyDaftContext { | |||
} | |||
} | |||
} | |||
|
|||
pub fn get_runner_type(&self, py: Python) -> PyResult<PyObject> { | |||
let runner_type = self.inner.runner().map_or_else( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should call get_runner_config_from_env
to check the env as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why get_runner_config_from_env
needs to be called here, because the setting of DAFT_RUNNER
env may be inconsistent with set_runner_xxx
, and the latter has a higher priority, so the result obtained by get_runner_config_from_env
may not be accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_runner_type()
can also be inaccurate if it is called before and after set_runner_xxx
, however if there is no set_runner_xxx
, it can also be inaccurate due to DAFT_RUNNER
env.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, because native
is the default runner of daft, if get_runner_config_from_env
is not called, then get_runner_type
will always return native
before and after set_runner_ray
is not called. Now the so-called "inaccurate" will only occur when the following two conditions are met:
get_runner_type
is called beforeset_runner_xxx
.set_runner_xxx
is inconsistent withDAFT_RUNNER
env settings.
But now when set_runner_xxx
is inconsistent with DAFT_RUNNER
env settings, a warn log will be printed to remind the user, so get_runner_type
will not confuse the user.
Good suggestions, thanks.
ea9988b
to
3823999
Compare
Thanks for taking the time to review. The tests have been added to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing, otherwise it looks good
daft/context.py
Outdated
@@ -47,6 +47,21 @@ def __init__(self, ctx: PyDaftContext | None = None): | |||
else: | |||
self._ctx = PyDaftContext() | |||
|
|||
def get_runner_type(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to get_or_infer_runner_type
to make it clearer that this will not create the runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, and modified
3823999
to
ec7ce13
Compare
…rom context Signed-off-by: plotor <[email protected]>
get_runner_type
method to support getting the currently used Runner typeget_or_infer_runner_type
to support getting runner type from context
ec7ce13
to
79721be
Compare
Changes Made
We found that in some scenarios, users need to obtain the Runner type of Daft in UDF, but currently it can only be obtained through
daft.context.get_context()._runner.name
. The problem is that the UDF running on the ray worker getsNone
result when calldaft.context.get_context()._runner
, so I added adaft.context.get_context().get_runner_type()
method in this PR. The execution mechanism of this method is as follows:Prioritize
daft.context.get_context()._runner
to determine the Runner type;If
daft.context.get_context()._runner
isNone
, call thedetect_ray_state
method to determine whether it's currently running on ray. If so, the current Runner type is considered to beray
, otherwise it isnative
.In addition, I found that when the
DAFT_RUNNER
env is inconsistent withset_runner_xxx
, Daft will prioritize theset_runner_xxx
settings, so I added some warn logs to remind users.Related Issues
No issue
Checklist
docs/mkdocs.yml
navigation