
Handle importing relocated dispatch functions #623

Merged: 5 commits merged into rapidsai:branch-21.06 on May 26, 2021

Conversation

@jakirkham (Member) commented May 24, 2021

Requires PR ( rapidsai/cudf#8342 )

As these functions were recently relocated, handle importing them from the new location, falling back to the old location when needed (a minimal sketch of the pattern follows below).

xref: dask/dask#7503
xref: dask/dask#7505
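
A minimal sketch of the fallback-import pattern described above, using hash_object_dispatch as an example (the exact functions and module paths touched by this PR may differ):

try:
    # New location after the dask dispatch refactor
    from dask.dataframe.dispatch import hash_object_dispatch
except ImportError:
    # Fall back to the old location for dask versions predating the refactor
    from dask.dataframe.utils import hash_object_dispatch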

@jakirkham requested a review from a team as a code owner on May 24, 2021 23:48
@github-actions bot added the "python (python code needed)" label on May 24, 2021
@jakirkham added the "bug (Something isn't working)" and "non-breaking (Non-breaking change)" labels on May 24, 2021
This allows us to smooth over differences between Dask before and after the dispatch refactor.
@quasiben (Member)

Thanks @jakirkham for the work and @galipremsagar for the review!

@quasiben (Member)

@gpucibot merge

@jakirkham (Member, Author)

Seeing this on CI

19:52:07 ________________________ test_dataframes_share_dev_mem _________________________
19:52:07 
19:52:07     def test_dataframes_share_dev_mem():
19:52:07         cudf = pytest.importorskip("cudf")
19:52:07     
19:52:07         df = cudf.DataFrame({"a": range(10)})
19:52:07 >       grouped = shuffle_group(df, "a", 0, 2, 2, False, 2)
19:52:07 
19:52:07 dask_cuda/tests/test_proxify_host_file.py:109: 
19:52:07 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
19:52:07 dask_cuda/proxify_device_objects.py:142: in wrapper
19:52:07     ret = func(*args, **kwargs)
19:52:07 /opt/conda/envs/rapids/lib/python3.7/site-packages/dask/dataframe/shuffle.py:816: in shuffle_group
19:52:07     ind = hash_object_dispatch(df[cols] if cols else df, index=False)
19:52:07 /opt/conda/envs/rapids/lib/python3.7/site-packages/dask/utils.py:511: in __call__
19:52:07     meth = self.dispatch(type(arg))
19:52:07 /opt/conda/envs/rapids/lib/python3.7/site-packages/dask/utils.py:499: in dispatch
19:52:07     return self.dispatch(cls)  # recurse
19:52:07 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
19:52:07 
19:52:07 self = <dask.utils.Dispatch object at 0x7fe9f6b57fd0>
19:52:07 cls = <class 'cudf.core.dataframe.DataFrame'>
19:52:07 
19:52:07     def dispatch(self, cls):
19:52:07         """Return the function implementation for the given ``cls``"""
19:52:07         # Fast path with direct lookup on cls
19:52:07         lk = self._lookup
19:52:07         try:
19:52:07             impl = lk[cls]
19:52:07         except KeyError:
19:52:07             pass
19:52:07         else:
19:52:07             return impl
19:52:07         # Is a lazy registration function present?
19:52:07         toplevel, _, _ = cls.__module__.partition(".")
19:52:07         try:
19:52:07             register = self._lazy.pop(toplevel)
19:52:07         except KeyError:
19:52:07             pass
19:52:07         else:
19:52:07             register()
19:52:07             return self.dispatch(cls)  # recurse
19:52:07         # Walk the MRO and cache the lookup result
19:52:07         for cls2 in inspect.getmro(cls)[1:]:
19:52:07             if cls2 in lk:
19:52:07                 lk[cls] = lk[cls2]
19:52:07                 return lk[cls2]
19:52:07 >       raise TypeError("No dispatch for {0}".format(cls))
19:52:07 E       TypeError: No dispatch for <class 'cudf.core.dataframe.DataFrame'>
19:52:07 
19:52:07 /opt/conda/envs/rapids/lib/python3.7/site-packages/dask/utils.py:505: TypeError

@jakirkham (Member, Author)

cc @madsbk

@galipremsagar (Contributor)

This error occurs because the hash_object_dispatch location has changed and the dask-cudf code has not yet been updated to the new location. dask-cudf fails silently because the import is wrapped in a try/except, so the dispatch is never registered for cudf.DataFrame. The fix for this is already included in: https://github.com/rapidsai/cudf/pull/8342/files#diff-8cccf94d2261588b89ab7cd02573abfe37a19aab98fb6d210184c5ce293e9b3eR249
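
A hedged illustration of that failure mode (simplified stand-in code, not the actual dask-cudf implementation): when the registration import is wrapped in a try/except that swallows the ImportError, no implementation is ever registered, so dask's Dispatch raises TypeError at call time.

from dask.utils import Dispatch

hash_object_dispatch = Dispatch("hash_object")

try:
    # Hypothetical old import path that no longer exists after the relocation
    from old_location import register_cudf_hash  # noqa: F401
except ImportError:
    pass  # swallowed silently, so cudf.DataFrame is never registered

class FakeCudfDataFrame:  # stands in for cudf.core.dataframe.DataFrame
    pass

try:
    hash_object_dispatch(FakeCudfDataFrame())
except TypeError as exc:
    print(exc)  # No dispatch for <class '__main__.FakeCudfDataFrame'>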

@galipremsagar (Contributor)

rerun tests

@jakirkham (Member, Author)

rerun tests

(looks like packages are up now)

@jakirkham (Member, Author)

Yeah, saw that, though it's not related to this change. It looks like a UCX configuration test failure. My inclination is to mark it as xfail so we can get this out and unblock people whose CI still needs these changes. Thoughts?

Comment on lines +53 to 56
@pytest.mark.xfail(reason="https://github.com/rapidsai/dask-cuda/issues/627")
def test_global_option():
for seg_size in ["2K", "1M", "2M"]:
p = mp.Process(target=_test_global_option, args=(seg_size,))
Member Author

OK, marked this test with xfail so we can get these changes out the door to fix other CI issues, though I raised issue ( #627 ) to track/investigate.

Member

I think this is OK. @pentschev, do you have any concerns? We are still in burndown, so if something does crop up we have time to handle it.

Member Author

For clarity, cuDF's and RAFT's CI still fails without these changes, which is blocking a fair number of people. No objection to revisiting, but I do think we need to do this in the short term.

Member

This is weird; it never failed before, including in my nightly runs with both UCX 1.9 and UCX master. I'm actually concerned that this may be an error from a previous test that pytest is spilling over. I opened a PR to trigger CI in #628; let's hold on and see whether that fails too before merging this.

Member Author

Agreed, it is weird. I don't recall seeing this error on this PR before that build either.

OK, though JFYI, before this came up Ben had used a merge command upthread ( #623 (comment) ), so the bot may wind up merging this anyway if CI passes.

Member

Haha, yeah, it was too quick, so no worries. I'm just now wondering whether this isn't a bit dangerous: one could make any change to the PR after someone told the bot to merge, and unless somebody catches it in time, it would still be merged.

Member Author

Yeah, it probably should be associated with a commit hash.

Member

Or just invalidate it based on timestamp: if the commit timestamp is newer than the gpucibot merge comment, invalidate the automerge.

Member

I mean the time the commit was pushed, not the local commit timestamp.
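
A hedged sketch of the invalidation rule being discussed (hypothetical helper, not the actual gpucibot logic):

from datetime import datetime, timezone

def automerge_still_valid(merge_comment_time, last_push_time):
    # The "@gpucibot merge" request stays valid only if nothing was pushed
    # to the PR after the comment (push time, not the local commit timestamp).
    return last_push_time <= merge_comment_time

comment_time = datetime(2021, 5, 25, 12, 0, tzinfo=timezone.utc)
push_time = datetime(2021, 5, 25, 13, 30, tzinfo=timezone.utc)
print(automerge_still_valid(comment_time, push_time))  # False: a later push invalidates the request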

Member Author

Submitted PR ( #630 ) to revert this piece. It probably still fails, but at least that gives us something to work from.

@codecov-commenter commented May 26, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@ab1d35c).
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.06     #623   +/-   ##
===============================================
  Coverage                ?   90.56%           
===============================================
  Files                   ?       15           
  Lines                   ?     1621           
  Branches                ?        0           
===============================================
  Hits                    ?     1468           
  Misses                  ?      153           
  Partials                ?        0           

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab1d35c...6e32eca.

@rapids-bot bot merged commit 1ef38b4 into rapidsai:branch-21.06 on May 26, 2021
@jakirkham deleted the fix_dask_dispatch branch on May 26, 2021 17:13