Rebuild for & Support NumPy 2 #16300
{{ pin_compatible('numpy') }} & build with NumPy 2 (constrain to NumPy 1 though)
{{ pin_compatible('numpy') }} & build w/NumPy 2 (restrict to 1)
Need NumPy 2 compatible RMM wheels to proceed. Marking as draft.
@@ -64,8 +64,7 @@ requirements:
 - rapids-build-backend >=0.3.0,<0.4.0.dev0
 - scikit-build-core >=0.10.0
 - dlpack >=0.8,<1.0
-# TODO: Change to `2.0` for NumPy 2
-- numpy 1.23
+- numpy 2.0
Thanks, good to get this done. But I suspect we can just remove NumPy as a host requirement for cudf? (at least as a direct requirement)
Unfortunately, as Arrow is currently needed during the build (and Arrow uses NumPy headers in some cases), there is a NumPy build dependency. Please see this comment: #15165 (comment)
Once Arrow is dropped as a build dependency, we can remove NumPy from build dependencies too: #15193 (comment)
Ah right, sorry I forgot. For that it should be OK to build with 1.x, but I agree it is better to just change it!
Ah, sorry, I should have mentioned the second issue: we need to relax the NumPy pinning in RMM.
For example, we see this error on CI:
ERROR: Cannot install numpy==2.0.*, rmm-cu11==24.10.0a21 and rmm-cu11==24.10.0a22 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested numpy==2.0.*
rmm-cu11 24.10.0a22 depends on numpy<2.0a0 and >=1.23
The user requested numpy==2.0.*
rmm-cu11 24.10.0a21 depends on numpy<2.0a0 and >=1.23
If we do that, we can build with NumPy 2 here (even if we are still using NumPy 1 at runtime)
Edit: Should add we can do this whenever we like
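To make the resolver conflict above concrete, here is a minimal Python sketch that treats each specifier as a predicate and filters a handful of candidate NumPy releases, roughly the way a resolver narrows its options. It is a simplification, not pip's actual algorithm: pre-release suffixes are ignored (so `<2.0a0` is modelled as `<2.0`), and both the candidate list and the relaxed upper bound of 3.0 are illustrative assumptions.

```python
from typing import List, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Parse a plain release string such as '1.26.4' into (1, 26, 4)."""
    return tuple(int(part) for part in version.split("."))


def requested(v: Version) -> bool:
    """numpy==2.0.* : any 2.0.x release (what the NumPy 2 build asks for)."""
    return v[:2] == (2, 0)


def rmm_pin(v: Version) -> bool:
    """numpy>=1.23,<2.0a0 : RMM's pin from the CI error (pre-releases ignored)."""
    return (1, 23) <= v < (2, 0)


def relaxed_pin(v: Version) -> bool:
    """Hypothetical relaxed RMM pin that admits NumPy 2 (the 3.0 bound is assumed)."""
    return (1, 23) <= v < (3, 0)


# Illustrative candidate releases a resolver might consider.
candidates: List[str] = ["1.23.5", "1.26.4", "2.0.0", "2.0.1", "2.1.0"]

conflicting = [c for c in candidates if requested(parse(c)) and rmm_pin(parse(c))]
resolvable = [c for c in candidates if requested(parse(c)) and relaxed_pin(parse(c))]

print("with RMM's pin:", conflicting)      # prints: with RMM's pin: []
print("with a relaxed pin:", resolvable)   # prints: with a relaxed pin: ['2.0.0', '2.0.1']
```

An empty candidate set is exactly the situation pip reports as conflicting dependencies; relaxing only the upper bound on the RMM side is enough to make the NumPy 2 request satisfiable in this model.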
Thanks to Sebastian we are now building RMM to allow NumPy 2: rapidsai/rmm#1650
Will update this PR once there are RMM packages available
Once Arrow is dropped as a build dependency, we can remove NumPy from build dependencies too: #15193 (comment)
Just noting that #16640 is up for review and all its CI is passing; I suspect it'll be merged soon.
This PR and that one will have some conflicts... which is good, because I think one should be tested again after the other is merged.
This changes the builds to start using NumPy 2 when building cuDF and friends. However, this maintains the same runtime dependency, restricting cuDF and friends to NumPy 1.23+ (pre-2). Because building libraries against NumPy 2 provides NumPy 2 compatibility while maintaining NumPy 1 compatibility (back to NumPy 1.19), this provides NumPy compatibility equivalent to what we had before and extends it further. This also helps us ensure that our builds are (and remain) compatible with NumPy 2. Finally, this will help us prep recipes for the NumPy 2 migration with RDFG still to come.
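The compatibility argument above can be sketched as interval arithmetic: the NumPy versions a package can actually be installed with are the overlap between what the compiled extension can import against and what the runtime pin allows. This is a simplified model under stated assumptions: versions are release tuples with pre-releases ignored, extensions built against NumPy 1.23 are assumed to work only on later 1.x releases, the 1.19 floor for NumPy 2 builds comes from the text above, and the 3.0 ceiling is purely illustrative.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


class Range(NamedTuple):
    """Half-open version interval: low <= v < high."""
    low: Version
    high: Version


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Versions allowed by both ranges, or None if they do not overlap."""
    low, high = max(a.low, b.low), min(a.high, b.high)
    return Range(low, high) if low < high else None


# Runtime pin kept by this PR: numpy >=1.23,<2.0a0 (pre-releases ignored).
RUNTIME_PIN = Range((1, 23), (2, 0))

# Assumed import-compatibility windows of the compiled extensions.
BUILD_WINDOWS: Dict[str, Range] = {
    # Built against NumPy 1.23: later 1.x releases only.
    "numpy 1.23 build": Range((1, 23), (2, 0)),
    # Built against NumPy 2.0: back to 1.19 and on 2.x (3.0 ceiling is illustrative).
    "numpy 2.0 build": Range((1, 19), (3, 0)),
}

for name, window in BUILD_WINDOWS.items():
    print(f"{name}: installable numpy = {intersect(window, RUNTIME_PIN)}")
```

In this model both builds yield the same installable range, matching the claim that compatibility is unchanged; only the NumPy 2 build would gain anything if the runtime pin were later relaxed.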
It looks like cudf/conda/recipes/cudf/meta.yaml Lines 67 to 71 in 6a2f323
cudf/conda/recipes/pylibcudf/meta.yaml Lines 67 to 68 in 6a2f323
cudf/conda/recipes/pylibcudf/meta.yaml Lines 84 to 85 in 6a2f323
So in order to rebuild with NumPy 2, we also need to soften the NumPy constraints at runtime. Given this, we can either...
Personally, I would prefer to decouple this from the Arrow upgrade, so I would lean towards 1 or 2.
Went ahead and cherry-picked/squashed the changes from PR #16594. That should fix the CI issues previously seen here.
Depends on your timeline. After #16590 is merged, my next PR will remove Arrow as a build dependency, so it should be done by EOW 🤞
Am hoping we have this done before EOW. Sebastian is at EuroSciPy next week.
Removing
There are some tests here that use Dask-CUDA. However, Dask-CUDA was also pinned to NumPy 1, so it wasn't possible to test with NumPy 2 and Dask-CUDA in the same environment. Now that we have relaxed the Dask-CUDA constraints to allow NumPy 2 (rapidsai/dask-cuda#1375) and those packages are available, I will merge upstream changes into this PR to retest with Dask-CUDA and NumPy 2.
The changes here make sense to me. Glad we were able to relax this constraint.
/merge
/merge
Hmm... when resolving conflicts, all jobs passed except one CI job. For some reason, that one job is downgrading Pandas from 2.2.2 (which is NumPy 2 compatible) to 2.0.0 (which is not):
Downloading pandas-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 261.9 MB/s eta 0:00:00
Installing collected packages: pandas
Attempting uninstall: pandas
Found existing installation: pandas 2.2.2
Uninstalling pandas-2.2.2:
Successfully uninstalled pandas-2.2.2
Successfully installed pandas-2.0.0
The exception that follows later is an expected consequence of combining incompatible Pandas and NumPy versions. Not sure why the downgrade is happening; for now, I have tried restarting CI. It is worth noting that before the conflicts the same job was passing, and nothing in the logic here has changed since. If this issue persists, we may need help from others to identify what changed outside this PR.
Hmm... that job has now failed with a more perplexing error, which is definitely unrelated: something about difficulty identifying the CUDA driver and runtime versions.
+ python -m pytest -p cudf.pandas --cov-config=./python/cudf/.coveragerc --cov=cudf --cov-report=xml:/__w/cudf/cudf/coverage-results/cudf-pandas-coverage.xml --cov-report=term ./python/cudf/cudf_pandas_tests/
/pyenv/versions/3.10.14/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:64: UserWarning: Error getting driver and runtime versions:
stdout:
stderr:
Traceback (most recent call last):
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 254, in ensure_initialized
self.cuInit(0)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 327, in safe_cuda_api_call
self._check_ctypes_error(fname, retcode)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 395, in _check_ctypes_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [999] Call to cuInit results in CUDA_ERROR_UNKNOWN
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 292, in __getattr__
self.ensure_initialized()
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_UNKNOWN (999)
Not patching Numba
warnings.warn(msg, UserWarning)
/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py:331: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: helpconfig, Hook: pytest_cmdline_parse
CUDARuntimeError: cudaErrorUnknown: unknown error
For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
config = pluginmanager.hook.pytest_cmdline_parse(
Traceback (most recent call last):
File "/pyenv/versions/3.10.14/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/pyenv/versions/3.10.14/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pytest/__main__.py", line 5, in <module>
raise SystemExit(pytest.console_main())
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 192, in console_main
code = main()
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 150, in main
config = _prepareconfig(args, plugins)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 331, in _prepareconfig
config = pluginmanager.hook.pytest_cmdline_parse(
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pluggy/_hooks.py", line 513, in __call__
return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pluggy/_manager.py", line 120, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pluggy/_callers.py", line 156, in _multicall
teardown[0].send(outcome)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/helpconfig.py", line 104, in pytest_cmdline_parse
config: Config = outcome.get_result()
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pluggy/_result.py", line 100, in get_result
raise exc.with_traceback(exc.__traceback__)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/pluggy/_callers.py", line 103, in _multicall
res = hook_impl.function(*args)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1075, in pytest_cmdline_parse
self.parse(args)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1425, in parse
self._preparse(args, addopts=addopts)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1301, in _preparse
self.pluginmanager.consider_preparse(args, exclude_only=False)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 709, in consider_preparse
self.consider_pluginarg(parg)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 735, in consider_pluginarg
self.import_plugin(arg, consider_entry_points=True)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/_pytest/config/__init__.py", line 781, in import_plugin
__import__(importspec)
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/cudf/__init__.py", line 20, in <module>
validate_setup()
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/cudf/utils/gpu_utils.py", line 55, in validate_setup
raise e
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/cudf/utils/gpu_utils.py", line 52, in validate_setup
gpus_count = getDeviceCount()
File "/pyenv/versions/3.10.14/lib/python3.10/site-packages/rmm/_cuda/gpu.py", line 102, in getDeviceCount
raise CUDARuntimeError(status)
rmm._cuda.gpu.CUDARuntimeError: cudaErrorUnknown: unknown error
Restarting again.
These Pandas tests now try all minor versions of Pandas 2 before the latest Pandas, though only Pandas 2.2.2+ is NumPy 2 compatible. However, when we downgrade Pandas, NumPy is not downgraded to NumPy 1. As a result, we get the expected errors from mixing incompatible NumPy and Pandas versions. This is partly because Pandas does not have an upper bound on its NumPy constraint, though it is possible `pip` would ignore such a bound anyway. So to fix this issue, just use NumPy 1 for these older versions of Pandas, which should work consistently for them. We have plenty of NumPy 2 coverage outside of this particular test, especially considering that we run tests with the latest Pandas and NumPy first, which use NumPy 2 and Pandas 2.2.2+.
https://pandas.pydata.org/pandas-docs/version/2.2.2/whatsnew/v2.2.2.html#pandas-2-2-2-is-now-compatible-with-numpy-2-0
https://github.com/pandas-dev/pandas/blob/v2.0.3/pyproject.toml#L26-L28
https://github.com/pandas-dev/pandas/blob/v2.1.4/pyproject.toml#L32-L34
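The missing upper bound described above can be checked against an installed environment with the standard library. This is a hypothetical helper, not part of the cuDF CI scripts: `numpy_requirements` reads the `Requires-Dist` metadata through `importlib.metadata.requires`, and `has_upper_bound` is a deliberately crude text check (any `<`, `~=`, or `==` in the specifier, with environment markers stripped) rather than full PEP 440 parsing.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import List

# Leading project name of a Requires-Dist entry, e.g. "numpy" in "numpy>=1.22.4".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def numpy_requirements(dist: str) -> List[str]:
    """Return the Requires-Dist entries of an installed distribution that name numpy."""
    try:
        entries = requires(dist) or []
    except PackageNotFoundError:
        return []
    matches = []
    for entry in entries:
        name = _NAME.match(entry)
        if name and name.group(1).lower() == "numpy":
            matches.append(entry)
    return matches


def has_upper_bound(requirement: str) -> bool:
    """Crude check: does the specifier (environment marker stripped) cap the version?"""
    specifier = requirement.split(";", 1)[0]
    return any(op in specifier for op in ("<", "~=", "=="))


# Inspect whatever pandas is installed in the current environment (if any).
for requirement in numpy_requirements("pandas"):
    status = "capped" if has_upper_bound(requirement) else "no upper bound"
    print(f"{requirement} -> {status}")
```

With an older Pandas installed, one would expect only lower bounds to show up, which is why pip has no reason to move NumPy back to 1.x when Pandas is downgraded.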
@@ -75,7 +75,7 @@ IFS=',' read -r -a versions <<< "$output"

 for version in "${versions[@]}"; do
     echo "Installing pandas version: ${version}"
-    python -m pip install "pandas==${version}"
+    python -m pip install "numpy>=1.23,<2.0a0" "pandas==${version}"
In the rerun, we got the original failure with incompatible Pandas and NumPy versions, as explained above (#16300 (comment)). That said, I now see this is due to PR #16595, which added legacy Pandas testing.
I wrote more details about this in the commit message (3bf2311). The gist is that only Pandas 2.2.2+ is NumPy 2 compatible. However, when `pip` downgrades Pandas here, NumPy is not downgraded from NumPy 2 to NumPy 1. As these older Pandas versions don't have upper bounds (restricting to NumPy 1), `pip` won't downgrade NumPy (though I don't know whether such bounds would have been enough for `pip` anyway).
To fix this issue, I am just pinning to NumPy 1 in this install command. I have confirmed this works locally and expect it to work here as well.
Since we already test with NumPy 2 above (not to mention in various other tests here), I think those tests are sufficient, and it is fine to stick to NumPy 1 for these older Pandas versions (even as the long tail gains more NumPy 2 support over time).
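The fix pins NumPy 1 for every legacy Pandas version in the loop. A version-aware variant is sketched below in Python purely to illustrate the reasoning; it is not what the PR does. The helper name `pip_install_args` is hypothetical, the 2.2.2 cutoff comes from the pandas release notes linked in the commit message, and version strings are assumed to be plain `X.Y.Z` releases.

```python
import shlex
from typing import List, Tuple

# First pandas release documented as compatible with NumPy 2.0.
PANDAS_NUMPY2_MIN: Tuple[int, ...] = (2, 2, 2)
NUMPY1_PIN = "numpy>=1.23,<2.0a0"


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; pre/post-release tags are not supported."""
    return tuple(int(part) for part in version.split("."))


def pip_install_args(pandas_version: str) -> List[str]:
    """Hypothetical helper: pip arguments for one legacy-pandas test iteration."""
    args = ["python", "-m", "pip", "install"]
    if parse_release(pandas_version) < PANDAS_NUMPY2_MIN:
        # Older pandas has no upper bound on numpy, so pip would keep NumPy 2
        # installed; request NumPy 1 explicitly alongside the downgrade.
        args.append(NUMPY1_PIN)
    args.append(f"pandas=={pandas_version}")
    return args


for version in ["2.0.3", "2.1.4", "2.2.2"]:
    # shlex.join quotes arguments containing shell metacharacters such as < and >.
    print(shlex.join(pip_install_args(version)))
```

Pinning unconditionally, as the PR does, is simpler and keeps the legacy loop deterministic; the version-aware form would mainly matter if the loop ever included a release at or above the cutoff.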
cudf/ci/cudf_pandas_scripts/run_tests.sh
Lines 64 to 69 in 3bf2311
python -m pytest -p cudf.pandas \
    --cov-config=./python/cudf/.coveragerc \
    --cov=cudf \
    --cov-report=xml:"${RAPIDS_COVERAGE_DIR}/cudf-pandas-coverage.xml" \
    --cov-report=term \
    ./python/cudf/cudf_pandas_tests/
Confirmed this fixed that job
One job had an unrelated nvjitlink error:
=================================== FAILURES ===================================
___________ test_arith_masked_vs_constant_reflected[data1-True-mod] ____________
[gw6] linux -- Python 3.10.14 /opt/conda/envs/test/bin/python3.10
self = <pynvjitlink.api.NvJitLinker object at 0xfffdf2679e70>
input_type = <InputType.PTX: 2>
data = b'//\n// Generated by NVIDIA NVVM Compiler\n//\n// Compiler Build ID: CL-31968024\n// Cuda compilation tools, release ...bal.u64 \t%rd62, %rd12;\n\tadd.s64 \t%rd63, %rd62, %rd61;\n\tst.global.u8 \t[%rd63], %rs4;\n\n$L__BB0_10:\n\tret;\n\n}'
name = '<cudapy-ptx>'
def add_data(self, input_type, data, name):
if self._complete:
raise NvJitLinkError("Cannot add data to already-completeted link")
try:
> _nvjitlinklib.add_data(self.handle, input_type.value, data, name)
E RuntimeError: NVJITLINK_ERROR_INTERNAL error when calling nvJitLinkAddData
/opt/conda/envs/test/lib/python3.10/site-packages/pynvjitlink/api.py:53: RuntimeError
During handling of the above exception, another exception occurred:
self = data
0 1
1 <NA>
2 1
func = <function test_arith_masked_vs_constant_reflected.<locals>.func at 0xfffdf277c0d0>
kernel_getter = <function _get_row_kernel at 0xfffe61b3cc10>, args = ()
kwargs = {}
@acquire_spill_lock()
@_performance_tracking
def _apply(self, func, kernel_getter, *args, **kwargs):
"""Apply `func` across the rows of the frame."""
if kwargs:
raise ValueError("UDFs using **kwargs are not yet supported.")
try:
> kernel, retty = _compile_or_get(
self, func, args, kernel_getter=kernel_getter
)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/indexed_frame.py:3472:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/utils/performance_tracking.py:51: in wrapper
return func(*args, **kwargs)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/udf/utils.py:275: in _compile_or_get
kernel, scalar_return_type = kernel_getter(frame, func, args)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/udf/row_function.py:162: in _get_row_kernel
kernel = _get_kernel(kernel_string, global_exec_context, sig, func)
return op(constant, x)
# Just a single column -> result will be all NA
gdf = cudf.DataFrame({"data": data})
# cudf differs from pandas for 1**NA
request.applymarker(
pytest.mark.xfail(
condition=(constant == 1 and op in {operator.pow, operator.ipow}),
reason="https://github.com/rapidsai/cudf/issues/7478",
)
)
> run_masked_udf_test(func, gdf, check_dtype=False)
tests/test_udf_masked_ops.py:267:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_udf_masked_ops.py:73: in run_masked_udf_test
obtain = gdf.apply(func, args=args, axis=1)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/utils/performance_tracking.py:51: in wrapper
return func(*args, **kwargs)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/dataframe.py:4724: in apply
return self._apply(func, _get_row_kernel, *args, **kwargs)
/opt/conda/envs/test/lib/python3.10/contextlib.py:79: in inner
return func(*args, **kwds)
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/utils/performance_tracking.py:51: in wrapper
return func(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = data
0 1
1 <NA>
2 1
func = <function test_arith_masked_vs_constant_reflected.<locals>.func at 0xfffdf277c0d0>
kernel_getter = <function _get_row_kernel at 0xfffe61b3cc10>, args = ()
kwargs = {}
@acquire_spill_lock()
@_performance_tracking
def _apply(self, func, kernel_getter, *args, **kwargs):
"""Apply `func` across the rows of the frame."""
if kwargs:
raise ValueError("UDFs using **kwargs are not yet supported.")
try:
kernel, retty = _compile_or_get(
self, func, args, kernel_getter=kernel_getter
)
except Exception as e:
> raise ValueError(
"user defined function compilation failed."
) from e
E ValueError: user defined function compilation failed.
/opt/conda/envs/test/lib/python3.10/site-packages/cudf/core/indexed_frame.py:3476: ValueError
Merely documenting this here. Will restart that job after the rest complete.
Edit: Another job ran into the CUDA driver error mentioned above (#16300 (comment)).
Ah, looks like it cleared and the old merge comment took effect. In any event, the small tweak to the test requirement above (#16300 (comment)) is consistent with what happened before this PR; it's just that this test condition now tests the latest Pandas with NumPy 2. If anything comes up or needs discussion, I would be happy to chat 🙂
Updated guidance on dropping the NumPy build dependency in a comment (#15193 (comment)).
Description
Part of issue: rapidsai/build-planning#38
Start building `cudf` with `numpy` version `2.0`. This remains compatible with `numpy` versions `1.x` and `2.x`. Allows us to test building with `numpy` version `2.0` (and make sure we catch any issues that show up). Also relaxes the `numpy` `1.x` pin. Pulls in the RDFG changes that are rolling out for broader RAPIDS NumPy 2 support.
Checklist