Add numpy to the mypy pre-commit environment #20282
Merged
rapids-bot[bot] merged 28 commits into rapidsai:branch-25.12 on Oct 16, 2025
Added numpy to the mypy additional_dependencies to enable numpy type stubs for improved type checking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "np_dtype" at line 22. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "np_dtype" at line 33. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "np_dtype" at line 44. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
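The class-attribute fixes above can be sketched roughly as follows. This is a minimal illustration, not the cudf source: `ExampleObjectDtype` is a hypothetical stand-in for the affected classes, and only the `np_dtype` name and the `np.dtype[np.object_]` annotation come from the commit messages.

```python
from __future__ import annotations

import numpy as np


class ExampleObjectDtype:
    """Hypothetical stand-in for one of the annotated dtype classes."""

    # Spelling out the generic parameter means mypy does not have to infer
    # it from the right-hand side.
    np_dtype: np.dtype[np.object_] = np.dtype("object")


print(ExampleObjectDtype.np_dtype)
```

The `from __future__ import annotations` line keeps the annotation from being evaluated at runtime, so the sketch does not depend on `np.dtype` being subscriptable in the installed numpy.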
Fixed mypy error: Need type annotation for "min_date" at line 124. Added explicit type annotation np.datetime64 for local variable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "np_dtypes_to_pandas_dtypes" at line 23. Added explicit type annotation dict[np.dtype[Any], pd.core.dtypes.base.ExtensionDtype] and imported Any from typing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "dtype" at line 205. Added explicit type annotation np.dtype[Any] for local variable in loop. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "dtype" at line 219. Added explicit type annotation np.dtype[Any] for local variable in loop. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed mypy error: Need type annotation for "SUPPORTED_NUMPY_TO_PYLIBCUDF_TYPES" at line 763. Added explicit type annotation dict[np.dtype[Any], plc.types.TypeId]. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
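A rough sketch of the mapping and loop-variable annotations described in the last few commits. The mapping contents and both names (`NP_DTYPE_TO_LABEL`, `supported_kinds`) are assumptions made for illustration; the real mappings use pandas extension dtypes and `plc.types.TypeId` as values, which are replaced here with plain strings to keep the example self-contained.

```python
from __future__ import annotations

from typing import Any

import numpy as np

# Hypothetical mapping: keys are numpy dtypes, values are simple labels.
NP_DTYPE_TO_LABEL: dict[np.dtype[Any], str] = {
    np.dtype("int64"): "Int64",
    np.dtype("float64"): "Float64",
    np.dtype("bool"): "boolean",
}


def supported_kinds() -> set[str]:
    """Collect the numpy kind codes of every key in the mapping."""
    kinds: set[str] = set()
    # A `for` target cannot carry an inline annotation, so the loop
    # variable is declared on its own line first.
    dtype: np.dtype[Any]
    for dtype in NP_DTYPE_TO_LABEL:
        kinds.add(dtype.kind)
    return kinds
```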
Fixed mypy error: Need type annotation for "_UNDERLYING_DTYPE" at line 47. Added explicit type annotation np.dtype[np.int64] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed 4 mypy errors in dtypes.py by:
1. Converting is_pandas_nullable_extension_dtype to use TypeGuard[pd.core.dtypes.base.ExtensionDtype]
- This resolved .na_value access errors on lines 255 and 262
2. Filtering dtypes to create cat_dtypes list with explicit isinstance checks
- Ensures mypy knows all items are cudf.CategoricalDtype
3. Adding None check when filtering categorical dtypes
- Filters out dtypes where _categories is None
4. Using explicit loop with assertions instead of list comprehensions
- Helps mypy understand _categories is not None after filtering
- This resolved ._categories access errors on lines 297 and 300
The TypeGuard pattern tells mypy to narrow the dtype type when the function returns True, making attribute access type-safe without runtime overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added type: ignore[call-overload] comment with detailed explanation. The issue is that numpy's type stubs for datetime64/timedelta64 constructors only accept literal strings for the time unit parameter (like "ns", "us", etc.) to enable compile-time validation. However, we're passing a variable string (to_unit) which contains a time unit that we know is valid at runtime. This is one of 20 errors on this line from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added type: ignore[call-overload] comment with detailed explanation. The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing self.time_unit which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added type: ignore[call-overload] comment (reusing explanation from max_dist). The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing self.time_unit which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Added type: ignore[call-overload] comment (reusing explanation from max_dist). The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing to_res which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
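The situation these four commits describe looks roughly like the sketch below. `one_unit` is a hypothetical helper, not a function from cudf; the only detail taken from the commits is that a non-literal unit string is passed to the `np.timedelta64` constructor and the resulting `call-overload` error is suppressed.

```python
import numpy as np


def one_unit(unit: str) -> np.timedelta64:
    """Return a timedelta of one `unit` (for example "ns", "us", "s")."""
    # `unit` is a plain `str`, not a Literal. According to the commit
    # messages, the numpy stubs only accept literal unit strings here,
    # so mypy reports call-overload even though the call works at runtime.
    return np.timedelta64(1, unit)  # type: ignore[call-overload]
```

Whether the ignore is needed may depend on the numpy version mypy resolves, so treat the comment as tied to the stubs in use at the time of this PR.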
Added assertion to narrow type for mypy before the constructor call. The issue is that col_dtype has a union type (DtypeObj) which includes many types, but at this point in the code we know it's one of the decimal types because of the check on lines 2224-2228. The assertion tells mypy that col_dtype is specifically a decimal dtype, so type(col_dtype) will be a decimal dtype constructor that accepts (precision, scale) arguments. This fixes 59 errors on this single line from numpy's strict type stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
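A minimal sketch of narrowing a union with an assertion before calling `type(...)` as a constructor. The classes below are hypothetical simplifications; the real `DtypeObj` union and decimal dtypes in cudf are richer than this.

```python
from __future__ import annotations

from typing import Union


class CategoryLikeDtype:
    """Hypothetical non-decimal member of the union."""


class Decimal64LikeDtype:
    """Hypothetical decimal dtype constructed from (precision, scale)."""

    def __init__(self, precision: int, scale: int = 0) -> None:
        self.precision = precision
        self.scale = scale


class Decimal128LikeDtype(Decimal64LikeDtype):
    """Hypothetical wider decimal dtype with the same constructor."""


DtypeLike = Union[CategoryLikeDtype, Decimal64LikeDtype, Decimal128LikeDtype]


def with_precision(col_dtype: DtypeLike, precision: int) -> Decimal64LikeDtype:
    # An earlier check (not shown) is assumed to guarantee a decimal dtype.
    # The assert restates that guarantee in a form mypy can use.
    assert isinstance(col_dtype, Decimal64LikeDtype)
    # After narrowing, type(col_dtype) is known to accept (precision, scale).
    return type(col_dtype)(precision, col_dtype.scale)
```

Note that `assert` statements are stripped when Python runs with `-O`, so the assertion here serves the type checker and should not be the only runtime guard.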
Changed comment blocks from starting with "# type: ignore[call-overload]:" to "# call-overload must be ignored because" to avoid mypy treating them as malformed type: ignore directives. Fixed 2 errors at lines 165 and 317. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Group 2: Changed _NP_SCALAR from instance annotation to ClassVar to allow subclasses to assign specific numpy scalar types (datetime64 or timedelta64). Group 3: Added type: ignore[call-overload] comments for numpy constructor calls with variable time unit strings, which numpy stubs don't support. Fixed 7 total errors (3 from Group 2 + 4 from Group 3). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
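The Group 2 change can be pictured with the sketch below. Only the `_NP_SCALAR` name and the use of `ClassVar` come from the commit message; the class names and the exact annotation are assumptions.

```python
from __future__ import annotations

from typing import ClassVar

import numpy as np


class TemporalBase:
    """Hypothetical base class shared by datetime and timedelta columns."""

    # Declared as a class-level attribute so that each subclass can
    # assign its own numpy scalar type.
    _NP_SCALAR: ClassVar[type[np.datetime64] | type[np.timedelta64]]


class DatetimeLike(TemporalBase):
    _NP_SCALAR = np.datetime64


class TimedeltaLike(TemporalBase):
    _NP_SCALAR = np.timedelta64
```

For example, `DatetimeLike._NP_SCALAR("2025-10-16")` builds a `np.datetime64`, while `TimedeltaLike._NP_SCALAR(5, "s")` builds a `np.timedelta64`.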
Added type: ignore[arg-type] comments for as_interval_column and as_decimal_column calls where mypy cannot narrow the dtype type from the is_dtype_obj_* function checks (which are not TypeGuards). Fixed 2 errors at lines 1734 and 1742. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Fixed 8 mypy errors related to _get_nan_for_dtype:
1. Changed return type from DtypeObj to np.generic in dtypes.py
- Function returns numpy scalar values like np.float64('nan') or np.datetime64('NaT')
2. Added type: ignore[return-value] comments in numerical_base.py at 7 locations
- kurtosis() lines 93, 98
- quantile() line 184
- median() line 228
- cov() line 247
- corr() lines 255, 261
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
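To illustrate why `np.generic` is the more accurate return type, here is a simplified sketch of a function in the spirit of `_get_nan_for_dtype`. The name `nan_for_dtype` and its body are assumptions for illustration and do not reproduce the cudf implementation.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def nan_for_dtype(dtype: np.dtype[Any]) -> np.generic:
    """Return a missing-value scalar appropriate for `dtype`."""
    if dtype.kind == "M":
        return np.datetime64("NaT")
    if dtype.kind == "m":
        return np.timedelta64("NaT")
    if dtype.kind == "f":
        # Keep the float width of the input dtype.
        return dtype.type("nan")
    # Fallback assumed for this sketch: a float64 NaN.
    return np.float64("nan")
```

Every branch returns a numpy scalar (a value), not a dtype object, which is why annotating the return as a dtype alias was reported as an error.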
Added explicit type annotations for 12 variables across multiple files where mypy could not infer types after adding numpy stubs:
- dtypes.py line 694: Fixed fields dict type annotation (str not bytes)
- decimal.py lines 387, 512: Added data_buf_128 ndarray annotations
- index.py lines 2882, 5175: Added dtype and child_type annotations
- frame.py line 559: Updated to_array parameter to allow None
- groupby.py line 1558: Added high ndarray annotation
- csv.py line 38: Added _CSV_HEX_TYPE_MAP dict annotation
- numeric.py line 180: Added downcast_dtype annotation
- datetimes.py line 852: Added dtype annotation
- queryutils.py line 26: Added SUPPORTED_QUERY_TYPES set annotation
- fast_slow_proxy.py line 1166: Added transformed list annotation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Line 1167: Added type: ignore for list comprehension loop variable
- Line 1174: Changed dtype=object to dtype=np.object_ for np.empty()
- Line 1389: Fixed NUMPY_TYPES annotation from set[str] to set[type[np.generic]]
These fixes address strict numpy stub type checking requirements.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
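Two of the fixes above, sketched in isolation. The set contents and the helper `to_object_array` are hypothetical; only the `NUMPY_TYPES` name, the `set[type[np.generic]]` annotation, and the `dtype=np.object_` spelling come from the commit message.

```python
from __future__ import annotations

from collections.abc import Sequence

import numpy as np

# A set of numpy scalar classes (not their string names).
NUMPY_TYPES: set[type[np.generic]] = {np.int64, np.float64, np.bool_}


def to_object_array(items: Sequence[object]) -> np.ndarray:
    """Store arbitrary Python objects in a 1-D object array."""
    # np.object_ is the numpy scalar type that corresponds to `object`.
    out = np.empty(len(items), dtype=np.object_)
    for i, item in enumerate(items):
        # Element-wise assignment keeps nested lists as single elements.
        out[i] = item
    return out
```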
Updated type signatures across multiple files to accept both int and numpy unsigned integer types for seed parameters, resolving type inconsistencies between public APIs, column methods, and pylibcudf stubs.
Changes:
1. Column-level methods (string.py, lists.py): Updated minhash, minhash64, hash_character_ngrams, minhash_ngrams, and minhash64_ngrams to accept int | np.uint32 or int | np.uint64 for seed parameters
2. Added runtime validation to convert int to appropriate numpy unsigned integer type with bounds checking before calling pylibcudf
3. Accessor-level methods (accessors/string.py): Updated method signatures to accept int | np.uint32 or int | np.uint64 for consistency
4. pylibcudf stubs: Updated minhash.pyi and generate_ngrams.pyi to accept int | np.unsignedinteger[Any] for seed parameters
This allows public APIs to accept convenient int literals while maintaining type safety and proper conversion to numpy unsigned integers at runtime.
Progress: 19 of 200 original mypy errors remain (90.5% complete)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
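The runtime validation described in this commit might look something like the sketch below. `normalize_seed` is a hypothetical helper written for illustration; the actual cudf code and its error messages may differ.

```python
from __future__ import annotations

import numpy as np


def normalize_seed(seed: int | np.uint32) -> np.uint32:
    """Convert a Python int seed to np.uint32, rejecting out-of-range values."""
    if isinstance(seed, np.uint32):
        return seed
    bounds = np.iinfo(np.uint32)
    # Check the range explicitly so that an out-of-range seed fails loudly
    # instead of wrapping around or depending on numpy's overflow behavior.
    if not bounds.min <= seed <= bounds.max:
        raise ValueError(
            f"seed must be in [{bounds.min}, {bounds.max}], got {seed}"
        )
    return np.uint32(seed)
```

A 64-bit variant would follow the same shape with `np.uint64` in place of `np.uint32`.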
Fixed remaining mypy type errors across core modules:
Column modules:
- struct.py: Added type narrowing for dtype assignment in to_arrow()
- numerical.py: Added union type annotations for finfo bounds
- numerical_base.py: Corrected placement of type: ignore for return value
- decimal.py: Added None check for data_buf, fixed dtype assignment in to_arrow()
Core modules:
- series.py: Added type: ignore for is_dict_like() check
- index.py: Added type: ignore for as_column return type in RangeIndex
- frame.py: Widened dtype parameter type to Any in to_array helper
- groupby.py: Inlined to_take construction to avoid assignment conflicts
Tools modules:
- numeric.py: Added type: ignore for numpy typecodes string access
- datetimes.py: Added type: ignore for np.datetime64 with non-literal unit
Pandas modules:
- _wrappers/numpy.py: Added type: ignore for conditional flagsobj import
- fast_slow_proxy.py: Corrected placement of type annotations for list comprehension and np.empty call
Stub updates:
- quantiles.pyi: Changed parameter type from Sequence[float] to Iterable[float] to accept numpy arrays
All fixes preserve runtime behavior while satisfying mypy's strict type checking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
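The quantiles.pyi change from `Sequence[float]` to `Iterable[float]` reflects how numpy arrays relate to the abstract collection types. A small sketch, with `normalize_quantiles` as a hypothetical function standing in for the stubbed API:

```python
from __future__ import annotations

from collections.abc import Iterable, Sequence

import numpy as np


def normalize_quantiles(q: Iterable[float]) -> list[float]:
    """Accept any iterable of quantiles and return plain Python floats."""
    return [float(x) for x in q]


arr = np.array([0.25, 0.5, 0.75])
# ndarray supports iteration but is not registered as a Sequence,
# so an Iterable parameter is the annotation that admits it.
print(isinstance(arr, Iterable), isinstance(arr, Sequence))
print(normalize_quantiles(arr))
```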
mroeschke reviewed on Oct 16, 2025
mroeschke reviewed on Oct 16, 2025
rapids-bot Bot pushed a commit that referenced this pull request on Oct 16, 2025
Adding more packages to the mypy environment for validation is pushing us over the maximum size that pre-commit.ci allows for its environments. The mypy checks will still run as part of the `check-style` job on our CI system; we just won't have pre-commit.ci for this check.
xref:
- https://results.pre-commit.ci/run/github/90506918/1760575643.VSn5D0uuT16kJEDbQWeS5g
> build of https://github.com/pre-commit/mirrors-mypy:types-cachetools,pyarrow-stubs,numpy@v1.13.0 for python@python3 exceeds tier max size 250MiB: 254MiB
- #20282
Authors:
- Bradley Dice (https://github.com/bdice)
Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
URL: #20286
mroeschke reviewed on Oct 16, 2025
mroeschke approved these changes on Oct 16, 2025
KyleFromNVIDIA approved these changes on Oct 16, 2025
bdice approved these changes on Oct 16, 2025
Contributor (Author) commented:
/merge
Description
Contributes to #11661
Checklist