Add numpy to the mypy pre-commit environment by vyasr · Pull Request #20282 · rapidsai/cudf

vyasr · 2025-10-16T00:47:20Z

Description

Contributes to #11661

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

Added numpy to the mypy additional_dependencies to enable numpy type stubs for improved type checking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "np_dtype" at line 22. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
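The class-attribute fix described above can be sketched as follows. This is a minimal illustration, not the cuDF source: the class name `ObjectBackedDtype` is hypothetical, and only the annotation `np.dtype[np.object_]` comes from the commit message.

```python
import numpy as np


class ObjectBackedDtype:  # hypothetical class, not taken from cuDF
    # The explicit annotation states the dtype's scalar type up front, so mypy
    # does not have to infer it from the right-hand side.
    np_dtype: np.dtype[np.object_] = np.dtype("object")


print(ObjectBackedDtype.np_dtype.kind)
```

The annotation changes nothing at runtime; the attribute is still an ordinary `np.dtype` instance.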

Fixed mypy error: Need type annotation for "np_dtype" at line 33. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "np_dtype" at line 44. Added explicit type annotation np.dtype[np.object_] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "min_date" at line 124. Added explicit type annotation np.datetime64 for local variable. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "np_dtypes_to_pandas_dtypes" at line 23. Added explicit type annotation dict[np.dtype[Any], pd.core.dtypes.base.ExtensionDtype] and imported Any from typing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
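A rough sketch of annotating a module-level mapping keyed by numpy dtypes. The mapping name `NP_DTYPE_TO_NAME` and its string values are assumptions made to keep the example self-contained; the real mapping in the PR uses pandas extension dtypes as values.

```python
from typing import Any

import numpy as np

# Hypothetical mapping: keys are numpy dtypes, values are plain strings here
# (the PR's mapping stores pandas extension dtypes instead).
NP_DTYPE_TO_NAME: dict[np.dtype[Any], str] = {
    np.dtype("int64"): "Int64",
    np.dtype("float32"): "Float32",
}


def lookup(dtype: np.dtype[Any]) -> str:
    """Return the mapped name for a numpy dtype."""
    return NP_DTYPE_TO_NAME[dtype]
```

`np.dtype[Any]` leaves the scalar type unconstrained, which suits a table that mixes integer and floating-point dtypes.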

Fixed mypy error: Need type annotation for "dtype" at line 205. Added explicit type annotation np.dtype[Any] for local variable in loop. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "dtype" at line 219. Added explicit type annotation np.dtype[Any] for local variable in loop. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
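Python syntax does not allow an annotation on a `for` loop target, so a loop variable has to be declared on its own line before the loop. A small sketch of that pattern, with a hypothetical function `total_itemsize`:

```python
from typing import Any

import numpy as np


def total_itemsize(names: list[str]) -> int:
    """Sum the item sizes of the dtypes named in `names`."""
    # Declare the loop variable's type first; `for dtype: ... in` is not valid.
    dtype: np.dtype[Any]
    total = 0
    for dtype in [np.dtype(name) for name in names]:
        total += dtype.itemsize
    return total
```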

Fixed mypy error: Need type annotation for "SUPPORTED_NUMPY_TO_PYLIBCUDF_TYPES" at line 763. Added explicit type annotation dict[np.dtype[Any], plc.types.TypeId]. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed mypy error: Need type annotation for "_UNDERLYING_DTYPE" at line 47. Added explicit type annotation np.dtype[np.int64] to class attribute. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed 4 mypy errors in dtypes.py by:
1. Converting is_pandas_nullable_extension_dtype to use TypeGuard[pd.core.dtypes.base.ExtensionDtype]
   - This resolved .na_value access errors on lines 255 and 262
2. Filtering dtypes to create cat_dtypes list with explicit isinstance checks
   - Ensures mypy knows all items are cudf.CategoricalDtype
3. Adding None check when filtering categorical dtypes
   - Filters out dtypes where _categories is None
4. Using explicit loop with assertions instead of list comprehensions
   - Helps mypy understand _categories is not None after filtering
   - This resolved ._categories access errors on lines 297 and 300
The TypeGuard pattern tells mypy to narrow the dtype type when the function returns True, making attribute access type-safe without runtime overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Added type: ignore[call-overload] comment with detailed explanation. The issue is that numpy's type stubs for datetime64/timedelta64 constructors only accept literal strings for the time unit parameter (like "ns", "us", etc.) to enable compile-time validation. However, we're passing a variable string (to_unit) which contains a time unit that we know is valid at runtime. This is one of 20 errors on this line from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
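A minimal sketch of the situation this commit describes: a time unit held in a variable is passed to the `np.datetime64` constructor, and the resulting overload error is suppressed on that line. The function name `epoch_in_unit` is hypothetical, and whether mypy reports the error depends on the numpy stub version in use.

```python
import numpy as np


def epoch_in_unit(to_unit: str) -> np.datetime64:
    """Return the Unix epoch as a datetime64 with the requested unit."""
    # call-overload must be ignored because, per the commit message, the stubs
    # expect a literal unit string while `to_unit` is only known at runtime.
    return np.datetime64(0, to_unit)  # type: ignore[call-overload]
```

The ignore is scoped to a single error code, so other mistakes on the same line would still be reported.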

Added type: ignore[call-overload] comment with detailed explanation. The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing self.time_unit which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added type: ignore[call-overload] comment (reusing explanation from max_dist). The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing self.time_unit which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added type: ignore[call-overload] comment (reusing explanation from max_dist). The issue is that numpy's type stubs for timedelta64 constructors only accept literal strings for the time unit parameter to enable compile-time validation. However, we're passing to_res which is a variable containing a valid time unit at runtime. This is one of 30 errors on lines 313-323 from numpy's overly restrictive stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added assertion to narrow type for mypy before the constructor call. The issue is that col_dtype has a union type (DtypeObj) which includes many types, but at this point in the code we know it's one of the decimal types because of the check on lines 2224-2228. The assertion tells mypy that col_dtype is specifically a decimal dtype, so type(col_dtype) will be a decimal dtype constructor that accepts (precision, scale) arguments. This fixes 59 errors on this single line from numpy's strict type stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
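The assertion-based narrowing could be sketched as below. The dtype classes, the `DtypeObj` alias, and the helper `is_decimal` are simplified, hypothetical versions of the cuDF names mentioned in the commit; the point is only that a plain `bool` check does not narrow a union, while an `isinstance` assertion does.

```python
from typing import Union


class _DecimalBase:
    def __init__(self, precision: int, scale: int) -> None:
        self.precision = precision
        self.scale = scale


class Decimal32Dtype(_DecimalBase):
    pass


class Decimal64Dtype(_DecimalBase):
    pass


class StringDtype:
    pass


# Hypothetical stand-in for the much wider union used in cuDF.
DtypeObj = Union[Decimal32Dtype, Decimal64Dtype, StringDtype]


def is_decimal(dtype: DtypeObj) -> bool:
    # Returns a plain bool, so the type checker learns nothing from it.
    return isinstance(dtype, (Decimal32Dtype, Decimal64Dtype))


def with_precision(col_dtype: DtypeObj, precision: int, scale: int) -> DtypeObj:
    if not is_decimal(col_dtype):
        return col_dtype
    # The runtime check above already guarantees this; the assertion repeats
    # it in a form the type checker can follow.
    assert isinstance(col_dtype, (Decimal32Dtype, Decimal64Dtype))
    return type(col_dtype)(precision, scale)
```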

Changed comment blocks from starting with "# type: ignore[call-overload]:" to "# call-overload must be ignored because" to avoid mypy treating them as malformed type: ignore directives. Fixed 2 errors at lines 165 and 317. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
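To illustrate the comment wording this commit settles on: the explanation lives in an ordinary comment that does not begin with `# type: ignore`, and the actual directive stays at the end of the affected line. The function `to_delta` is a hypothetical example, not code from the PR.

```python
import numpy as np


def to_delta(value: int, unit: str) -> np.timedelta64:
    """Build a timedelta64 from a count and a runtime-provided unit."""
    # call-overload must be ignored because the unit is a runtime string
    # rather than a literal, which the commit says the numpy stubs reject.
    return np.timedelta64(value, unit)  # type: ignore[call-overload]
```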

Group 2: Changed _NP_SCALAR from instance annotation to ClassVar to allow subclasses to assign specific numpy scalar types (datetime64 or timedelta64).
Group 3: Added type: ignore[call-overload] comments for numpy constructor calls with variable time unit strings, which numpy stubs don't support.
Fixed 7 total errors (3 from Group 2 + 4 from Group 3).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
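The `ClassVar` change from Group 2 might be sketched like this. The class names and the `nat` method are hypothetical; only the attribute name `_NP_SCALAR` and the two scalar types come from the commit message.

```python
from typing import ClassVar, Union

import numpy as np


class TemporalBase:  # hypothetical base class
    # Declared once as a class-level attribute; each subclass assigns the
    # concrete numpy scalar type it works with.
    _NP_SCALAR: ClassVar[Union[type[np.datetime64], type[np.timedelta64]]]

    def nat(self) -> np.generic:
        """Return the not-a-time value for this column kind."""
        return self._NP_SCALAR("NaT")


class DatetimeLike(TemporalBase):
    _NP_SCALAR = np.datetime64


class TimedeltaLike(TemporalBase):
    _NP_SCALAR = np.timedelta64
```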

Added type: ignore[arg-type] comments for as_interval_column and as_decimal_column calls where mypy cannot narrow the dtype type from the is_dtype_obj_* function checks (which are not TypeGuards). Fixed 2 errors at lines 1734 and 1742. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Fixed 8 mypy errors related to _get_nan_for_dtype:
1. Changed return type from DtypeObj to np.generic in dtypes.py
   - Function returns numpy scalar values like np.float64('nan') or np.datetime64('NaT')
2. Added type: ignore[return-value] comments in numerical_base.py at 7 locations
   - kurtosis() lines 93, 98
   - quantile() line 184
   - median() line 228
   - cov() line 247
   - corr() lines 255, 261
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
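Why `np.generic` is the fitting return type can be seen in a small stand-in. The function `nan_for_dtype` below is an assumption written for illustration and is not cuDF's `_get_nan_for_dtype`; it only shows that the returned objects are numpy scalars rather than dtypes.

```python
from typing import Any

import numpy as np


def nan_for_dtype(dtype: np.dtype[Any]) -> np.generic:
    """Return a missing-value scalar appropriate for `dtype` (illustrative)."""
    if dtype.kind in "mM":
        # datetime64 / timedelta64: build a NaT with the same unit.
        unit, _ = np.datetime_data(dtype)
        return dtype.type("NaT", unit)
    if dtype.kind == "f":
        return dtype.type("nan")
    # Fallback chosen for this sketch only.
    return np.float64("nan")
```

Every branch returns an instance of a numpy scalar class, all of which derive from `np.generic`.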

Added explicit type annotations for 12 variables across multiple files where mypy could not infer types after adding numpy stubs:
- dtypes.py line 694: Fixed fields dict type annotation (str not bytes)
- decimal.py lines 387, 512: Added data_buf_128 ndarray annotations
- index.py lines 2882, 5175: Added dtype and child_type annotations
- frame.py line 559: Updated to_array parameter to allow None
- groupby.py line 1558: Added high ndarray annotation
- csv.py line 38: Added _CSV_HEX_TYPE_MAP dict annotation
- numeric.py line 180: Added downcast_dtype annotation
- datetimes.py line 852: Added dtype annotation
- queryutils.py line 26: Added SUPPORTED_QUERY_TYPES set annotation
- fast_slow_proxy.py line 1166: Added transformed list annotation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Line 1167: Added type: ignore for list comprehension loop variable
- Line 1174: Changed dtype=object to dtype=np.object_ for np.empty()
- Line 1389: Fixed NUMPY_TYPES annotation from set[str] to set[type[np.generic]]
These fixes address strict numpy stub type checking requirements.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
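Two of the changes listed above, passing `np.object_` to `np.empty` and annotating a set of scalar classes as `set[type[np.generic]]`, can be sketched as follows. The helper `to_object_array` and the contents of `NUMPY_TYPES` here are assumptions for illustration.

```python
import numpy as np

# A set holding numpy scalar classes, not their string names.
NUMPY_TYPES: set[type[np.generic]] = {np.int64, np.float64, np.bool_}


def to_object_array(items: list[object]) -> np.ndarray:
    """Copy arbitrary Python objects into a 1-D object array."""
    # np.object_ is numpy's own scalar type for Python objects.
    out = np.empty(len(items), dtype=np.object_)
    for i, item in enumerate(items):
        out[i] = item
    return out
```

Filling element by element keeps nested sequences as single objects instead of letting numpy expand them into extra dimensions.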

Updated type signatures across multiple files to accept both int and numpy unsigned integer types for seed parameters, resolving type inconsistencies between public APIs, column methods, and pylibcudf stubs.
Changes:
1. Column-level methods (string.py, lists.py): Updated minhash, minhash64, hash_character_ngrams, minhash_ngrams, and minhash64_ngrams to accept int | np.uint32 or int | np.uint64 for seed parameters
2. Added runtime validation to convert int to appropriate numpy unsigned integer type with bounds checking before calling pylibcudf
3. Accessor-level methods (accessors/string.py): Updated method signatures to accept int | np.uint32 or int | np.uint64 for consistency
4. pylibcudf stubs: Updated minhash.pyi and generate_ngrams.pyi to accept int | np.unsignedinteger[Any] for seed parameters
This allows public APIs to accept convenient int literals while maintaining type safety and proper conversion to numpy unsigned integers at runtime.
Progress: 19 of 200 original mypy errors remain (90.5% complete)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
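The runtime validation in item 2 might look something like the sketch below. The helper name `as_uint32_seed` and its error message are assumptions; the commit only states that ints are bounds-checked and converted to the matching numpy unsigned type.

```python
from typing import Union

import numpy as np


def as_uint32_seed(seed: Union[int, np.uint32]) -> np.uint32:
    """Convert a seed to np.uint32, rejecting values outside its range."""
    if isinstance(seed, np.uint32):
        return seed
    info = np.iinfo(np.uint32)
    if not info.min <= seed <= info.max:
        raise ValueError(
            f"seed must be in [{info.min}, {info.max}], got {seed}"
        )
    return np.uint32(seed)
```

Callers can then pass a plain literal such as `as_uint32_seed(42)`, while out-of-range values fail early with a clear message rather than relying on numpy's own overflow handling.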

Fixed remaining mypy type errors across core modules:
Column modules:
- struct.py: Added type narrowing for dtype assignment in to_arrow()
- numerical.py: Added union type annotations for finfo bounds
- numerical_base.py: Corrected placement of type: ignore for return value
- decimal.py: Added None check for data_buf, fixed dtype assignment in to_arrow()
Core modules:
- series.py: Added type: ignore for is_dict_like() check
- index.py: Added type: ignore for as_column return type in RangeIndex
- frame.py: Widened dtype parameter type to Any in to_array helper
- groupby.py: Inlined to_take construction to avoid assignment conflicts
Tools modules:
- numeric.py: Added type: ignore for numpy typecodes string access
- datetimes.py: Added type: ignore for np.datetime64 with non-literal unit
Pandas modules:
- _wrappers/numpy.py: Added type: ignore for conditional flagsobj import
- fast_slow_proxy.py: Corrected placement of type annotations for list comprehension and np.empty call
Stub updates:
- quantiles.pyi: Changed parameter type from Sequence[float] to Iterable[float] to accept numpy arrays
All fixes preserve runtime behavior while satisfying mypy's strict type checking.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
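The stub change from `Sequence[float]` to `Iterable[float]` matters because `np.ndarray` is iterable but is not a `collections.abc.Sequence`. A short sketch, with a hypothetical function `normalize_quantiles` standing in for the stubbed pylibcudf API:

```python
from collections.abc import Iterable, Sequence

import numpy as np


def normalize_quantiles(q: Iterable[float]) -> list[float]:
    """Accept any iterable of floats, including numpy arrays."""
    return [float(x) for x in q]


arr = np.array([0.25, 0.5, 0.75])
print(isinstance(arr, Sequence), isinstance(arr, Iterable))
print(normalize_quantiles(arr))
```

Lists and tuples still satisfy the wider annotation, so existing callers are unaffected.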

Adding more packages to the mypy environment for validation is pushing us over the maximum size that the service allows for its environments. The mypy checks will still run as part of the `check-style` job on our CI system; we just won't have pre-commit.ci for this check.
xref:
- https://results.pre-commit.ci/run/github/90506918/1760575643.VSn5D0uuT16kJEDbQWeS5g
  > build of https://github.com/pre-commit/mirrors-mypy:types-cachetools,pyarrow-stubs,numpy@v1.13.0 for python@python3 exceeds tier max size 250MiB: 254MiB
- #20282
Authors:
- Bradley Dice (https://github.com/bdice)
Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
URL: #20286

…numpy

vyasr · 2025-10-16T21:18:44Z

/merge

vyasr · 2025-10-16T22:15:50Z

/merge

vyasr and others added 25 commits October 16, 2025 00:45

Add numpy to mypy additional_dependencies in pre-commit config

e2a5bc9

Fix bug

972081d

vyasr self-assigned this Oct 16, 2025

vyasr requested review from a team as code owners October 16, 2025 00:47

vyasr requested a review from AyodeAwe October 16, 2025 00:47

vyasr added the improvement (Improvement / enhancement to an existing function) label Oct 16, 2025

vyasr added the non-breaking (Non-breaking change) label Oct 16, 2025

vyasr requested review from Matt711 and mroeschke October 16, 2025 00:47

github-actions Bot added the Python (Affects Python cuDF API), cudf.pandas (Issues specific to cudf.pandas), and pylibcudf (Issues specific to the pylibcudf package) labels Oct 16, 2025

github-project-automation Bot added this to cuDF Python Oct 16, 2025

GPUtester moved this to In Progress in cuDF Python Oct 16, 2025

bdice mentioned this pull request Oct 16, 2025

Skip mypy in pre-commit.ci #20286

Merged

3 tasks

mroeschke reviewed Oct 16, 2025

Comment thread python/cudf/cudf/core/column/column.py Outdated

mroeschke reviewed Oct 16, 2025

Comment thread python/cudf/cudf/core/frame.py Outdated

vyasr added 2 commits October 16, 2025 18:13

PR review

c3bbbe9

Merge remote-tracking branch 'upstream/branch-25.12' into fix/typing_…

568bf8f

…numpy

mroeschke reviewed Oct 16, 2025

Comment thread python/cudf/cudf/core/column/column.py

Remove one more unused ignore

437da97

mroeschke approved these changes Oct 16, 2025

KyleFromNVIDIA approved these changes Oct 16, 2025

bdice approved these changes Oct 16, 2025

rapids-bot Bot merged commit e534472 into rapidsai:branch-25.12 Oct 16, 2025
137 checks passed

github-project-automation Bot moved this from In Progress to Done in cuDF Python Oct 16, 2025

vyasr deleted the fix/typing_numpy branch October 16, 2025 22:15

vyasr mentioned this pull request Nov 7, 2025

[ENH] More type-stubs in the mypy pre-commit environment? #11661

Open

rapids-bot[bot] merged 28 commits into rapidsai:branch-25.12 from vyasr:fix/typing_numpy
