Support for numpy.typing.NDArray #195

Closed
JulesGM opened this issue Jul 8, 2021 · 19 comments
@JulesGM

JulesGM commented Jul 8, 2021

NumPy has introduced a new typing library: https://numpy.org/devdocs/reference/typing.html.
One part of it is that you can mention the dtype of elements in the type:

import numpy as np
import numpy.typing as npt
...
def some_function(
    a_float_array: npt.NDArray[np.float64],
    an_int_array: npt.NDArray[np.int64],
):
    ...

Right now, typeguard doesn't check that the dtype is respected, and raises no error when passed an ndarray that has the wrong dtype.

I understand the likely argument against library-specific fixes. However, NumPy is one of the most used third-party Python libraries, if not the single most used, with incredibly wide applications across all kinds of research, including physics, medicine, and AI. Better type safety in all of these settings may well matter more than avoiding a library-specific change. Idk, you decide.

Best.

@agronholm
Owner

I'm favorable to getting support for this in, but I don't have much time to work on typeguard these days.

@JulesGM
Author

JulesGM commented Jul 9, 2021

maybe @BvB93 would be interested? 👉 👈

@BvB93

BvB93 commented Jul 9, 2021

I'm not familiar with the typeguard library myself, but I can provide some further information about NDArray.
Namely, NDArray is an instance of either the PEP 585 types.GenericAlias class or, on Python < 3.9, a
backport thereof, the latter carefully mimicking the semantics of the former.

So if you'd want to validate the dtype at runtime based on NDArray you could do something like this:

import numpy as np
import numpy.typing as npt
from typing import Type

array_annotation = npt.NDArray[np.float64]
dtype: Type[np.dtype] = array_annotation.__args__[-1]
scalar_type: Type[np.float64] = dtype.__args__[0]

# Compare `scalar_type` to the passed arrays `dtype.type` attribute
array = np.random.rand(10)
assert issubclass(array.dtype.type, scalar_type)

@JulesGM
Author

JulesGM commented Jul 9, 2021

Thanks a lot @BvB93. Very quickly, let's say you write a function that receives types. How would one check if a type is an npt.NDArray? Should one do something like isinstance(expected_type, numpy.typing._generic_alias._GenericAlias)? The underscore suggests this is private and shouldn't be done. I'm also not sure whether this would match non-NDArray types.

def check_type(argname, value, expected_type, memo):
    ...
    # @BvB93: unsure about the next line?
    if npt and isinstance(expected_type, npt._generic_alias._GenericAlias):
        expected_dtype = expected_type.__args__[-1]
        expected_scalar_type = expected_dtype.__args__[0]
        if not isinstance(value, np.ndarray):
            raise TypeError(
                'type of {} must be {}; got {} instead'.
                format(argname, qualified_name(expected_type), qualified_name(value)))

        # Any places no constraint on the dtype, so skip the check in that case
        if expected_scalar_type is not Any and not issubclass(value.dtype.type, expected_scalar_type):
            raise TypeError('NumPy dtype of {} ({}) is not compatible with the {} dtype.'.format(
                argname, value.dtype, expected_scalar_type
            ))
    ...

@agronholm From my understanding, something like this would be ok as an addition to check_type, depending on what @BvB93 says about testing the expected_type.

In the imports:

try:
    import numpy.typing as npt
    import numpy as np
except ImportError:
    npt = None
    np = None

Testing whether numpy.typing can be imported, and skipping the dtype check otherwise, keeps typeguard compatible with older versions of NumPy that lack the typing module, and with environments where NumPy isn't installed at all.

@JulesGM
Author

JulesGM commented Jul 9, 2021

(edited the previous comment with an import strategy)

@JulesGM
Author

JulesGM commented Jul 9, 2021

expected_type = resolve_forwardref(expected_type, memo) seems to be stripping the numpy.typing information away from expected_type, at line https://github.com/agronholm/typeguard/blob/master/src/typeguard/__init__.py#L655

Adding the NDArray check right before the expected_type = resolve_forwardref(expected_type, memo) call, at https://github.com/agronholm/typeguard/blob/master/src/typeguard/__init__.py#L654, seems to work, but I don't know if it makes sense.

@BvB93

BvB93 commented Jul 10, 2021

Directly accessing private API is a very bad idea, especially since this will break for python >= 3.9:

- if npt and isinstance(expected_type, npt._generic_alias._GenericAlias) :
+ if npt:
+     GenericAlias = type(npt.NDArray)  # You could cache this somewhere as it's constant
+     if isinstance(expected_type, GenericAlias) and expected_type.__origin__ is np.ndarray:

Furthermore, there is also the possibility of expected_scalar_type being a TypeVar and/or Union (and that is assuming we're limiting ourselves only to valid subscription types).
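
A minimal sketch of how those cases might be handled using only the public typing.get_origin and typing.get_args helpers. It assumes Python 3.9+ (where NDArray is a types.GenericAlias). The names accepted_scalar_types and dtype_matches are hypothetical, not part of typeguard or NumPy, and treating Any and unresolved TypeVars as "any dtype is fine" is a simplifying assumption rather than settled behaviour.

```python
import types
from typing import Any, Optional, Tuple, TypeVar, Union, get_args, get_origin

import numpy as np
import numpy.typing as npt

# types.UnionType (the `X | Y` form) only exists on Python 3.10+.
_UNION_ORIGINS = tuple(
    o for o in (Union, getattr(types, "UnionType", None)) if o is not None
)


def accepted_scalar_types(hint: Any) -> Optional[Tuple[type, ...]]:
    """Hypothetical helper: scalar types allowed by an NDArray hint.

    Returns None when the hint places no constraint on the dtype
    (Any or an unresolved TypeVar; a simplifying assumption).
    """
    if get_origin(hint) is not np.ndarray:
        raise TypeError(f"{hint!r} is not an ndarray type hint")

    hint_args = get_args(hint)
    # The last argument is the dtype part, e.g. np.dtype[np.float64].
    dtype_args = get_args(hint_args[-1]) if hint_args else ()
    if not dtype_args:
        return None

    scalar = dtype_args[0]
    # Flatten Union[...] into its members; otherwise keep the single type.
    if get_origin(scalar) in _UNION_ORIGINS:
        candidates = get_args(scalar)
    else:
        candidates = (scalar,)

    if any(c is Any or isinstance(c, TypeVar) for c in candidates):
        return None
    return tuple(candidates)


def dtype_matches(array: np.ndarray, hint: Any) -> bool:
    """Hypothetical helper: does the array's dtype satisfy the hint?"""
    scalars = accepted_scalar_types(hint)
    return scalars is None or issubclass(array.dtype.type, scalars)
```

For example, dtype_matches(np.zeros(3), npt.NDArray[np.float64]) passes while an int64 array fails the same hint, and abstract scalar types such as np.floating work because the comparison uses issubclass.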

@JulesGM
Author

JulesGM commented Jul 13, 2021

I'm well aware that it's a bad idea, that's why I was asking how

@JulesGM
Author

JulesGM commented Jul 13, 2021

thanks, this is great.

@leycec

leycec commented Sep 14, 2021

Obligatory necrobump: beartype fully supports numpy.typing.NDArray type hints, because our primary use case is data science and machine learning. After all, AI is Python's killer app. Beartype isn't typeguard, of course – but typeguard no longer appears to be actively maintained. That actually deeply saddens me. Most runtime type checkers are dead. It's a lonely field we stalwart few still labour in. cue sad cat 😿

The only active PEP-compliant runtime type checkers are beartype, pydantic, and typical – the latter two of which have yet to support numpy.typing.NDArray and maybe never will.

Lastly, I'll note for the sake of posterity that beartype fully supports numpy.typing.NDArray without violating privacy encapsulation as previously recommended above. Violating privacy encapsulation is actually why most runtime type checkers are dead... so, yeah. Not a great idea on the whole.

Come back to us with typeguard 3.0 next year, @agronholm! We already miss you in the community. And thanks for all the hard volunteerism, idealistic inspiration, and pragmatic usefulness over the many years.

@agronholm
Owner

The trouble is that typeguard is no longer at the top of my (long) list of projects to be maintained. After I'm done updating my current focus (APScheduler), I will move on to wheel, and if nobody has picked up the slack before that, I will turn my attention to typeguard again. How much time that will take is anybody's guess, but you should know that typeguard has not been completely abandoned.

@agronholm
Owner

I think this warrants a plugin system. I'm tentatively scheduling this for v3.0.

@agronholm agronholm added this to the 3.0.0 milestone Oct 10, 2021
@patrick-kidger

On the topic of plugin systems: torchtyping offers a system of rich type annotations for PyTorch tensors, e.g. torchtyping.TensorType["batch", "channels", float]. One of its flagship features is that these annotations can be checked at runtime using typeguard. However, doing so involves some (pretty complicated) patching of typeguard's internal machinery (torchtyping.patch_typeguard()).

All of which is to say: a plugin system sounds very interesting.

@agronholm
Owner

I've opened #215 to discuss this.

@agronholm
Owner

agronholm commented Dec 5, 2021

In case you are not already following the linked discussion, there is now a plugin system in the 3.0 branch. Let me know if it works for you.

@agronholm
Owner

I have a PoC checker for ndarray. Let me know if this is useful. Is it necessary to check all elements or just the first?

from __future__ import annotations

from typing import Any

import numpy as np
from numpy.typing import NDArray
from typeguard import (
    TypeCheckError,
    TypeCheckMemo,
    TypeCheckerCallable,
    check_type_internal,
    config,
    typechecked
)


def check_ndarray(value: Any, origin_type: Any, args: tuple[Any, ...], memo: TypeCheckMemo):
    if not isinstance(value, np.ndarray):
        raise TypeCheckError("is not a numpy.ndarray")

    if args:
        expected_type = args[1].__args__[0]
        for i, v in enumerate(value):
            try:
                check_type_internal(v, expected_type, memo)
            except TypeCheckError as exc:
                exc.append_path_element(f"item {i}")
                raise


def numpy_checker_lookup(origin_type: Any, args: tuple, extras: tuple) -> TypeCheckerCallable | None:
    if origin_type is np.ndarray:
        return check_ndarray

    return None


config.checker_lookup_functions.append(numpy_checker_lookup)


@typechecked
def foo(a_float_array: NDArray[np.float64]):
    pass


array = np.zeros(5)
foo(array)
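
On the all-elements-or-first question: an ndarray stores a single dtype for its whole buffer, so one alternative might be to compare value.dtype once instead of visiting items. The sketch below contrasts the two approaches in plain NumPy. Both function names are hypothetical and not typeguard API; elementwise_ok is only a stand-in for the loop above (it uses isinstance rather than check_type_internal).

```python
import numpy as np


def elementwise_ok(array: np.ndarray, scalar_type: type) -> bool:
    """Stand-in for the per-item loop: iterates over the first axis only."""
    return all(isinstance(item, scalar_type) for item in array)


def dtype_ok(array: np.ndarray, scalar_type: type) -> bool:
    """Single comparison against the array's dtype, independent of size."""
    return issubclass(array.dtype.type, scalar_type)


flat = np.zeros(5)        # 1-D float64 array
grid = np.zeros((2, 3))   # 2-D float64 array
empty_ints = np.array([], dtype=np.int64)

# Iterating a 1-D array yields NumPy scalars, so both approaches agree.
flat_results = (elementwise_ok(flat, np.float64), dtype_ok(flat, np.float64))

# Iterating a 2-D array yields rows (ndarrays), so the per-item check
# rejects an array whose dtype is actually correct.
grid_results = (elementwise_ok(grid, np.float64), dtype_ok(grid, np.float64))

# An empty array has no items to inspect, so only the dtype check can
# notice that int64 is not float64.
empty_results = (
    elementwise_ok(empty_ints, np.float64),
    dtype_ok(empty_ints, np.float64),
)
```

One caveat: for object-dtype arrays the dtype says nothing about the individual items, so a dtype-only check would not cover that case.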

@patrick-kidger

Btw -- heads up that jaxtyping now exists, which (thanks to a new and smarter design):

  • supports all of JAX/NumPy/PyTorch/TensorFlow (ignore the "jax" in the "jaxtyping" name :D )
  • provides type annotations for both shape and dtype
  • essentially compatible with static checking (which can check the array-ness, but not the shape or dtype)
  • is compatible with all runtime checkers (in particular both beartype and typeguard)
  • best of all it doesn't need any special support from the runtime type checker! (Begone ugly TorchTyping monkey-patching!)

I can see that typeguard might want to support the numpy.typing.NDArray class anyway. But jaxtyping is now my recommendation for all array-typing needs, regardless of framework.

I know typeguard is something of a back-burner project, so I'm mentioning all of this mostly to emphasise that I don't think any special demands need to be placed on typeguard in order to support array typing. :)

@agronholm
Owner

> Come back to us with typeguard 3.0 next year, @agronholm! We already miss you in the community. And thanks for all the hard volunteerism, idealistic inspiration, and pragmatic usefulness over the many years.

It's on the way. Due to the recent changes, @typechecked requires access to the original source code of the decorated function, so the beartype benchmark script no longer works. Would you mind modifying it to use a throwaway module so I can see the updated figures? Thanks!

@agronholm
Owner

Typeguard 3.0.0 is out. While it's not part of Typeguard itself, there is a mechanism with which to implement checks like this with extensions.
