Polish function/method signatures #16

Open
mdhaber opened this issue Nov 22, 2024 · 9 comments

Comments

@mdhaber
Owner

mdhaber commented Nov 22, 2024

Some functions/methods have correct signatures, but others don't follow the requirements of the array API standard. Let's clean this up.

@mdhaber
Owner Author

mdhaber commented Dec 12, 2024

I did some rough work toward this in 0ad4e74 and b62798d, but there are a few issues:

  • It is not careful to change all the signatures that need to be changed (and only the signatures that need to be changed).
  • For some functions, it shows documentation of arguments that we don't actually support.
  • It is superficial; it only changes what is shown in the docs and does not actually reject positional-only arguments passed by keyword or keyword-only arguments passed by position.
>>> help(mxp.sum)
Help on function sum in module marray:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    The following is the documentation for the corresponding attribute of `numpy`.
    The behavior of `marray` is the same when the mask is all `False`, and differs
    (as described in the tutorial) when the mask has `True` elements.
        Sum of array elements over a given axis.
        Parameters
        ----------
        a : array_like
            Elements to sum.
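
For comparison, the array API standard specifies `sum(x, /, *, axis=None, dtype=None, keepdims=False)`. A minimal sketch of how Python's own `/` and `*` markers would make the displayed signature and the enforced behaviour agree is shown here. The names `_backend_sum` and `standard_sum` are hypothetical stand-ins, not marray or numpy code, and plain lists are used instead of arrays.

```python
import inspect


def _backend_sum(a, axis=None, dtype=None, out=None, keepdims=False):
    # Hypothetical stand-in for a permissive backend function (not numpy's code).
    return sum(a)


def standard_sum(x, /, *, axis=None, dtype=None, keepdims=False):
    # `/` makes `x` positional-only and `*` makes everything after it
    # keyword-only, so Python itself rejects calls that do not match.
    return _backend_sum(x, axis=axis, dtype=dtype, keepdims=keepdims)


# The signature that help() displays is now the one enforced at call time.
print(inspect.signature(standard_sum))

bad_calls = (
    lambda: standard_sum(x=[1, 2, 3]),   # positional-only passed by keyword
    lambda: standard_sum([1, 2, 3], 0),  # keyword-only passed by position
)
for bad_call in bad_calls:
    try:
        bad_call()
    except TypeError as exc:
        print("rejected:", exc)
```

Because the wrapper only forwards the arguments it declares, unsupported backend arguments such as `out` would also disappear from the documented signature.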

@mdhaber
Owner Author

mdhaber commented Jan 28, 2025

@jorenham, I saw your post at numpy/numpy#27588 (comment), and thought you might be interested here. I'd appreciate your help with the signatures and typing, if you'd like to take a look! Currently, I've made a sloppy attempt to pass the documentation and signatures of the underlying library through. If you can see ways of doing this better, LMK.

@jorenham

At first glance, it looks like typing this seems pretty doable, even without https://github.com/data-apis/array-api-typing.

What you could do, for instance, is describe MArray using a typing.Protocol. That would effectively get you a static verification suite for free.
For bonus points, you could also make it generic on the dtype, shape, and device.
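
As a rough illustration of that suggestion, here is a minimal sketch of a protocol that is generic on the dtype only. Everything in it is an assumption made for illustration: the name `MaskedArrayLike`, the choice of `data`, `mask` and `dtype` as members, and the toy class used to exercise it. A real protocol would cover the full array API surface.

```python
from typing import Any, Protocol, TypeVar, runtime_checkable

DTypeT_co = TypeVar("DTypeT_co", covariant=True)


@runtime_checkable
class MaskedArrayLike(Protocol[DTypeT_co]):
    """Hypothetical structural type: anything exposing data, mask and dtype."""

    @property
    def data(self) -> Any: ...

    @property
    def mask(self) -> Any: ...

    @property
    def dtype(self) -> DTypeT_co: ...


class ToyMasked:
    """Toy implementation; note that it does not inherit from the protocol."""

    def __init__(self, data, mask):
        self._data, self._mask = list(data), list(mask)

    @property
    def data(self):
        return self._data

    @property
    def mask(self):
        return self._mask

    @property
    def dtype(self):
        return float


def masked_count(m: MaskedArrayLike[Any]) -> int:
    # Accepts any object that structurally matches the protocol.
    return sum(bool(flag) for flag in m.mask)


toy = ToyMasked([1.0, 2.0, 3.0], [False, True, True])
print(masked_count(toy))
# runtime_checkable only verifies that the attributes exist, not their types.
print(isinstance(toy, MaskedArrayLike), isinstance([1.0], MaskedArrayLike))
```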

@jorenham

Anyway, feel free to ping me if there's something specific I need to review or otherwise look at.

@mdhaber
Owner Author

mdhaber commented Jan 29, 2025

Thanks. Can I get a little more advice before I have real code? I'm still a bit lost.

Currently, for function documentation, I pass along the documentation of the underlying array library function with some modifications. I thought I'd want to do something similar with the type hints. At the most basic level, I thought that might look like getting the type hint information with inspect.get_annotations or typing.get_type_hints and using those to set the __annotations__ attribute of the marray function.

But that wouldn't be complete, because those type hints would suggest that functions only accept arrays of the underlying library, whereas I'd also want the function to accept an MArray for those arguments. Is that where describing MArray using typing.Protocol would come in - I could create that description, then inject that into the type hints extracted from the underlying library?

(Does this make any sense?)

@jorenham

> At the most basic level, I thought that might look like getting the type hint information with inspect.get_annotations or typing.get_type_hints and using those to set the __annotations__ attribute of the marray function.

Static type-checkers won't understand that, as they don't "run" your code, similar to ruff.
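
To make the distinction concrete, here is a small sketch of the runtime approach described in the quoted text. The functions `backend_add` and `wrapped_add` are hypothetical stand-ins, not marray code. Runtime introspection sees the copied hints, but a static checker only reads the source text of the `def` line.

```python
import typing


def backend_add(x: float, y: float) -> float:
    # Hypothetical stand-in for an annotated function of the underlying library.
    return x + y


def wrapped_add(x, y):
    # The source text carries no annotations, and that is all a static checker sees.
    return backend_add(x, y)


# Copy the hints at runtime, as proposed in the quoted text.
wrapped_add.__annotations__ = dict(typing.get_type_hints(backend_add))

# Runtime tools (help(), documentation generators, get_type_hints) see the hints ...
print(typing.get_type_hints(wrapped_add))
# ... but a checker such as mypy or pyright never executes the assignment above,
# so it still treats the parameters of `wrapped_add` as unannotated.
```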

> But that wouldn't be complete, because those type hints would suggest that functions only accept arrays of the underlying library, whereas I'd also want the function to accept an MArray for those arguments.

Yup; there's also that

> Is that where describing MArray using typing.Protocol would come in - I could create that description, then inject that into the type hints extracted from the underlying library?

You got that right. You could see a protocol as the static-typing analogue of duck-typing at runtime.

> (Does this make any sense?)

Aye!


To make the ducktyping/protocol comparison a bit more concrete, consider this function

def quack_the_duck(🦆):
    return 🦆.quack()

We don't even have a Duck type here, so 🦆 could be an instance of anything, as long as it can quack (or pretend to quack).

We can use a Protocol to describe exactly this:

from typing import Protocol

class QuacksLikeADuck[QuackT](Protocol):
    def quack(self) -> QuackT: ...

And now we can use it to annotate the function:

def quack_the_duck[QuackT](🦆: QuacksLikeADuck[QuackT]) -> QuackT:
    return 🦆.quack()

This way, static duckcheckers will know what's allowed as a 🦆, and what isn't. So for instance:

class Duck:
    def quack(self) -> str:
        return "Quack erat demonstranduck."

class Capybara:
    def chill(self) -> None: ...


quack_the_duck(Duck())  # inferred as `str`
quack_the_duck(Capybara())  # type checker error
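
An emoji is not a valid Python identifier, and the `class Name[T]` syntax needs Python 3.12 or later, so the snippets above are illustrative rather than runnable. The following sketch is a runnable equivalent under two assumptions: the parameter is renamed to `duck`, and the generics are spelled with `TypeVar` so that older interpreters accept them.

```python
from typing import Protocol, TypeVar

QuackT = TypeVar("QuackT")
QuackT_co = TypeVar("QuackT_co", covariant=True)


class QuacksLikeADuck(Protocol[QuackT_co]):
    def quack(self) -> QuackT_co: ...


def quack_the_duck(duck: QuacksLikeADuck[QuackT]) -> QuackT:
    return duck.quack()


class Duck:
    def quack(self) -> str:
        return "Quack erat demonstranduck."


class Capybara:
    def chill(self) -> None: ...


print(quack_the_duck(Duck()))  # a checker infers `str` here

try:
    # A static checker flags this call before the program ever runs;
    # without one, the mistake only surfaces as a runtime error.
    quack_the_duck(Capybara())
except AttributeError as exc:
    print("runtime failure:", exc)
```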

@mdhaber
Owner Author

mdhaber commented Jan 29, 2025

Thanks for that description of Protocol. I especially like the 🦆 : )

> Static type-checkers won't understand that, as they don't "run" your code, similar to ruff.

Ok. I suppose I should have noticed that.

You wrote

> At first glance, it looks like typing this seems pretty doable, even without https://github.com/data-apis/array-api-typing.

At a very high-level description, how did you have in mind that would be done? E.g. "manually from scratch" or "copy, paste, + search and replace" (from somewhere)?

If I were to write a Protocol for MArray, then where would the typing information for the library functions that accept MArray come from?

And even that seems like it shouldn't be necessary to do from scratch, since ISTM that the Protocol would be essentially the same as for any array API compatible array. Is that wrong - is it necessary for all Array API libraries to write type information from scratch?

Or is that where https://github.com/data-apis/array-api-typing will come in to help somehow? I can't tell what it does - the repo looks like an empty shell to me.

> That would effectively get you a static verification suite for free

It feels like there is something so obvious that you didn't mention, but I am completely missing it!

@jorenham

> At a very high-level description, how did you have in mind that would be done? E.g. "manually from scratch" or "copy, paste, + search and replace" (from somewhere)?

That'd be one of those "manually from scratch" kinda deals.

> If I were to write a Protocol for MArray, then where would the typing information for the library functions that accept MArray come from?

Are there different functions per library, or are you assuming a common set of functions with same signature, like in the array-api? And are we talking about methods or the other kind here?

> And even that seems like it shouldn't be necessary to do from scratch, since ISTM that the Protocol would be essentially the same as for any array API compatible array. Is that wrong - is it necessary for all Array API libraries to write type information from scratch?

That protocol should be the common denominator of the masked arrays of all libraries. That way, if you have some masked array instance, m, it is valid to assign it to m2: SparseArray = m, where SparseArray is the protocol.
But that will only work if type-checkers actually understand what the type of m is. So you're right in thinking that those libraries will have to annotate their stuff, at least the bits that are relevant to the protocol.
But unless there's some hacky runtime magic going on, type-checkers can usually make a pretty good guess as to what a function is supposed to return if unannotated. And if not, they tend to default to Any, which is basically like the joker in a deck of cards, so it will then always be accepted by your protocol (so it might be a 🦆 after all).

> Or is that where data-apis/array-api-typing will come in to help somehow? I can't tell what it does - the repo looks like an empty shell to me.

That's deferred at the moment, at least for me, as I don't have the time to work on that. But after scipy-stubs and numtype (a full rework of numpy's stubs), that'll be my next target. I hope that I'll be able to get it up and running this year (but there are always the unknown unknowns).

> It feels like there is something so obvious that you didn't mention, but I am completely missing it!

If that protocol is part of your public API, then all a library has to do to verify that its masked array matches your spec is to try assigning an instance of it to your protocol. So if _: SparseArray = m passes type-checking, then m is a compatible masked array.

@lucascolley
Collaborator

lucascolley commented Jan 30, 2025

> Are there different functions per library, or are you assuming a common set of functions with same signature, like in the array-api? And are we talking about methods or the other kind here?

The idea is that this API implements exactly and only the array API standard. There are a few minor tweaks which don't break compatibility, like an added mask= parameter to asarray. So it seems like this is basically just waiting on array-api-typing.

The difference is that instead of working with arrays, it works with MArray[Array, Mask]s, where a Mask is just a boolean array of the same shape as the corresponding Array.
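
As a loose sketch of that description, the pairing of an array with a same-shape boolean mask can be written as a class generic over both types. Everything here is an assumption for illustration: the name `MaskedPair`, the use of plain lists in place of real arrays, and the exact form of the `asarray` signature beyond the `mask=` parameter mentioned above.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Sequence, TypeVar

ArrayT = TypeVar("ArrayT")
MaskT = TypeVar("MaskT")


@dataclass(frozen=True)
class MaskedPair(Generic[ArrayT, MaskT]):
    """Hypothetical stand-in for MArray[Array, Mask]."""

    data: ArrayT
    mask: MaskT


def asarray(
    values: Sequence[float], *, mask: Optional[Sequence[bool]] = None
) -> MaskedPair[List[float], List[bool]]:
    # Lists stand in for arrays; "same shape" reduces to "same length" here.
    data = list(values)
    flags = [False] * len(data) if mask is None else [bool(f) for f in mask]
    if len(flags) != len(data):
        raise ValueError("mask must have the same shape as the data")
    return MaskedPair(data, flags)


print(asarray([1.0, 2.0, 3.0], mask=[False, True, False]))
print(asarray([1.0, 2.0]).mask)
```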
