Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typed IDs. #1149

Open
1 task
elenakrittik opened this issue Jan 9, 2024 · 16 comments
Open
1 task

Typed IDs. #1149

elenakrittik opened this issue Jan 9, 2024 · 16 comments
Labels
feature request Request for a new feature

Comments

@elenakrittik
Copy link
Contributor

elenakrittik commented Jan 9, 2024

Summary

Change all ID types from int to *Id newtypes for greater type safety.

What is the feature request for?

The core library

The Problem

Currently, this (~pseudo) code passes type checks, but in reality this will almost certainly raise an exception:

id = inter.channel.id
inter.guild.fetch_member(id)

While the above code is obviously wrong, command bodies can get large enough in practice that the error will not be as easy to spot. Using IDs in incorrect places usually results in an unhandled exception, but in very extreme cases can lead to damage (f.e. as thread ids are the same as the original message ids, one can delete thread instead of the original message or vice versa). The more developer passes IDs around, the more is the chance of making mistakes like this.

The Ideal Solution

Change all raw int IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.

The implementation is very straightforward:

ApplicationId = NewType("ApplicationId", int)
ChannelId = NewType("ChannelId", int)
# and so on for every resource type

The rest is only about updating the library to use Ids instead of raw ints, and maybe updating converters.

The Current Solution

Carefully reviewing, testing or otherwise ensuring that IDs for e.g. channels are not used where e.g. a role Id was expected.

Additional Context

I am willing to work on this. Not anymore :(

This was originally inspired by twilight-model's Id.

Open Questions

  • Is there a way to increase backwards compatibility by allowing literal integers in places where a *Id is expected?
@elenakrittik elenakrittik added the feature request Request for a new feature label Jan 9, 2024
@shiftinv
Copy link
Member

This is definitely an interesting proposal, thanks!

The cost of using IDs in incorrect places can range from an unhandled exception up to damaging users' servers and, in very rare, but still real cases, introducing security holes (if the ID comes from user input).

I can't think of situations where a mistake like that could cause any damage, given that IDs are unique (barring a few exceptions), but it's rather annoying and difficult to debug nonetheless, yeah.

Change all raw int IDs to appropriate newtypes. This incurs zero (or close to zero) overhead, but allows to prevent incorrect usage of IDs.

Sounds reasonable.
I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])). It's less in >=3.11 (python/cpython@96c4cbd), but not fully negligible especially in earlier versions.

Compatibility is still to be estimated.

Compatibility is my main concern here, even if it's just in the type-checking realm, and not at runtime.
As far as I understand, only simple expressions like fetch_channel(channel.id) would continue to work as-is, anything more advanced would likely require replacing annotations with the appropriate *Id types everywhere; while reasonable in many cases, this can be a pretty big ask in larger projects.

The ideal compatible solution would be making methods accept something like ChannelId | int instead of just ChannelId. With the current typing.NewType semantics however, that would still allow passing mismatching ID types.

I don't have any data on how often mix-ups between different ID "types" happen - I imagine it's probably not that frequent to warrant a fundamental change like this, personally.
If this library was still in very early development, adopting typed IDs like this would be really cool. By now, preserving compatibility (when possible) is pretty essential, and there doesn't seem to be an obvious way to accomplish that here (wouldn't mind to be proven wrong about this, though).

@elenakrittik
Copy link
Contributor Author

I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])). It's less in >=3.11 (python/cpython@96c4cbd), but not fully negligible especially in earlier versions.

I think we can make an exception for those and # type: ignore them, given that these IDs are coming from Discord and are likely unlikely to be invalid/wrong.

@Enegg
Copy link
Contributor

Enegg commented Jan 12, 2024

Wouldn't the identity function overhead be dwarfed by the int cast anyway?

@OseSem
Copy link
Contributor

OseSem commented Feb 21, 2024

Change all ID types from int to *Id newtypes for greater type safety.

This is cool and all, but if they were to do this it would bring some really weird code. For example: everytime you want to search someone up you would have to ApplicationID(applicationid) which could really flood the code. And if the ID doesn't exist it would already flag.

@OseSem
Copy link
Contributor

OseSem commented Feb 21, 2024

And you can make an error_handler using disnake.ext.commands.UserNotFound for example.

@elenakrittik
Copy link
Contributor Author

elenakrittik commented Feb 21, 2024

This proposal's intention is to allow optional, type-check-time guarantees about validness of ID usage. Whether to use it or not is up to you, and if you prefer catching errors at runtime instead, you will be able to do that.

(if we find a solution for the compatibility problem..)

@OseSem
Copy link
Contributor

OseSem commented Feb 21, 2024

There could be something like bot.get_member() which could be member_id: int | UserID

@Enegg
Copy link
Contributor

Enegg commented Feb 24, 2024

There could be something like bot.get_member() which could be member_id: int | UserID

The union would defeat the purpose of this issue, since NewType(..., int) is a subtype of int.

@tooruu
Copy link
Contributor

tooruu commented Jul 17, 2024

I'm somewhat worried about the additional overhead in very common expressions/hot paths like self.id = MessageId(int(data["id"])).

int call is redundant here, since MessageId is its subclass. self.id = MessageId(data["id"]) is enough.

@Enegg
Copy link
Contributor

Enegg commented Jul 17, 2024

int call is redundant here, since MessageId is its subclass. self.id = MessageId(data["id"]) is enough.

It is not a subclass. At runtime it behaves like identity function.

@Snipy7374
Copy link
Collaborator

Snipy7374 commented Aug 13, 2024

An idea could be to use what PEP-702 propose, allowing for Union[*Id, int] and marking the overload that accepts int as deprecated e.g:

from typing import overload
from warnings import deprecated

@overload
async def fetch_member(self, id: UserId) -> disnake.Member:
	...

@overload
@deprecated("Only UserId will be supported in future versions")
async def fetch_member(self, id: int) -> disnake.Member:
	...

async def fetch_member(self, id: Union[UserId, int], /) -> disnake.Member:
	...

this will return a deprecation warning message by the type checker if the user invoke the method with an int id parameter. The deprecated decorator will be made available in the typing_extensions too for backward compatibility, still i know that this is a big change and as such needs some more bikeshedding, for this reason if we will ever decide to implement this, it should be done in the 3.0 version. The depreacation warning should remain for some versions to give the users the time to migrate.

@elenakrittik
Copy link
Contributor Author

elenakrittik commented Aug 17, 2024

@Snipy7374 This is an interesting idea, thanks! Sadly it still has the same problem about *Ids being automatically cast to int, so if we were to go this path we would need to phrase the deprecation message in a way that accounts for both regular int literal usage and autocasted newtypes of other ids. The amount of overloads required is also daunting, unless it can be codemodded (which would pollute the codebase even more, unless we figure out how to separate "source source" code from "after-codemod source" code, which i imagine will be especially hard around IDE support)

@Enegg
Copy link
Contributor

Enegg commented Aug 17, 2024

As far as the overloads go, I think we could make a decorator for this.
Regardless, from what I gather, we don't actually want to depreciate passing in a raw int?
And if so, it'd be a stretch to use depreciated here.

@Enegg
Copy link
Contributor

Enegg commented Aug 20, 2024

I've written some example implementation of how the forementioned decorator could look like.

from collections import abc
from typing import Concatenate, Never, NewType, ParamSpec, Protocol, overload
from typing_extensions import TypeVar, deprecated

ObjectID = NewType("ObjectID", int)
IdT = TypeVar("IdT", bound=ObjectID, infer_variance=True)
P = ParamSpec("P")
RetT = TypeVar("RetT", infer_variance=True)


class AcceptsID(Protocol[IdT, P, RetT]):
    @overload
    def __call__(self, id: IdT, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...

    @overload
    def __call__(self, id: ObjectID, /, *args: P.args, **kwargs: P.kwargs) -> Never: ...

    @overload
    @deprecated("Using raw ints is not value-safe", category=None)
    def __call__(self, id: int, /, *args: P.args, **kwargs: P.kwargs) -> RetT: ...


def overload_id(func: abc.Callable[Concatenate[IdT | int, P], RetT], /) -> AcceptsID[IdT, P, RetT]:
    return func  # pyright: ignore[reportReturnType]

I've added the intermediary ObjectID to catch invalid ID types in the overload resolution. However, this makes wrapping raw ints even more annoying.
We could circumvent that by replacing ObjectID with a Union of all ID types, though it might lead to some circular imports/references.

AppID = NewType("AppID", ObjectID)
ChannelID = NewType("ChannelID", ObjectID)


@overload_id
def get_channel(id: ChannelID | int, /) -> object: ...

Passing in an int shows the depreciation message. Passing in AppID resolves the return type to Never, which marks any following code unreachable.
To get an instance of ChannelID, one must do id = ChannelID(ObjectID(123)).

@Enegg
Copy link
Contributor

Enegg commented Nov 14, 2024

I took a bite at implementing this.
It opens up a road for creating interesting overloads:

# Client.py
# ObjectId is a Union of all IDs
@overload
def get_channel(self, id: ChannelId, /) -> Optional[GuildChannel]: ...
@overload
def get_channel(self, id: ThreadId, /) -> Optional[Thread]: ...
@overload
def get_channel(self, id: PrivateChannelId, /) -> Optional[PrivateChannel]: ...
@overload
def get_channel(self, id: ObjectId, /) -> None: ...  # the "wrong ID" case
@overload
def get_channel(self, id: int, /) -> Optional[Union[GuildChannel, Thread, PrivateChannel]]: ...
def get_channel(self, id: Union[ObjectId, int], /) -> Optional[Union[GuildChannel, Thread, PrivateChannel]]:

Though obviously someone has to write them 😐

@Enegg
Copy link
Contributor

Enegg commented Nov 15, 2024

See Enegg@e325f87

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Request for a new feature
Projects
None yet
Development

No branches or pull requests

6 participants