Rework and enhance type hierarchy and generics #115

Merged: 1 commit, Feb 3, 2023

@thetorpedodog thetorpedodog commented Feb 3, 2023

Context: single-cell-data/TileDB-SOMA#638 and single-cell-data/TileDB-SOMA#540

  • Pulls basic collection impl into BaseCollection, allowing Collection to add the semantics of "no semantics". This mirrors the implementation in tiledbsoma.
  • Makes Measurement and Experiment inherit from BaseCollection. Previous problems with this were due to missing __slots__.
  • Adds generic parameters to Measurement and Experiment that allow implementations to specify the exact types they provide, saving a lot of casting down the road.
  • Renames lots of TypeVars to be clearer about their purpose.
  • Adds overloads to add_new_collection for better type inference.
  • Tightens _Experimentish to only expect read accessors, not writers.
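
As a rough illustration of the overload bullet above, here is a minimal sketch of how overloads on an `add_new_collection`-style method can improve inference. The classes and the simplified `(key, kind)` signature are assumptions made for this example, not the actual somacore definitions.

```python
from typing import Dict, Type, TypeVar, overload

# Hypothetical TypeVar: "the exact collection class the caller asked for".
_CT = TypeVar("_CT", bound="BaseCollection")


class BaseCollection:
    """Toy stand-in for a shared collection base (not the real class)."""

    __slots__ = ("_children",)

    def __init__(self) -> None:
        self._children: Dict[str, "BaseCollection"] = {}

    # No `kind` given: the checker knows a plain Collection comes back.
    @overload
    def add_new_collection(self, key: str, kind: None = None) -> "Collection": ...

    # A specific `kind` given: the checker knows exactly that type comes back.
    @overload
    def add_new_collection(self, key: str, kind: Type[_CT]) -> _CT: ...

    def add_new_collection(self, key, kind=None):
        cls = Collection if kind is None else kind
        child = cls()
        self._children[key] = child
        return child


class Collection(BaseCollection):
    """A collection that adds no semantics of its own."""

    __slots__ = ()


class Measurement(BaseCollection):
    """Toy specialized collection."""

    __slots__ = ()


root = Collection()
plain = root.add_new_collection("stuff")              # inferred: Collection
meas = root.add_new_collection("rna", Measurement)    # inferred: Measurement
```

Without the overloads, a checker would typically only be able to report the base type for both calls, and callers would have to narrow the result themselves.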

While these new changes add a bunch of generic slots to the base collection types and experiments and measurements, the experience from the perspective of a SOMA library user will be roughly the same. That is to say, it's a little scary here, but the end user will still see theimpl.Collection[ElementType]. Type inference when using composed objects is better as well:

some_exp = theimpl.Experiment.open(...)

obs = some_exp.obs
reveal_type(obs)
# BEFORE: somacore.DataFrame
#         (i.e., the type system doesn't know what implementation
#         of the abstract DataFrame this is; it only knows about
#         the bare minimum DataFrame properties)
# AFTER:  theimpl.DataFrame

ms = some_exp.ms
reveal_type(ms)
# BEFORE: somacore.Collection[somacore.Measurement]
# AFTER:  theimpl.Collection[theimpl.Measurement]

some_meas = ms["whatever"]
reveal_type(some_meas)
# BEFORE: somacore.Measurement
# AFTER:  theimpl.Measurement

reveal_type(some_meas.X)
# BEFORE: somacore.Collection[somacore.NDArray]
# AFTER:  theimpl.Collection[theimpl.NDArray]

There is no change at runtime; the actual types of the objects remain the same, but autocompletion, type checking, and other tooling have a much better idea of what is going on.
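
To make the mechanism behind the BEFORE/AFTER comparison more concrete, here is a minimal sketch of how generic parameters on a base `Experiment` can let an implementation declare the exact types it provides. Every class below is a toy stand-in invented for this example (the real hierarchy has more parameters and members), so treat it as an illustration of the pattern rather than the actual code.

```python
from typing import Dict, Generic, TypeVar


class DataFrame:
    """Toy abstract base: only the bare-minimum interface."""

    __slots__ = ()


class Measurement:
    """Toy abstract base measurement."""

    __slots__ = ()


# Hypothetical TypeVars, named for what they stand for.
_DF = TypeVar("_DF", bound=DataFrame)
_Meas = TypeVar("_Meas", bound=Measurement)


class Experiment(Generic[_DF, _Meas]):
    """Base experiment, generic over the types of its members."""

    __slots__ = ("_obs", "_ms")

    def __init__(self, obs: _DF, ms: Dict[str, _Meas]) -> None:
        self._obs = obs
        self._ms = ms

    @property
    def obs(self) -> _DF:
        return self._obs

    @property
    def ms(self) -> Dict[str, _Meas]:
        return self._ms


# An "implementation" fills in the generic slots exactly once.
class ImplDataFrame(DataFrame):
    __slots__ = ()

    def impl_only(self) -> str:
        return "impl-specific"


class ImplMeasurement(Measurement):
    __slots__ = ()


class ImplExperiment(Experiment[ImplDataFrame, ImplMeasurement]):
    __slots__ = ()


exp = ImplExperiment(ImplDataFrame(), {"rna": ImplMeasurement()})
# A checker sees exp.obs as ImplDataFrame, so impl_only() is accepted
# without any narrowing; exp.ms["rna"] is seen as ImplMeasurement.
label = exp.obs.impl_only()
```

The generic parameters appear only in the base classes and in the one line where the implementation subclasses them; code that uses `ImplExperiment` never has to spell them out.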


To show what this looks like on the tiledbsoma side, the diff is pretty small, but the key part is in io.py, where the cast(tiledbsoma.Measurement, ms[whatever]) no longer needs to happen, since the type system already knows it’s a tiledbsoma.Measurement. While that is the only change there specifically, there will be corresponding improvements in user code.
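
For readers unfamiliar with the `cast` pattern mentioned above, here is a small self-contained sketch of the before and after. The `Collection`, `Measurement`, and `ImplMeasurement` classes are hypothetical stand-ins, not the tiledbsoma ones; the point is only that `typing.cast` does nothing at runtime, so dropping it changes what the checker knows, not what the program does.

```python
from typing import Dict, Generic, TypeVar, cast


class Measurement:
    """Toy abstract base measurement."""


class ImplMeasurement(Measurement):
    """Toy concrete measurement with an implementation-only method."""

    def extra(self) -> int:
        return 42


_Elem = TypeVar("_Elem")


class Collection(Generic[_Elem]):
    """Toy string-keyed collection, generic over its element type."""

    def __init__(self, items: Dict[str, _Elem]) -> None:
        self._items = dict(items)

    def __getitem__(self, key: str) -> _Elem:
        return self._items[key]


# BEFORE: the checker only knows the elements are some Measurement,
# so reaching extra() requires a cast (which is a no-op at runtime).
loose: Collection[Measurement] = Collection({"rna": ImplMeasurement()})
before = cast(ImplMeasurement, loose["rna"])

# AFTER: the element type is carried in the collection's type,
# so indexing already yields ImplMeasurement as far as the checker knows.
precise: Collection[ImplMeasurement] = Collection({"rna": ImplMeasurement()})
after = precise["rna"]
```

Both `before` and `after` are the same kind of object at runtime; only the second form lets the type checker verify the access on its own.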

And just to reiterate: runtime behavior is identical, and any code which works now will continue to work, but static type inference is significantly improved.

@johnkerl changed the title from "Rework and enhance type hierarchy and generics." to "Rework and enhance type hierarchy and generics" on Feb 3, 2023
@thetorpedodog merged commit 4aab19f into main on Feb 3, 2023
@thetorpedodog deleted the more-better-types branch on February 3, 2023 20:30