@bmorris3 (Contributor) commented Jul 9, 2025

Description

This PR supports more precise control over data translators in the registry. In an STScI use case, we added a translator for a specific subclass (lightkurve.LightCurve) of a base class that is already in the registry (astropy.table.Table, via glue-astronomy). We noticed that data_translator.get_handler_for returns the glue translator for the base class (Table) as the matching handler, even though the registry contains an exact class match for the subclass (LightCurve).
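The reported behaviour can be reproduced without glue. This is a minimal sketch, assuming stand-in classes named Table and LightCurve (the real ones live in astropy and lightkurve) and a plain list in place of the translator registry. Only the matching condition is taken from the original get_handler_for; the Translator tuple and the string handlers are hypothetical.

```python
from collections import namedtuple


# Stand-ins for astropy.table.Table and lightkurve.LightCurve (assumption:
# only the subclass relationship matters for this illustration).
class Table:
    pass


class LightCurve(Table):
    pass


# Hypothetical, simplified registry entry.
Translator = namedtuple("Translator", ["target_cls", "handler"])


def get_handler_for(registry, data_or_class):
    """First-match lookup using the condition from before this PR."""
    for translator in registry:
        if isinstance(data_or_class, translator.target_cls) or data_or_class is translator.target_cls:
            return translator.handler
    return None


registry = [
    Translator(Table, "table_handler"),
    Translator(LightCurve, "lightcurve_handler"),
]

# The Table entry is reached first and matches via isinstance, so the
# exact LightCurve entry is never consulted.
print(get_handler_for(registry, LightCurve()))  # table_handler

# Passing the class itself can only match by identity, so it is exact.
print(get_handler_for(registry, LightCurve))  # lightcurve_handler
```

Note that the instance and its class resolve to different handlers here, which is the inconsistency discussed further down in the review.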

This PR gives preference to exact class matches in the translator registry. get_handler_for now first checks for an exact match between the data class and the expected class in the registry (data_or_class is expected_cls), and falls back to the original isinstance check only if no exact match is found.

@bmorris3 force-pushed the data-translator-mro branch from cc8bc97 to 1e367c9 on July 9, 2025 15:44
@bmorris3 force-pushed the data-translator-mro branch from 1e367c9 to 75f2676 on July 9, 2025 15:56
@bmorris3 changed the title from "fix: make the data translator registry prefer exact cls matches over isinstance matches" to "bugfix: make the data translator registry prefer exact cls matches over isinstance matches" on Jul 9, 2025
@bmorris3 marked this pull request as ready for review on July 9, 2025 15:58
  def get_handler_for(self, data_or_class):
      for translator in self:
-         if isinstance(data_or_class, translator.target_cls) or data_or_class is translator.target_cls:
+         if data_or_class is translator.target_cls or type(data_or_class) is translator.target_cls:
Member

In principle should we do the loop twice, once over exact matches then once with isinstance?

Collaborator

Yes, I think so. As long as the isinstance match also triggers a break in the same loop, a base-class translator would still be matched first if it happens to come ahead of the exact-match translator in the iteration order.
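The two-loop idea raised in this thread could look roughly like the following. It is a sketch under assumptions, not the code in this PR: Table and LightCurve are stand-in classes, the registry is a plain list, and the Translator tuple and string handlers are hypothetical.

```python
from collections import namedtuple


# Stand-ins for the astropy / lightkurve classes (assumption).
class Table:
    pass


class LightCurve(Table):
    pass


# Hypothetical, simplified registry entry.
Translator = namedtuple("Translator", ["target_cls", "handler"])


def get_handler_for(registry, data_or_class):
    # First pass: exact class matches only, for a class or an instance.
    for translator in registry:
        if data_or_class is translator.target_cls or type(data_or_class) is translator.target_cls:
            return translator.handler
    # Second pass: fall back to the first isinstance match.
    for translator in registry:
        if isinstance(data_or_class, translator.target_cls):
            return translator.handler
    return None


# Table is deliberately registered ahead of LightCurve.
registry = [
    Translator(Table, "table_handler"),
    Translator(LightCurve, "lightcurve_handler"),
]

print(get_handler_for(registry, LightCurve()))  # lightcurve_handler
print(get_handler_for(registry, Table()))  # table_handler
```

Because the exact pass completes before any isinstance check runs, iteration order no longer decides between Table and LightCurve. An instance of a further subclass of LightCurve would still fall through to whichever isinstance match comes first.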

Collaborator

@dhomeier left a comment

I've been trying to make sense of how subclass matches have been handled so far, especially why e.g. LightCurve, which is also a QTable subclass, would not at least get that translator rather than the Table one. It appears the lookup can also behave differently for a subclass and for instances of that subclass. So this PR seems to go a long way towards more reasonable behaviour but, as noted in the comments, it should also try to treat classes and their instances the same as much as possible.

A possibly still open question: should the best match always take precedence, even over very different priorities? It seems priority has hardly been used at all, so perhaps this is not that relevant.
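One way to make "best match" concrete is to walk the method resolution order of the data class and take the nearest registered ancestor. The sketch below is an illustration only: Table, QTable and LightCurve are stand-ins with a simplified inheritance chain, the Translator tuple and nearest_handler function are hypothetical, and priority is ignored entirely, which is exactly the open question above.

```python
from collections import namedtuple


# Stand-in classes with a simplified inheritance chain (assumption).
class Table:
    pass


class QTable(Table):
    pass


class LightCurve(QTable):
    pass


# Hypothetical, simplified registry entry.
Translator = namedtuple("Translator", ["target_cls", "handler"])


def nearest_handler(registry, data_or_class):
    """Return the handler registered for the closest class in the MRO."""
    data_class = data_or_class if isinstance(data_or_class, type) else type(data_or_class)
    by_class = {t.target_cls: t.handler for t in registry}
    # __mro__ starts with the class itself, followed by its ancestors in
    # method resolution order.
    for cls in data_class.__mro__:
        if cls in by_class:
            return by_class[cls]
    return None


registry = [
    Translator(Table, "table_handler"),
    Translator(QTable, "qtable_handler"),
]

# No LightCurve entry is registered, so the nearest ancestor (QTable) wins
# over the more distant Table, for the instance and the class alike.
print(nearest_handler(registry, LightCurve()))  # qtable_handler
print(nearest_handler(registry, LightCurve))  # qtable_handler
```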

  def get_handler_for(self, data_or_class):
      for translator in self:
-         if isinstance(data_or_class, translator.target_cls) or data_or_class is translator.target_cls:
+         if data_or_class is translator.target_cls or type(data_or_class) is translator.target_cls:
              handler = translator.handler
              preferred = translator.target_cls
              break
          if isinstance(data_or_class, translator.target_cls):
Collaborator

Suggested change
-         if isinstance(data_or_class, translator.target_cls):
+         if isinstance(data_or_class, translator.target_cls) or (isinstance(data_or_class, type) and issubclass(data_or_class, translator.target_cls)):

A subclass (rather than an instance of one) of a supported target_cls is currently not caught at all, which I think can lead to quite confusing behaviour as well.
Perhaps the whole code should be cleaned up a bit by pulling the type check below to the top of the loop:

        if isinstance(data_or_class, type):
            data_class = data_or_class
        else:
            data_class = type(data_or_class)

and work with data_class from there on.
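A rough sketch of that cleanup, under the assumption that normalising to data_class once and then using issubclass is what is meant. The classes are stand-ins, CustomLightCurve is a made-up subclass for illustration, and the Translator tuple and plain-list registry are hypothetical simplifications.

```python
from collections import namedtuple


# Stand-in classes; CustomLightCurve is a hypothetical user subclass.
class Table:
    pass


class LightCurve(Table):
    pass


class CustomLightCurve(LightCurve):
    pass


# Hypothetical, simplified registry entry.
Translator = namedtuple("Translator", ["target_cls", "handler"])


def get_handler_for(registry, data_or_class):
    # Normalise once so that classes and instances take the same path.
    if isinstance(data_or_class, type):
        data_class = data_or_class
    else:
        data_class = type(data_or_class)
    for translator in registry:
        if issubclass(data_class, translator.target_cls):
            return translator.handler
    return None


registry = [Translator(LightCurve, "lightcurve_handler")]

# The subclass and an instance of it now resolve to the same handler.
print(get_handler_for(registry, CustomLightCurve))  # lightcurve_handler
print(get_handler_for(registry, CustomLightCurve()))  # lightcurve_handler
```

This only addresses the class-versus-instance asymmetry. It still returns the first match in iteration order, so it would need to be combined with some form of exact-match preference.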

@astrofrog (Member)

Coming back to this, I don't fully understand why the suggested change would work any better. Instead of checking only isinstance, we now check for an exact match and then for isinstance, but we do both of these checks for the current type in the loop. So let's say the iteration order is Table and then LightCurve, and the user passes a LightCurve object; the code would then do:

  • Check Table exact match -> no
  • Check Table subclass -> yes
  • Break

And it would never get to checking LightCurve.
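The trace above can be checked with a small stand-alone sketch. It assumes that both checks sit in the same loop iteration and that each one ends the search, as the quoted diff lines suggest. The classes are stand-ins, and the Translator tuple and the trace list are hypothetical additions for illustration.

```python
from collections import namedtuple


# Stand-ins for the astropy / lightkurve classes (assumption).
class Table:
    pass


class LightCurve(Table):
    pass


# Hypothetical, simplified registry entry.
Translator = namedtuple("Translator", ["target_cls", "handler"])


def get_handler_for(registry, data_or_class, trace):
    """Single loop: exact check, then isinstance check, per translator."""
    for translator in registry:
        name = translator.target_cls.__name__
        if data_or_class is translator.target_cls or type(data_or_class) is translator.target_cls:
            trace.append(f"{name}: exact match")
            return translator.handler
        trace.append(f"{name}: no exact match")
        if isinstance(data_or_class, translator.target_cls):
            trace.append(f"{name}: isinstance match")
            return translator.handler
    return None


table_first = [
    Translator(Table, "table_handler"),
    Translator(LightCurve, "lightcurve_handler"),
]

trace = []
print(get_handler_for(table_first, LightCurve(), trace))  # table_handler
print(trace)  # ['Table: no exact match', 'Table: isinstance match']
```

With the registry reversed, the same call returns the LightCurve handler on the first iteration, so the outcome of the single-loop version still depends on iteration order.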

So I guess there are two questions:

  • Why not just use priorities to give LightCurve a higher priority than Table?
  • Is what we really want here to prioritize exact matches when the priorities are equal? (In that case the code will need to be adjusted to do this.)
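The second option could be sketched as follows: collect every compatible translator, let priority decide first, and use an exact class match only to break ties. This is an illustration under assumptions, not a proposal for the final code. The classes are stand-ins, the Translator tuple is hypothetical, and priority is assumed to be numeric with larger values preferred, which should be checked against the registry's actual convention.

```python
from collections import namedtuple


# Stand-ins for the astropy / lightkurve classes (assumption).
class Table:
    pass


class LightCurve(Table):
    pass


# Hypothetical registry entry; larger priority is assumed to win.
Translator = namedtuple("Translator", ["target_cls", "handler", "priority"])


def get_handler_for(registry, data_or_class):
    data_class = data_or_class if isinstance(data_or_class, type) else type(data_or_class)
    candidates = [t for t in registry if issubclass(data_class, t.target_cls)]
    if not candidates:
        return None
    # Sort key: priority first, then exactness (True sorts above False).
    best = max(candidates, key=lambda t: (t.priority, t.target_cls is data_class))
    return best.handler


equal_priority = [
    Translator(Table, "table_handler", 0),
    Translator(LightCurve, "lightcurve_handler", 0),
]
table_preferred = [
    Translator(Table, "table_handler", 10),
    Translator(LightCurve, "lightcurve_handler", 0),
]

print(get_handler_for(equal_priority, LightCurve()))  # lightcurve_handler
print(get_handler_for(table_preferred, LightCurve()))  # table_handler
```

Swapping the two elements of the sort key would give the other policy, where an exact match wins regardless of priority.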
