Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Jul 26, 2022

This refactors unary and set expressions to use base classes. In addition, this updates binding to produce AlwaysTrue() or AlwaysFalse() values based on the bound type. For example, IsNull(a) when bound to a required field a will produce AlwaysFalse() because the value is required.

This also adds missing tests.

@rdblue rdblue requested a review from samredai July 26, 2022 22:38
)


# TODO: this should still exist as a parent, but term should be abstract
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UnaryPredicate and BoundUnaryPredicate classes introduced in this PR should implement the UnboundPredicate and BoundPredicate interfaces, but I didn't want to refactor the entire set of expressions in this PR so I added this TODO as a reminder.


@dataclass(frozen=True)
class AlwaysTrue(BooleanExpression, ABC, Singleton):
class AlwaysTrue(BooleanExpression, Singleton):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't actually abstract.

@rdblue rdblue changed the title Python: Refactor unary expressions Python: Refactor unary and set expressions Jul 27, 2022
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rdblue looking good! It is a bit more verbose than the previous approach, but the mypy errors shows that it is able to infer types 👍🏻


class BoundIsNaN(BoundUnaryPredicate[T]):
def __new__(cls, term: BoundTerm[T]):
bound_type = term.ref().field.field_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this is done, by checking this in the __new__ method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure about this. There are mypy errors if you declare the return type from __new__ to be correct. This is from In.__new__ for example:

451: error: Incompatible return type for "__new__" (returns "BooleanExpression", but must return a subtype of "BoundIn[Any]")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the __new__ expects to return to the class it belongs to. I think we can ignore this one because I like the trick of returning a different type very much.

def bind(self, schema: Schema, case_sensitive: bool) -> BoundNotIn[T]:
bound_ref = self.term.bind(schema, case_sensitive)
return BoundNotIn(bound_ref, *tuple(lit.to(bound_ref.field.field_type) for lit in self.literals)) # type: ignore
def __new__(cls, term: UnboundTerm[T], literals: tuple[Literal[T], ...]) -> BooleanExpression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __new__(cls, term: UnboundTerm[T], literals: tuple[Literal[T], ...]) -> BooleanExpression:
def __new__(cls, term: UnboundTerm[T], literals: *Literal[T]) -> BooleanExpression:

See the comment at class In(SetPredicate[T]):

def __invert__(self) -> BoundNotIn[T]:
return BoundNotIn(self.term, *self.literals)
class BoundIn(BoundSetPredicate[T]):
def __new__(cls, term: BoundTerm[T], literals: set[Literal[T]]) -> BooleanExpression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also consider wrapping it into positional arguments:

class BoundIn(BoundSetPredicate[T]):
    def __new__(cls, term: BoundTerm[T], *literals: Literal[T]) -> BooleanExpression:
        literals_set = set(literals)
        count = len(literals)
        if count == 0:
            return AlwaysFalse()
        if count == 1:
            return BoundEq(term, literals.pop())
        else:
            return super().__new__(cls)

    def __invert__(self) -> BooleanExpression:
        return BoundNotIn(self.term, self.literals)

This way we also don't have to care about mutating the set. See comment below 👍🏻

This way we don't have to pass in a set:

BoundIn[str](
    base.BoundReference(
        field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
        accessor=Accessor(position=0, inner=None),
    ),
    StringLiteral("foo"), StringLiteral("bar"), StringLiteral("baz"),
)

Instead of:

BoundIn[str](
    base.BoundReference(
        field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
        accessor=Accessor(position=0, inner=None),
    ),
    {StringLiteral("foo"), StringLiteral("bar"), StringLiteral("baz")},
)

Copy link
Contributor

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments but otherwise this LGTM!


@dataclass(frozen=True)
class BoundNotNull(BoundUnaryPredicate[T]):
def __new__(cls, term: BoundTerm[T]):
Copy link
Contributor

@samredai samredai Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint for the 3.8 CI seems to not like that the argument signature is different between this and the default super().__new__. You can change this to match the *args, **kwargs from super, but then you'd have to pull out the kwargs and it won't look as clean plus won't allow positional arguments.

@dataclass(frozen=True)
class BoundNotNull(BoundUnaryPredicate[T]):
    def __new__(cls, *args, **kwargs):
        term: BoundTerm[T] = kwargs["term"]
        if term.ref().field.required:
            return AlwaysTrue()
        return super().__new__(cls)

This is just a linting error though and the assumptions made when using the super type hold well so I think it'd be better to just ignore it for these lines.

@dataclass(frozen=True)
class BoundNotNull(BoundUnaryPredicate[T]):
    def __new__(cls, term: BoundTerm[T]):  # pylint: disable=W0221

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I suppressed the false positives.

return IsNull(self.term)
@dataclass(frozen=True)
class UnaryPredicate(UnboundPredicate[T]):
def bind(self, schema: Schema, case_sensitive: bool = True) -> BooleanExpression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return type be BoundUnaryPredicate? Same question for SetPredicate.bind(...) and LiteralPredicate.bind(...) which can have return types of BoundSetPredicate and BoundLiteralPredicate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Binding may produce a different type of expression. For example, if IsNull is bound to a required type, binding will return AlwaysFalse



def test_bind_dedup(request):
schema = request.getfixturevalue("table_schema_simple")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not required but instead of using request here you can just use the fixture name and pytest will automatically grab it for you.

def test_bind_dedup(table_schema_simple):
    bound = base.BoundIn(
        base.BoundReference(table_schema_simple.find_field(1), table_schema_simple.accessor_for_field(1)), {literal("hello"), literal("world")}
    )
    assert base.In(base.Reference("foo"), (literal("hello"), literal("world"), literal("world"))).bind(table_schema_simple) == bound

@danielcweeks danielcweeks merged commit 0b84579 into apache:master Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants