Introduce a base class for aws triggers #32274

vandonr-amz · 2023-06-29T23:51:40Z

A lot of triggers have been added recently, and they all reuse more or less the same code, with some variations.
This leads to many (small) inconsistencies in trigger code, and to a lot of duplicated code.

We can package all this in a single base class and reuse it as much as possible, which is what I'm doing here.

Adapting all triggers to this base class is also a good way to resolve some of the inconsistencies.
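To give an idea of the shape, here is a minimal sketch of the pattern, not the actual content of base_trigger.py. The class names, the constructor parameters and the message strings are all hypothetical, and the real trigger would also run the waiter asynchronously, which is left out here. Only the serialization part is shown.

```python
from __future__ import annotations

from typing import Any


class BaseWaiterTriggerSketch:
    """Hypothetical stand-in for a shared base class (not the real base_trigger.py)."""

    def __init__(
        self,
        *,
        serialized_fields: dict[str, Any],
        waiter_name: str,
        failure_message: str,
        status_message: str,
        waiter_delay: int,
        waiter_max_attempts: int,
    ) -> None:
        self.serialized_fields = serialized_fields
        self.waiter_name = waiter_name
        self.failure_message = failure_message
        self.status_message = status_message
        self.waiter_delay = waiter_delay
        self.waiter_max_attempts = waiter_max_attempts

    def serialize(self) -> tuple[str, dict[str, Any]]:
        # The common parameters are always serialized here,
        # so subclasses only declare the fields specific to them.
        params = {
            "waiter_delay": self.waiter_delay,
            "waiter_max_attempts": self.waiter_max_attempts,
            **self.serialized_fields,
        }
        classpath = f"{type(self).__module__}.{type(self).__qualname__}"
        return classpath, params


class QueryTriggerSketch(BaseWaiterTriggerSketch):
    """Hypothetical subclass: no logic of its own, only configuration."""

    def __init__(self, query_execution_id: str, waiter_delay: int, waiter_max_attempts: int) -> None:
        super().__init__(
            serialized_fields={"query_execution_id": query_execution_id},
            waiter_name="query_complete",
            failure_message="Error while waiting for the query to complete",
            status_message="Query is not complete yet",
            waiter_delay=waiter_delay,
            waiter_max_attempts=waiter_max_attempts,
        )


classpath, params = QueryTriggerSketch("q-1", 5, 10).serialize()
print(classpath, params)
```

With this layout, a concrete trigger is reduced to a constructor that forwards its configuration to the base.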

REVIEWERS:

I'd advise starting with the newly added file, base_trigger.py.
Then, look at how the other triggers were changed to use that new base trigger.
Then, look at the small changes in the operators/sensors.
Then, look at the tests.

cc @syedahsn @ferruzzi @vincbeck

vandonr-amz · 2023-07-05T17:48:04Z

airflow/providers/amazon/aws/hooks/athena.py

            wait(
                waiter=self.get_waiter("query_complete"),
-                waiter_delay=sleep_time or self.sleep_time,
+                waiter_delay=self.sleep_time if sleep_time is None else sleep_time,


This is a somewhat unrelated fix that allows specifying a sleep time of 0 in unit tests. Without it, the Athena unit tests were taking 30s each.

This is fine, but you can also just mock the sleep function, no?

I think just setting sleep_time=0 is sooo much simpler & cleaner & easier to read
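For anyone wondering why the original expression ignored a sleep time of 0, here is a small standalone illustration. `DEFAULT_SLEEP_TIME` and the two function names are made up for the example; they stand in for `self.sleep_time` and the two versions of the expression in the diff.

```python
from __future__ import annotations

DEFAULT_SLEEP_TIME = 30  # stand-in for self.sleep_time


def delay_with_or(sleep_time: int | None) -> int:
    # `or` treats 0 as falsy, so an explicit 0 silently becomes the default.
    return sleep_time or DEFAULT_SLEEP_TIME


def delay_with_none_check(sleep_time: int | None) -> int:
    # Only a missing value (None) falls back to the default.
    return DEFAULT_SLEEP_TIME if sleep_time is None else sleep_time


print(delay_with_or(0))             # 30
print(delay_with_none_check(0))     # 0
print(delay_with_none_check(None))  # 30
```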

vandonr-amz · 2023-07-05T17:53:31Z

airflow/providers/amazon/aws/triggers/athena.py

-        poll_interval: int,
-        max_attempt: int,
+        waiter_delay: int,
+        waiter_max_attempts: int,


This trigger was added in #32186, merged on June 27th. The last provider release was on June 20th, so the trigger has never been released and this breaking change is OK.

vandonr-amz · 2023-07-05T17:54:37Z

airflow/providers/amazon/aws/triggers/batch.py

-        compute_env_arn: str | None = None,
-        poll_interval: int = 30,
-        max_retries: int = 10,
+        compute_env_arn: str,
+        waiter_delay: int = 30,
+        waiter_max_attempts: int = 10,


This trigger was added in #32036, merged on June 27th. The last provider release was on June 20th, so the trigger has never been released and this breaking change is OK.

vandonr-amz · 2023-07-05T17:55:26Z

airflow/providers/amazon/aws/triggers/ecs.py

-class ClusterWaiterTrigger(BaseTrigger):
+class ClusterActiveTrigger(AwsBaseWaiterTrigger):


This trigger was added in #31881, merged on June 23rd. The last provider release was on June 20th, so the trigger has never been released and this breaking change is OK.

vandonr-amz · 2023-07-05T17:56:48Z

airflow/providers/amazon/aws/triggers/eks.py

-class EksNodegroupTrigger(BaseTrigger):
+class EksCreateNodegroupTrigger(AwsBaseWaiterTrigger):


This trigger was added in #32165, merged on June 26th. The last provider release was on June 20th, so the trigger has never been released and this breaking change is OK.

vandonr-amz · 2023-07-05T18:06:00Z

tests/providers/amazon/aws/triggers/test_athena.py

-        waiter_mock.side_effect = WaiterError("name", "reason", {})
-
-        trigger = AthenaTrigger("query_id", 0, 5, None)
+    def test_serialize_recreate(self):


This is probably best viewed in a side-by-side diff. I removed the existing tests because there is no logic left in the individual triggers.
Instead, I'm testing the only thing that can break, which is the serialization/deserialization.
To do that, I run a serialize-deserialize-reserialize cycle and compare the serialized data. It would probably be better to compare the instances, but at least the serialized output can be compared with a simple ==

I copy-pasted the same test for all triggers inheriting from the base, because I think it's better to have it in their respective files. It could also be a parametrized test with many cases in test_base_trigger to avoid the code duplication; I'm open to opinions on this.
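Roughly, the serialize-deserialize-reserialize cycle looks like the sketch below. This is an illustration of the idea rather than the test code in this PR: `DummyQueryTrigger` and `serialize_recreate` are hypothetical names, and the class is rebuilt from the instance's own type instead of being imported from the serialized classpath.

```python
from __future__ import annotations

from typing import Any


class DummyQueryTrigger:
    """Hypothetical trigger, used only to illustrate the round trip."""

    def __init__(self, query_execution_id: str, waiter_delay: int, waiter_max_attempts: int) -> None:
        self.query_execution_id = query_execution_id
        self.waiter_delay = waiter_delay
        self.waiter_max_attempts = waiter_max_attempts

    def serialize(self) -> tuple[str, dict[str, Any]]:
        classpath = f"{type(self).__module__}.{type(self).__qualname__}"
        kwargs = {
            "query_execution_id": self.query_execution_id,
            "waiter_delay": self.waiter_delay,
            "waiter_max_attempts": self.waiter_max_attempts,
        }
        return classpath, kwargs


def serialize_recreate(trigger: Any) -> tuple[Any, Any]:
    """Serialize, rebuild from the serialized kwargs, then serialize again."""
    first = trigger.serialize()
    # Simplification: reuse the instance's class instead of importing it from the classpath.
    recreated = type(trigger)(**first[1])
    return first, recreated.serialize()


first, second = serialize_recreate(DummyQueryTrigger("query_id", 0, 5))
# A serialized key that does not match a constructor argument fails at rebuild time;
# a value that is not carried over faithfully shows up as a mismatch here.
assert first == second
```

A parametrized variant would run `serialize_recreate` over a list of trigger instances instead of repeating the test in each file.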

vincbeck

A simple nit but overall I love it!

airflow/providers/amazon/aws/triggers/base_trigger.py

airflow/providers/amazon/provider.yaml

airflow/providers/amazon/aws/triggers/base_trigger.py

syedahsn · 2023-07-05T19:28:17Z

airflow/providers/amazon/aws/triggers/eks.py



-class EksNodegroupTrigger(BaseTrigger):
+class EksCreateNodegroupTrigger(AwsBaseWaiterTrigger):


Why are you making a separate trigger for create/delete here? Is it no longer possible to use a generic Trigger if the responsibility is to just poll for a particular state (depending on waiter_name)?

I remember you said back in the day that you preferred separate triggers so that the status/failure messages can be more descriptive ;)

Also, given the lower footprint of triggers created this way, I think it's ok to have specific triggers for each thing.

Ah, I ended up adopting your method haha. Alright, so just to be clear, I am going to be creating a separate Trigger for each operator again.

Also, given the lower footprint of triggers created this way, I think it's ok to have specific triggers for each thing.

Definitely. Big +1 for reducing the repeated code.

airflow/providers/amazon/aws/hooks/athena.py

airflow/providers/amazon/aws/operators/ecs.py

airflow/providers/amazon/aws/sensors/batch.py

airflow/providers/amazon/aws/triggers/base_trigger.py

airflow/providers/amazon/aws/triggers/ecs.py

ferruzzi · 2023-07-05T20:02:06Z

airflow/providers/amazon/aws/triggers/ecs.py

+        return EcsHook(aws_conn_id=self.aws_conn_id, region_name=self.region_name)
+
+
+class ClusterInactiveTrigger(AwsBaseWaiterTrigger):


I may be over-complicating this, but it looooks like the only difference between ClusterActiveTrigger and ClusterInactiveTrigger is the waiter_name value and the two messages. In which case, why not have them both inherit a ClusterStatusTrigger which accepts those three values and drop all the repetition?

Same below where you split other Triggers out into status-specific ones.

The status/failure messages change as well.
But this is the same comment syed made here: #32274 (comment)
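To make the trade-off concrete, here is a sketch of what status-specific triggers boil down to once the shared logic lives in a base class. Everything in it is hypothetical (class names, waiter names and message strings are assumptions for illustration, not the code in ecs.py); the point is only that each subclass carries three values and no logic.

```python
from __future__ import annotations


class WaiterTriggerSketch:
    """Hypothetical minimal base: stores the values that differ between triggers."""

    def __init__(self, waiter_name: str, failure_message: str, status_message: str) -> None:
        self.waiter_name = waiter_name
        self.failure_message = failure_message
        self.status_message = status_message


class ClusterActiveSketch(WaiterTriggerSketch):
    def __init__(self) -> None:
        super().__init__(
            waiter_name="cluster_active",  # assumed waiter name
            failure_message="Failure while waiting for the cluster to be available",
            status_message="Cluster is not ready yet",
        )


class ClusterInactiveSketch(WaiterTriggerSketch):
    def __init__(self) -> None:
        super().__init__(
            waiter_name="cluster_inactive",  # assumed waiter name
            failure_message="Failure while waiting for the cluster to be deactivated",
            status_message="Cluster is still active",
        )


for trigger in (ClusterActiveSketch(), ClusterInactiveSketch()):
    print(type(trigger).__name__, trigger.waiter_name, trigger.status_message)
```

A single ClusterStatusTrigger taking those three values as arguments would work too; the separate classes simply keep the descriptive messages next to the waiter they belong to.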

airflow/providers/amazon/aws/triggers/eks.py

airflow/providers/amazon/aws/triggers/emr.py

tests/providers/amazon/aws/triggers/test_base_trigger.py

tests/providers/amazon/aws/triggers/test_batch.py

tests/providers/amazon/aws/triggers/test_eks.py

Co-authored-by: D. Ferruzzi <[email protected]>

syedahsn · 2023-07-05T20:53:42Z

airflow/providers/amazon/aws/triggers/emr.py

-        **kwargs: Any,
+        poll_interval: int | None = None,  # deprecated
+        waiter_delay: int = 30,
+        waiter_max_attempts: int = 600,


is this supposed to be 60?

it's supposed to be ∞ lol
We don't really have a clear stance on this: some triggers have sensible max attempts values, others wait forever...
What I don't like about setting a "low" value like 60 is that if the user sets the poll interval to 1 second, for instance, the wait ends after 1 minute, which might be annoying.
Overall, I think it should be a really high value, and users should set it themselves if they care.

I think a good compromise is to keep it the same as the default one used by boto. That way, we don't inadvertently mess things up for users who might be relying on a Task to fail after a certain amount of time, because they didn't change the default value

ok, but in this particular case, the existing behavior was to wait forever, so....
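For reference, the arithmetic behind both positions, as a rough estimate only: it multiplies the delay by the number of attempts and ignores the time taken by each API call. `max_wait_seconds` is a made-up helper for the example.

```python
def max_wait_seconds(waiter_delay: int, waiter_max_attempts: int) -> int:
    """Rough total waiting time, ignoring the duration of the API calls themselves."""
    return waiter_delay * waiter_max_attempts


# A "low" max attempts value combined with a 1 second delay gives up quickly.
print(max_wait_seconds(waiter_delay=1, waiter_max_attempts=60))    # 60 seconds, i.e. 1 minute

# The defaults from the diff above: 30 second delay, 600 attempts.
print(max_wait_seconds(waiter_delay=30, waiter_max_attempts=600))  # 18000 seconds, i.e. 5 hours
```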

tests/providers/amazon/aws/triggers/test_base_trigger.py

syedahsn

Some minor comments, but overall, looks really good.

ferruzzi

Looks like all of my comments were addressed.

vandonr-amz · 2023-07-06T21:00:55Z

This can be merged as is only if #32389 does not proceed.
If it does, I'll have to change the parts that are "breaking" and surround them with deprecation warnings.
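As a sketch of what such a deprecation shim could look like for a renamed parameter: the helper name is hypothetical, and the standard library `DeprecationWarning` is used here as an assumption, the provider code may well use its own warning class.

```python
from __future__ import annotations

import warnings


def resolve_waiter_delay(waiter_delay: int = 30, poll_interval: int | None = None) -> int:
    """Hypothetical helper: keep honoring the old name, but warn about it."""
    if poll_interval is not None:
        warnings.warn(
            "poll_interval is deprecated, use waiter_delay instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return poll_interval
    return waiter_delay


print(resolve_waiter_delay(waiter_delay=10))  # 10
print(resolve_waiter_delay(poll_interval=5))  # 5, and a DeprecationWarning is emitted
```

The `is not None` check matters for the same reason as in the Athena hook fix: a deprecated value of 0 should still be honored.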

vincbeck · 2023-07-07T18:56:13Z

The Amazon provider package is excluded from RC2, so we can merge this. See #32389

vandonr-amz requested review from eladkal and o-nikolas as code owners June 29, 2023 23:51

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 29, 2023

vandonr-amz marked this pull request as draft June 29, 2023 23:53

vandonr-amz added 2 commits July 4, 2023 13:07

introduce a base class for aws triggers

2c91023

adapt tests

8814192

vandonr-amz force-pushed the vandonr/deferrable branch from 23e1322 to 8814192 Compare July 4, 2023 23:29

vandonr-amz added 4 commits July 4, 2023 16:53

write serializer tests for all migrated triggers

c567e54

mark dumb implem as not a test class

dacb392

add base to provider yaml

e48174c

fix optional types stuff

5c3d58b

vandonr-amz changed the title ~~WIP: introduce a base class for aws triggers~~ Introduce a base class for aws triggers Jul 5, 2023

Merge remote-tracking branch 'origin/main' into vandonr/deferrable

3cf3f23

vandonr-amz marked this pull request as ready for review July 5, 2023 17:47

vandonr-amz commented Jul 5, 2023

vincbeck reviewed Jul 5, 2023

airflow/providers/amazon/aws/triggers/base_trigger.py

remove unnecessary None type & update doc a bit

2337cc8

vincbeck approved these changes Jul 5, 2023

eladkal reviewed Jul 5, 2023

airflow/providers/amazon/provider.yaml

syedahsn reviewed Jul 5, 2023

airflow/providers/amazon/aws/triggers/base_trigger.py

syedahsn reviewed Jul 5, 2023
