Conversation

@syedahsn
Contributor

@syedahsn syedahsn commented Jun 7, 2023

This PR allows the S3KeySensor to be run in deferrable mode. I refactored the existing S3KeySensor to pull out the API calls, leaving a common process_files method that is used in both the deferrable and non-deferrable cases. This reduces a lot of the code duplication.

Some duplication is unavoidable, such as writing the head_object_async and get_file_metadata_async methods, but where possible I tried to keep code duplication to a minimum.

The unit tests for the S3KeySensorTrigger follow the same pattern of testing as the S3KeySensor.
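The shared-logic pattern described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the idea, not Airflow's actual API: the function name process_files comes from the PR description, but its signature and the sample data are assumptions.

```python
# Hypothetical sketch of the refactoring described above: the S3 API calls
# are pulled out so that one shared helper can serve both the synchronous
# sensor path and the asynchronous trigger path. Names and signatures are
# illustrative, not Airflow's actual API.
def process_files(files, check_fn=None):
    """Shared logic: apply an optional user-supplied check_fn to each file's metadata."""
    if check_fn is None:
        return True
    # all() with a generator short-circuits on the first failing file.
    return all(check_fn(f) for f in files)

# Both code paths would call the same helper with the metadata they fetched:
ok = process_files([{"Size": 5}], check_fn=lambda f: f["Size"] > 0)
```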

@sunank200 @ephraimbuddy @pankajkoti



@pankajkoti
Member

@syedahsn perhaps you meant to tag @pankajastro :)

@sunank200
Collaborator

There is already a PR I have created here for the same: #31018

@ephraimbuddy
Contributor

@syedahsn, you probably missed my comment in the other PR; see #31018 (comment). Since duplication is the issue, can you do this after the other PR is merged? That PR also added some DAGs which we can run in system tests once deferrable mode is supported in system tests.

Comment on lines +125 to +130
for i in range(len(self.bucket_keys)):
    bucket_key_names.append(
        S3Hook.get_s3_bucket_key(self.bucket_name, self.bucket_keys[i], "bucket_name", "bucket_key")
    )
    bucket_name = bucket_key_names[i][0]
    key = bucket_key_names[i][1]
Member


This can use a rewrite with enumerate
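A rough sketch of that enumerate-based rewrite, with a stand-in resolve function in place of S3Hook.get_s3_bucket_key and made-up sample keys (both are assumptions for illustration only):

```python
# Hypothetical sketch of the reviewer's suggestion: iterate with enumerate()
# and unpack, instead of indexing into self.bucket_keys by hand.
def resolve(bucket_name, bucket_key, *_):
    # Stand-in for S3Hook.get_s3_bucket_key: split an s3:// URL
    # into its bucket and key components.
    _, _, rest = bucket_key.partition("s3://")
    bucket, _, key = rest.partition("/")
    return bucket, key

bucket_keys = ["s3://my-bucket/key-a", "s3://my-bucket/key-b"]  # stand-in data
bucket_key_names = []
for i, bucket_key in enumerate(bucket_keys):
    bucket_key_names.append(resolve(None, bucket_key, "bucket_name", "bucket_key"))
    bucket_name, key = bucket_key_names[i]  # tuple unpacking replaces [0]/[1]
```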

Comment on lines +155 to +158
if self.check_fn is not None:
    for files in event["files_list"]:
        results.append(self.check_fn(files))
    return all(results)
Member


Why build the entire results list and then call all()? Instead you can just do

for f in event["files_list"]:
    if not self.check_fn(f):
        return False
return True

or

return all(self.check_fn(f) for f in event["files_list"])
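A runnable illustration of why the generator form is preferable: all() short-circuits, so the check stops at the first failing file instead of evaluating every entry. The check_fn and files_list below are stand-ins, not code from the PR.

```python
# Stand-ins for self.check_fn and event["files_list"]; calls records which
# files were actually checked, to show the short-circuit behavior.
calls = []

def check_fn(f):
    calls.append(f["Size"])
    return f["Size"] > 0

files_list = [{"Size": 10}, {"Size": 0}, {"Size": 5}]

# all() with a generator stops at the first False; the third file
# is never passed to check_fn.
result = all(check_fn(f) for f in files_list)
```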

results.append(False)
continue
# Reduce the set of metadata to size only
files_list.append(list(map(lambda f: {"Size": f["Size"]}, key_matches)))
Member


This list-building code can be improved by using iterators
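One way to read that suggestion is to replace the map() + lambda with a list comprehension, which builds the reduced metadata lazily per element and reads more idiomatically. The key_matches sample data below is an assumption for illustration:

```python
# Hypothetical rewrite of the reviewer's suggestion: a list comprehension
# instead of list(map(lambda ...)) to reduce each match to its size only.
key_matches = [{"Key": "a", "Size": 10}, {"Key": "b", "Size": 20}]  # stand-in data

files_list = []
files_list.append([{"Size": f["Size"]} for f in key_matches])
```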

wildcard_keys = []
obj = []
bucket_key_names = []
for i in range(len(self.bucket_keys)):
Member


This loop looks very much like poke in S3KeySensor. Can they share one single implementation?
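One possible shape for that sharing, sketched very loosely: extract the per-key resolution into a plain function that both the sensor's poke() and the trigger's run() can call, each passing its own resolver. All names below (check_keys, resolve_key) are hypothetical, not Airflow's actual API.

```python
# Hypothetical sketch: a single shared helper for the key-resolution loop,
# usable from both the synchronous poke() path and the async trigger path.
def check_keys(bucket_keys, resolve_key):
    """Resolve each configured bucket key into a (bucket_name, key) pair."""
    return [resolve_key(bk) for bk in bucket_keys]

# Each caller supplies its own resolver; here a trivial stand-in.
pairs = check_keys(
    ["s3://b/k1", "s3://b/k2"],
    lambda bk: ("b", bk.rsplit("/", 1)[-1]),
)
```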

@syedahsn
Contributor Author

syedahsn commented Jun 8, 2023

@syedahsn, you probably missed my comment in the other PR; see #31018 (comment). Since duplication is the issue, can you do this after the other PR is merged? That PR also added some DAGs which we can run in system tests once deferrable mode is supported in system tests.

I saw your comment, but I decided to open this PR to get your thoughts (as well as the community's) on the approach I took for this sensor. As I mentioned in the initial PR for this sensor, one concern is the amount of code that is being duplicated for the async case. I think that keeping code duplication to a minimum is very important because it will lead to a code base that is easier to maintain, and less error-prone.

I'm willing to wait for the merge of #31018 before addressing the code duplication, but because of the differences in our approaches, it would mean that I would end up removing a lot of the code introduced in #31018.

I'll leave it to you to decide whether we should merge the initial PR, with me addressing the code duplication in a follow-up PR, or whether we collaborate now to come up with a suitable solution.

@syedahsn syedahsn closed this Jun 14, 2023
