[Python] Disable soft delete policy when creating new default bucket. #31344

Merged
merged 4 commits into from
May 21, 2024

shunping
Contributor

Addresses #31330 in the Python SDK. The logic is similar to #31324.

Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @riteshghorse for label python.
R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping
Contributor Author

R: @Abacn

Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Contributor

@Abacn Abacn left a comment

Thanks. Left a few comments regarding the added integration test.

@@ -141,6 +143,43 @@ def test_batch_copy_and_delete(self):
    self.assertFalse(
        result[1], 're-delete should not throw error: %s' % result[1])

  @pytest.mark.it_postcommit
  @mock.patch('apache_beam.io.gcp.gcsio.default_gcs_bucket_name')
Contributor

It doesn't sound quite right that a PostCommit needs a mock. And this mock isn't mocking a fake service; it's used to override the nomenclature of the temp bucket. What happens if we do not hack it?

Also, this test does not run a pipeline; should we configure it to run only on test-suites:direct:py3xx:postCommitIT? Presumably it is currently running on the Dataflow PostCommit IT suites, which is not quite right.

Contributor Author

@shunping shunping May 21, 2024

Thanks for the review, @Abacn. Below are my responses.

It doesn't sound quite right that a PostCommit needs a mock. And this mock isn't mocking a fake service; it's used to override the nomenclature of the temp bucket. What happens if we do not hack it?

For a given project, the function default_gcs_bucket_name returns a fixed bucket name as the default. If we don't override it, we would need to create a dedicated project (rather than using apache-beam-testing or whatever project the user provides when running this test) to test this. Per the offline discussion with @damccorm, creating a project and then removing it afterward seems like overkill for this test. Mocking is kind of a "hack", but the code is clean. I am open to any better suggestion, though.

Also, this test does not run a pipeline; should we configure it to run only on test-suites:direct:py3xx:postCommitIT? Presumably it is currently running on the Dataflow PostCommit IT suites, which is not quite right.

If you look at the other tests in gcsio_integration_test.py, they also test gcsio functionality with actual GCS operations. However, they don't run a pipeline either.
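
The mocking approach described in this reply can be sketched in isolation. This is a minimal illustration, not the test added in this PR: `BucketNaming`, its naming scheme, `get_or_create_default_bucket`, and the in-memory `buckets` dict are all hypothetical stand-ins so that the example runs without GCP access. Only `unittest.mock.patch.object` is a real API here.

```python
import hashlib
import uuid
from unittest import mock


class BucketNaming:
    """Hypothetical stand-in for code that derives a fixed default name."""

    @staticmethod
    def default_bucket_name(project, region):
        # Assumed scheme: deterministic per (project, region), so the same
        # project always maps to the same bucket name.
        digest = hashlib.md5(f'{project}:{region}'.encode()).hexdigest()[:10]
        return f'tmp-{region}-{digest}'


def get_or_create_default_bucket(project, region, buckets):
    """Returns the default bucket record, creating it in `buckets` if absent."""
    name = BucketNaming.default_bucket_name(project, region)
    return buckets.setdefault(name, {'name': name})


def run_with_injected_name():
    """Overrides the naming function so a throwaway bucket name is used."""
    injected = 'gcsio-it-' + uuid.uuid4().hex[:8]
    buckets = {}  # in-memory stand-in for the storage service
    with mock.patch.object(
            BucketNaming, 'default_bucket_name', return_value=injected):
        bucket = get_or_create_default_bucket(
            'my-project', 'us-central1', buckets)
    return injected, bucket
```

Because only the naming function is replaced, the create path itself still runs unmodified, which is the property the reply relies on: the test exercises bucket creation without touching the project's real default bucket.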

Contributor

@Abacn Abacn left a comment

Thanks for the explanation. This sounds good to me.

if existing_bucket:
  existing_bucket.delete()

bucket = gcsio.get_or_create_default_gcs_bucket(google_cloud_options)
Contributor

Just realized: if upstream code changes and the mock is no longer effective, the following will delete the real default bucket. We should assert that the created bucket is the one with the injected name, to guard against deleting the real default bucket.

Contributor Author

Good idea. Added the check. PTAL
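
A guard of the kind suggested in this thread might look like the sketch below. It is an assumption about the shape of such a check, not the code that was added to the PR: `FakeBucket` and `delete_only_if_injected` are hypothetical names, and the fake bucket only records whether `delete()` was called.

```python
class FakeBucket:
    """Hypothetical stand-in for a bucket handle; not a real client class."""

    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True


def delete_only_if_injected(bucket, injected_name):
    """Deletes `bucket` only when its name matches the injected test name.

    Raises AssertionError before any deletion if the names differ, which is
    what would happen if the mock silently stopped taking effect.
    """
    if bucket is None:
        return False
    if bucket.name != injected_name:
        raise AssertionError(
            'refusing to delete %r: expected injected name %r' %
            (bucket.name, injected_name))
    bucket.delete()
    return True
```

The check runs before `delete()`, so a name mismatch fails the test loudly and leaves the bucket untouched.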

Contributor

@Abacn Abacn left a comment

Thank you!

@shunping
Contributor Author

Run Python_Transforms PreCommit 3.8

@shunping
Contributor Author

Run Python_Transforms PreCommit 3.9

@shunping
Contributor Author

Run Python_Transforms PreCommit 3.10

@shunping
Contributor Author

Run Python_Transforms PreCommit 3.11

@Abacn Abacn merged commit c5b6475 into apache:master May 21, 2024
83 checks passed