Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Disable soft delete policy when creating a default bucket for a project. #31324

Merged
merged 5 commits into from
May 17, 2024

Conversation

shunping
Copy link
Contributor

@shunping shunping commented May 17, 2024

addresses #31330 in Java SDK

  • Soft delete policy is disabled during creation of default buckets.
  • getBucket() and removeBucket() are added into GcsUtil.
  • Added a unit test and an integration test for creating a default bucket.

Also, getBucket() and removeBucket() are added into GcsUtil.
@shunping shunping force-pushed the gcs-bucket-rentention-policy branch from 8452bfd to 8f2d2c6 Compare May 17, 2024 03:11
@shunping
Copy link
Contributor Author

R: @liferoad @damccorm

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had one comment, otherwise LGTM


String tempLocation =
GcpOptions.GcpTempLocationFactory.tryCreateDefaultBucketWithPrefix(
options, crmClient, "gcp-options-it-");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for existence of the bucket and delete it if it exists before we do this? That way we avoid a bad condition where the delete fails on one test and then we're stuck in a bad state indefinitely

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove it? I haven't checked the internal logic. Could two Dataflow jobs use the same default bucket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default bucket name is a concatenation of a fixed prefix, region and a project number. So for a given project, if it is running in the same region, the default bucket name would be the same.

For our integration test, it would always be the same as "gcp-options-it-us-central1-844138762903". Actually, this may cause race condition if we run this integration test twice at the same time though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this test, I think I have to add some random number in the prefix then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for existence of the bucket and delete it if it exists before we do this? That way we avoid a bad condition where the delete fails on one test and then we're stuck in a bad state indefinitely

Well, if we add a random number to avoid race condition, then we could end up with multiple dangling buckets if deletion fails. That's a tradeoff we need to make, to avoid race condition or to avoid dangling buckets. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is fine. If it causes problems we could always add a cleanup job like https://github.com/apache/beam/blob/master/.test-infra/tools/stale_bq_datasets_cleaner.sh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for existence of the bucket and delete it if it exists before we do this? That way we avoid a bad condition where the delete fails on one test and then we're stuck in a bad state indefinitely

Done.

Copy link
Collaborator

@liferoad liferoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liferoad
Copy link
Collaborator

Minor: can we update the changes.md regarding to this fix?

@shunping shunping force-pushed the gcs-bucket-rentention-policy branch from 62c7167 to 266f63f Compare May 17, 2024 18:57
@shunping shunping force-pushed the gcs-bucket-rentention-policy branch from 266f63f to 6601e2c Compare May 17, 2024 18:58
Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge once checks pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants