Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backend] Improve lifecycle management of the artifact storage migration #1146

Merged
merged 19 commits into from
Apr 1, 2023

Conversation

kenxu95
Copy link
Contributor

@kenxu95 kenxu95 commented Mar 28, 2023

Lead reviewers: @saurav-c and @likawind.

Describe your changes and why you are making these changes

This is a WIP process PR, but the critical backend parts have been changed and tested. I would like some preliminary feedback on 1) the data model and 2) the updated logic in connect_integration.go to make sure there are no surprises when the rest of the work is fleshed out.

Testing:

  • Tested that I'm able to migration to an S3 bucket, and from that S3 bucket to another S3 bucket, and that the storage migration table is in the appropriate state afterwards.

I also updated configure_storage.go later following the same pattern. Once the latter is updated we'll be able to migrate from S3 back to local.

There will be a follow-up for the UI changes, as well as any additional backend routes we need to support those UI changes.

Known issues that I will address (or make tasks for):

  • We should perform the deletion of contents in the old storage layer AFTER updating the global storage config. Otherwise, this can lead to an inconsistent and broken state. It's not a huge deal if we accidentally leave some files behind.
  • There's currently no way of testing s3 -> local filesystem migration through the UI or SDK? (Task created)
  • We should probably lock the storage_migration table so that only one migration can be ongoing at a time. (Task created)
  • There should be an enforced timeout for the migration process. (Task created)
  • Disallow deleting an integration that is also being used as the current metadata store.
  • [Nice-to-have] Migrating to the same S3 bucket, or migrating local->local should error. This currently works for some reason, but we can probably disallow it earlier.

Related issue number (if any)

ENG-2604

Loom demo (if any)

https://www.loom.com/share/e04f228644f24c2eb0b0b583503c8abd

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@kenxu95 kenxu95 requested review from cw75, likawind and saurav-c March 28, 2023 22:16
@kenxu95 kenxu95 force-pushed the eng-2604-s3-migration-error-failing-should-not branch from f4de02c to ef87ff7 Compare March 29, 2023 23:18
@kenxu95 kenxu95 force-pushed the eng-2604-s3-migration-error-failing-should-not branch from ef87ff7 to 770ed7d Compare March 30, 2023 00:25
Copy link
Contributor

@likawind likawind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Left a few comments

@@ -12,6 +12,11 @@ const (
IntegrationConfigHeader = "integration-config"
IntegrationIDsHeader = "integration-ids"

// Storage Migration Headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on some other APIs like https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dags , I'd suggest using request body to send / capture a json-serialized query parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these are json'd though?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you are right, but it looks like they are query parameters (/storage-migrations?limit=5) tough?

cc @jpurusho65 as you may have better experience on API pattern

@@ -0,0 +1,3 @@
package tests

// TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming this is an immediate TODO in follow-up PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this as part of this PR.


// Apply the filters in the following order: status, completedSince, then limit.
if status != nil {
migrations = filterStorageMigrationsByStatus(migrations, *status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how hard is it to push these filtering / limit to query layer? I don't have particular concerns on this use case, but if it's something easily built, we can avoid potential pitfalls in future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not too hard but it is a little tricky and I felt like there probably aren't enough storage migrations for listing them all to matter.

src/golang/cmd/server/handler/connect_integration.go Outdated Show resolved Hide resolved
src/golang/cmd/server/handler/connect_integration.go Outdated Show resolved Hide resolved
}

// Only need to do this if we're committing a transaction.
if txn, ok := db.(database.Transaction); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this block do? It feels like an antipattern to me - all transaction owner (whoever calls txn := db.BeginTx()) should be responsible for calling Commit() and TxnRollBack(). I'm confused why we need to access some transaction this helper didn't create itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing, I agree. Refactored it so that the transaction definition happens at the top of the function.

Copy link
Contributor

@saurav-c saurav-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left initial comments, still working through the rest of the PR.

src/golang/cmd/server/handler/connect_integration.go Outdated Show resolved Hide resolved
src/golang/cmd/server/handler/connect_integration.go Outdated Show resolved Hide resolved
@kenxu95 kenxu95 changed the title [Feedback Requested][Backend] Improve lifecycle management of the S3 migration [Backend] Improve lifecycle management of the S3 migration Mar 31, 2023
@kenxu95 kenxu95 changed the title [Backend] Improve lifecycle management of the S3 migration [Backend] Improve lifecycle management of the artifact storage migration Mar 31, 2023
@kenxu95 kenxu95 requested review from likawind and saurav-c March 31, 2023 18:39
@kenxu95
Copy link
Contributor Author

kenxu95 commented Mar 31, 2023

@saurav-c @likawind ok this PR should have all the components ready now. The UI part is here: #1155

Copy link
Contributor

@likawind likawind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing comments, nothing major from my end!

src/golang/cmd/server/handler/connect_integration.go Outdated Show resolved Hide resolved
Copy link
Contributor

@saurav-c saurav-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple minor comments, otherwise LGTM

src/golang/cmd/server/handler/get_config.go Outdated Show resolved Hide resolved
*aq_context.AqContext

// See the route description above for what each of these filters mean.
status *string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a pointer, same with completedSince?

Copy link
Contributor Author

@kenxu95 kenxu95 Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are optional fields that will be left as nil if the request doesn't want to apply those filters. I'll document it here better.

@kenxu95 kenxu95 added run_integration_test Triggers integration tests skip_integration_test Skips required integration test (only documentation/UI changes) and removed run_integration_test Triggers integration tests labels Apr 1, 2023
@kenxu95
Copy link
Contributor Author

kenxu95 commented Apr 1, 2023

Pre-merge tests already ran, so skipping them when merging in UI changes.

@kenxu95 kenxu95 merged commit a32206c into main Apr 1, 2023
@kenxu95 kenxu95 deleted the eng-2604-s3-migration-error-failing-should-not branch April 1, 2023 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip_integration_test Skips required integration test (only documentation/UI changes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants