Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CUMULUS-3172]: Update RDS migration docs #3385

Closed

Conversation

npauzenga
Copy link
Contributor

This adds a note to the data-migration2 documentation calling out the implication of a Granule's status and how it impacts migration.

@@ -122,6 +122,16 @@ Please read this entire section thoroughly before proceeding to run the second d

:::

:::note

The data migration Lambda described here uses Cumulus' Granule `upsert` logic to write granules from DynamoDB to PostgreSQL. This is particularly notable because granules with a `running` or `queued` status will only migrate a subset of their fields. See the Granule `upsert` logic [here](https://github.com/nasa/cumulus/blob/release-15.0.x/packages/db/src/models/granule.ts#L128).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Probably should explicitly note this is referring to data-migration-2 somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -122,6 +122,16 @@ Please read this entire section thoroughly before proceeding to run the second d

:::

:::note

The data migration Lambda described here uses Cumulus' Granule `upsert` logic to write granules from DynamoDB to PostgreSQL. This is particularly notable because granules with a `running` or `queued` status will only migrate a subset of their fields. See the Granule `upsert` logic [here](https://github.com/nasa/cumulus/blob/release-15.0.x/packages/db/src/models/granule.ts#L128).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: link to source code is good, but given we're referencing a specific version, we should just list out the fields here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense f1c6413


The data migration Lambda described here uses Cumulus' Granule `upsert` logic to write granules from DynamoDB to PostgreSQL. This is particularly notable because granules with a `running` or `queued` status will only migrate a subset of their fields. See the Granule `upsert` logic [here](https://github.com/nasa/cumulus/blob/release-15.0.x/packages/db/src/models/granule.ts#L128).

It is recommended that users ensure their granules are in the correct state before running this data migration. If there are Granules with an incorrect status, it will impact the data migration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should, at least in abstract, cover how they might do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm sure, fair point as it might be a common question. So how do they do this? 😄

I suppose the first step would be finding failed granule writes, figure out why they're failing, fix, and re-run ingest on those granules. That should update the status. It seems like that's the preferred approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think a reference to the dead letter archive here is appropriate, and possibly a mention of using the API to update the status if the record is otherwise correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bit of extra instruction here. It's in a couple commits so here's the most recent full-file: https://github.com/nasa/cumulus/pull/3385/files#diff-4415dfc379dabbb1814e1eed85bf7b84034887b43fa4c197227678aa1b471802

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So.... the thing is it's possible there are failures that aren't the DLA, I think we need to soften the language here, while I would expect most production environments with granules in a running/queued state indicate the workflow didn't complete, it's entirely possible other situations might cause a granule message to not make it to the end of a workflow such that it can be written.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. Good catch. We don't want to promise this is a catch-all.

03387d5

Copy link
Member

@Jkovarik Jkovarik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple of nits and a comment.

@npauzenga
Copy link
Contributor Author

Closing in favor of #3387

@npauzenga npauzenga closed this May 18, 2023
@npauzenga npauzenga deleted the feature/CUMULUS-3172-update-rds-migration-docs branch May 18, 2023 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants