Add glue tables stack #16

Merged: tthyer merged 3 commits into main from ETL-67/glue-tables on Oct 6, 2021

Conversation

@tthyer (Contributor) commented Oct 2, 2021

This adds the tables stack; it probably needs to be updated. @philerooski Can you confirm which of the tables in sandbox are the latest? Should I update from the tables in the mtb_construct database?

@tthyer tthyer temporarily deployed to develop October 2, 2021 00:56 Inactive
@tthyer tthyer force-pushed the ETL-67/glue-tables branch from c69cd0a to 2eca073 Compare October 2, 2021 04:20
@tthyer tthyer temporarily deployed to develop October 2, 2021 04:21 Inactive
@thomasyu888 (Member) left a comment

When the schema changes, will these have to change too?

@philerooski (Contributor) commented

The mtb_construct database contains the most up-to-date schemas. I don't expect any changes from what is in phil_sample_db, though. The tables in mtb_construct are more comprehensive in the sense that I crawled files from multiple types of assessments (not just number match) to construct the schemas. But since schemas shouldn't change from assessment to assessment, it makes no difference.

One important thing to keep in mind: as part of the s3_to_json job/workflow, all S3 object metadata is inserted into the metadata file, so all of those fields should be represented in the metadata Glue table schema. We don't yet have a clear picture of what that metadata will look like. See https://sagebionetworks.jira.com/wiki/spaces/MTB/pages/2458681562/MTB+metadata+upload+for+Bridge+Exporter+3.0
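
To illustrate the point about metadata coverage, here is a rough sketch of a check that every key in a metadata record has a matching column in the table schema. It assumes columns are given as a list of `{"Name": ..., "Type": ...}` dicts (the shape Glue uses for table columns) and that names are compared case-insensitively. The function name, the metadata keys, and the sample columns are all hypothetical, since the real metadata fields are not settled yet.

```python
from typing import Dict, List


def missing_metadata_columns(
    metadata: Dict[str, str], columns: List[Dict[str, str]]
) -> List[str]:
    """Return the metadata keys that have no matching column name.

    Assumption: names are compared case-insensitively.
    """
    column_names = {col["Name"].lower() for col in columns}
    return sorted(key for key in metadata if key.lower() not in column_names)


# Hypothetical metadata record and table columns, for illustration only.
example_metadata = {
    "recordid": "abc123",
    "appversion": "1.0",
    "uploadedon": "2021-10-01",
}
example_columns = [
    {"Name": "recordid", "Type": "string"},
    {"Name": "appversion", "Type": "string"},
]

print(missing_metadata_columns(example_metadata, example_columns))
```

Any key this reports would need a column added to the metadata table definition before the data is queryable under that name.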

@tthyer tthyer temporarily deployed to develop October 6, 2021 19:56 Inactive
@tthyer tthyer temporarily deployed to develop October 6, 2021 20:50 Inactive
@tthyer (Contributor, Author) commented Oct 6, 2021

@philerooski fyi, I had to update the tables after comparing the old schema and the mtb_construct table schemas
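
For anyone repeating this kind of comparison, a minimal sketch of diffing two column lists might look like the following. It is not the procedure actually used for this PR. It assumes each schema is a list of `{"Name": ..., "Type": ...}` dicts, and the function name and sample columns are hypothetical.

```python
from typing import Dict, List


def diff_columns(
    old: List[Dict[str, str]], new: List[Dict[str, str]]
) -> Dict[str, List[str]]:
    """Summarize column-level differences between two table schemas."""
    old_types = {col["Name"]: col["Type"] for col in old}
    new_types = {col["Name"]: col["Type"] for col in new}
    shared = old_types.keys() & new_types.keys()
    return {
        # Columns present only in the new schema.
        "added": sorted(new_types.keys() - old_types.keys()),
        # Columns present only in the old schema.
        "removed": sorted(old_types.keys() - new_types.keys()),
        # Columns in both schemas whose declared type differs.
        "retyped": sorted(n for n in shared if old_types[n] != new_types[n]),
    }


# Hypothetical schemas, for illustration only.
old_schema = [
    {"Name": "recordid", "Type": "string"},
    {"Name": "score", "Type": "int"},
    {"Name": "legacy_flag", "Type": "boolean"},
]
new_schema = [
    {"Name": "recordid", "Type": "string"},
    {"Name": "score", "Type": "double"},
    {"Name": "assessmentid", "Type": "string"},
]

print(diff_columns(old_schema, new_schema))
```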

@tthyer tthyer merged commit 3e71abd into main Oct 6, 2021
@tthyer tthyer deleted the ETL-67/glue-tables branch October 6, 2021 21:53