Skip to content

Make mapping of Trino schema to BigQuery dataset more explicit#19860

Merged
wendigo merged 2 commits intotrinodb:masterfrom
nineinchnick:bigquery-metadata-refactor
Dec 1, 2023
Merged

Make mapping of Trino schema to BigQuery dataset more explicit#19860
wendigo merged 2 commits intotrinodb:masterfrom
nineinchnick:bigquery-metadata-refactor

Conversation

@nineinchnick
Copy link
Copy Markdown
Member

Description

When reading through the BigQuery metadata class, it's hard to spot where exactly we map Trino schemas to BigQuery datasets, especially with the code that disambiguates them. This refactor is supposed to make it more explicit.

Additional context and related issues

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Nov 22, 2023
@nineinchnick nineinchnick requested a review from wendigo November 22, 2023 11:27
@github-actions github-actions bot added the bigquery BigQuery connector label Nov 22, 2023
@wendigo
Copy link
Copy Markdown
Contributor

wendigo commented Nov 23, 2023

@hashhar can you run this with secrets?

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Nov 23, 2023

/test-with-secrets sha=dde85321ffea09243d75b078e0798e6b89d9e7d2

@github-actions
Copy link
Copy Markdown

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/6969121963

@nineinchnick nineinchnick force-pushed the bigquery-metadata-refactor branch from dde8532 to 6963392 Compare November 24, 2023 10:37
@hashhar
Copy link
Copy Markdown
Member

hashhar commented Nov 27, 2023

/test-with-secrets sha=696339208803d79d246f771d551d2eece81aea58

@github-actions
Copy link
Copy Markdown

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/7004090940

Comment on lines 234 to 237
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little wondering why we need this method. There are still many DatasetId.getDataset usages.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a single use of this method when we translate BigQuery datasets into Trino schemas. In all other places, we receive a Trino schema and do a reverse translation.

Copy link
Copy Markdown
Member

@ebyhr ebyhr Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project-id is retrieved from client and passed to client.getDataset method. How about changing getDataset's method argument for taking only remoteSchema? Same for toRemoteDataset method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DatasetId uniquely identifies a dataset, so if client.getDataset receive it and use a different project ID (uses only part of the composite key) it would be an incorrect behavior which only coincidentally works because we always use the same project ID here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projectId is always the same between if and else, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be

Jan Waś added 2 commits November 28, 2023 09:01
This is only a refactor to make it easier to spot when Trino schemas are
mapped to BigQuery datasets and the other way around.
@nineinchnick nineinchnick force-pushed the bigquery-metadata-refactor branch from 6963392 to b903a2a Compare November 28, 2023 08:02
@nineinchnick nineinchnick requested a review from wendigo December 1, 2023 10:26
@nineinchnick
Copy link
Copy Markdown
Member Author

@wendigo @hashhar PTAL again

@wendigo wendigo merged commit 1b15b1d into trinodb:master Dec 1, 2023
@github-actions github-actions bot added this to the 435 milestone Dec 1, 2023
@nineinchnick nineinchnick deleted the bigquery-metadata-refactor branch December 1, 2023 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bigquery BigQuery connector cla-signed

Development

Successfully merging this pull request may close these issues.

4 participants