-
Notifications
You must be signed in to change notification settings - Fork 3k
GCP: Add Iceberg Catalog for GCP BigQuery Metastore #11039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
356d34e
GCP: Add Iceberg Catalog for GCP BigQuery Metastore
hesham-medhat 3d88af9
Ensure unit tests write temp folders under gradle's build directory.
hesham-medhat 49de9cb
Reorganize to create iceberg-bigquery standalone project
hesham-medhat 196975a
Apply style of vertical whitespace after control flow blocks
hesham-medhat File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
129 changes: 129 additions & 0 deletions
129
bigquery/src/main/java/org/apache/iceberg/gcp/bigquery/BigQueryClient.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,129 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, | ||
| * software distributed under the License is distributed on an | ||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| * KIND, either express or implied. See the License for the | ||
| * specific language governing permissions and limitations | ||
| * under the License. | ||
| */ | ||
| package org.apache.iceberg.gcp.bigquery; | ||
|
|
||
| import com.google.api.services.bigquery.model.Dataset; | ||
| import com.google.api.services.bigquery.model.DatasetList.Datasets; | ||
| import com.google.api.services.bigquery.model.DatasetReference; | ||
| import com.google.api.services.bigquery.model.Table; | ||
| import com.google.api.services.bigquery.model.TableList.Tables; | ||
| import com.google.api.services.bigquery.model.TableReference; | ||
| import java.util.List; | ||
| import java.util.Map; | ||
| import java.util.Set; | ||
|
|
||
| /** | ||
| * A client of Google BigQuery Metastore functions over the BigQuery service. Uses the Google | ||
| * BigQuery API. | ||
| */ | ||
| public interface BigQueryClient { | ||
|
|
||
| /** | ||
| * Creates and returns a new dataset. | ||
| * | ||
| * @param dataset the dataset to create | ||
| */ | ||
| Dataset createDataset(Dataset dataset); | ||
|
|
||
| /** | ||
| * Returns a dataset. | ||
| * | ||
| * @param datasetReference full dataset reference | ||
| */ | ||
| Dataset getDataset(DatasetReference datasetReference); | ||
|
|
||
| /** | ||
| * Deletes a dataset. | ||
| * | ||
| * @param datasetReference full dataset reference | ||
| */ | ||
| void deleteDataset(DatasetReference datasetReference); | ||
|
|
||
| /** | ||
| * Updates parameters of a dataset or adds them if did not exist, leaving already-existing ones | ||
| * intact. | ||
| * | ||
| * @param datasetReference full dataset reference | ||
| * @param parameters metadata parameters to add | ||
| * @return dataset after patch | ||
| */ | ||
| Dataset setDatasetParameters(DatasetReference datasetReference, Map<String, String> parameters); | ||
|
|
||
| /** | ||
| * Removes given set of keys of parameters of a dataset. Ignores keys that do not exist already. | ||
| * | ||
| * @param datasetReference full dataset reference | ||
| * @param parameters metadata parameter keys to remove | ||
| * @return dataset after patch | ||
| */ | ||
| Dataset removeDatasetParameters(DatasetReference datasetReference, Set<String> parameters); | ||
|
|
||
| /** | ||
| * Lists datasets under a given project | ||
| * | ||
| * @param projectId the identifier of the project to list datasets under | ||
| */ | ||
| List<Datasets> listDatasets(String projectId); | ||
|
|
||
| /** | ||
| * Creates and returns a new table. | ||
| * | ||
| * @param table body of the table to create | ||
| */ | ||
| Table createTable(Table table); | ||
|
|
||
| /** | ||
| * Returns a table. | ||
| * | ||
| * @param tableReference full table reference | ||
| */ | ||
| Table getTable(TableReference tableReference); | ||
|
|
||
| /** | ||
| * Updates the catalog table options of an Iceberg table and returns the updated table. | ||
| * | ||
| * @param tableReference full table reference | ||
| * @param table to patch | ||
| */ | ||
| Table patchTable(TableReference tableReference, Table table); | ||
|
|
||
| /** | ||
| * Renames a table. | ||
| * | ||
| * @param tableToRename full table reference | ||
| * @param newTableId new table identifier | ||
| */ | ||
| Table renameTable(TableReference tableToRename, String newTableId); | ||
|
|
||
| /** | ||
| * Deletes a table. | ||
| * | ||
| * @param tableReference full table reference | ||
| */ | ||
| void deleteTable(TableReference tableReference); | ||
|
|
||
| /** | ||
| * Returns all tables in a database. | ||
| * | ||
| * @param datasetReference full dataset reference | ||
| * @param filterUnsupportedTables if true, fetches every item on the list to verify it is | ||
| * supported, in order to filter the list from unsupported Iceberg tables | ||
| */ | ||
| List<Tables> listTables(DatasetReference datasetReference, boolean filterUnsupportedTables); | ||
| } | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there an interface for
BigQueryClient? My assumption was that there would be a test implementation, butBigQueryClientImplTestuses aBigQueryClientImpland injects a mockBigquery clientvia the package-private constructor.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the catalog tests use a mock BigQueryClient, while the tests for BigQueryClient use a mock Bigquery. Is this necessary? It seems to me that this sets up an unnecessary test surface. For instance, to delete a table, the
BigQueryClientImplTestvalidates thatclient.deleteTablecalls a mockedDelete.executeUnparsedmethod once. ThedropTabletest then just tests for side-effects (directory is present, directory is not present).I don't think that this structure is necessary because the drop table test could validate that the underlying
Delete.executeUnparsedis called. This interface seems unnecssary.After looking at the test structure here, I think the bigger problem is that these tests don't really ensure behavior and will break if the implementation changes. Using the
dropTabletest as an example again, it first loads the table and then calls the client to delete it. An arguably better implementation would simply calldeleteTableand catch theNoSuchTableExceptionand returnfalse. If you made that change, the mock tests would break. Another problem is thatdeleteTablealso loads the table by callinggetTable, which is hidden by the structure of the tests.I think that this should be tested like other implementations and needs to use
CatalogTests. That will test the behavior. You could potentially do that using aBigquerymock, but I think the way mocks are used in this PR is very specific to the implementation, so I would prefer just having a fake in-memory implementation of theBigqueryinterface instead.I would keep in mind that the goal of these tests isn't to test that
dropTableis translated intogetTablefollowed bydeleteTable. If tests are validating a list of forwarded calls (at two different levels) then you aren't ensuring that the behavior of the actual catalog is correct.