Support CREATE TABLE AS SELECT and INSERT in BigQuery #13094
I'm concerned about As stated in the docs:
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryInsertTableHandle.java
public ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional<ConnectorTableLayout> layout, RetryMode retryMode)
{
    if (tableMetadata.getComment().isPresent()) {
        throw new TrinoException(NOT_SUPPORTED, "This connector does not support creating tables with table comment");
Is this a BigQuery limitation, or do we just not support it yet?
It's just a limitation of the connector.
if (isWildcardTable(TableDefinition.Type.valueOf(table.getType()), table.getRemoteTableName().getTableName())) {
    throw new TrinoException(BIGQUERY_UNSUPPORTED_OPERATION, "This connector does not support inserting into wildcard tables");
}
List<String> columnNames = columns.stream().map(column -> ((BigQueryColumnHandle) column).getName()).collect(toImmutableList());
JDK 17 has .toList() on Stream<>, which according to the Javadoc returns an unmodifiable list. Should we use it, @martint?
For reference: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html#toList()
Unmodifiable list and ImmutableList semantics are different. Null hostility is the big one, and an unmodifiable list isn't really unmodifiable if someone else holds a reference to the backing list.
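The distinction drawn above can be illustrated with JDK classes alone. The following is a minimal sketch, not code from this PR: the class and method names are hypothetical, and Collectors.toUnmodifiableList() is used here only as a JDK stand-in for a null-hostile collector such as Guava's toImmutableList().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; none of these names come from the PR.
public class ListSemantics
{
    // Stream.toList() returns an unmodifiable list, but it accepts null elements.
    public static boolean streamToListAcceptsNull()
    {
        List<String> names = Stream.of("a", null).toList();
        return names.get(1) == null;
    }

    // A null-hostile collector rejects null elements with a NullPointerException.
    // (JDK stand-in for the null hostility of Guava's ImmutableList.)
    public static boolean nullHostileCollectorRejectsNull()
    {
        try {
            Stream.of("a", null).collect(Collectors.toUnmodifiableList());
            return false;
        }
        catch (NullPointerException e) {
            return true;
        }
    }

    // Collections.unmodifiableList() is only a read-only view: whoever still
    // holds the backing list can change what the view contains.
    public static int viewSizeAfterBackingListGrows()
    {
        List<String> backing = new ArrayList<>(List.of("a"));
        List<String> view = Collections.unmodifiableList(backing);
        backing.add("b");
        return view.size();
    }

    public static void main(String[] args)
    {
        System.out.println(streamToListAcceptsNull());
        System.out.println(nullHostileCollectorRejectsNull());
        System.out.println(viewSizeAfterBackingListGrows());
    }
}
```

Note that the "someone else holds a reference" concern applies to read-only views such as Collections.unmodifiableList(); the null-handling difference is the one that applies directly to Stream.toList().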
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryOutputTableHandle.java
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryPageSink.java
         return StandardSQLTypeName.TIMESTAMP;
     }
-    if (type instanceof CharType || type instanceof VarcharType) {
+    if (type instanceof VarcharType) {
It was required to fix incorrect predicates between char and varchar, if I remember correctly.
What is CharType mapped to now? I don't see an explicit mapping, so does it mean it's an unsupported type now?
It's mapped to the BigQuery STRING type now.
I'm probably being stupid, but I can't find the code that handles that (mapping Trino char to BigQuery STRING). Can you point me to it?
I'm probably confused because of the removal of char from createTableSupportedTypes in TestBigQueryConnectorTests.
Sorry, I misunderstood "now" as "before this change". I wanted to say:
* CHAR was mapped to STRING before this PR
* CHAR is unsupported after this PR
Sorry for causing confusion. 😄 Do we want to mention this in the release notes? The only possible impact I see is that some CTAS statements from other catalogs might fail, but it is easy to preserve the old "incorrect" behaviour by casting CHAR columns to VARCHAR instead.
wendigo left a comment
Overall design looks good. We need to improve on type coverage and error handling.
bf0ea16 to 2ba1e4d
23fff3c to d01df23
@hashhar Could you please review this PR when you have time?
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
Should this be SchemaTableName or the remote mapped names?
I'm slightly confused now. I see that you didn't have to add any additional code to convert names to remote names. Is it because the remote name logic is contained within BigQueryClient?
If that is the case, then your older naming was more correct, since the handle will include "Trino" names instead of "remote" names and the BigQueryClient will handle the conversions.
It should be "remote", but the logic to look up a remote schema name was missing. Could you take a look at https://github.com/trinodb/trino/compare/28038ec2b874963631ae476ca1b9cdcef36e0462..cac3a485decbaebdada4a356c0b50bab68ca34b7?
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeUtils.java
hashhar left a comment
I still need to look at the type mapping tests.
Also, we should add tests for inserts into TestBigQueryCaseInsensitiveMapping (and also for JDBC connectors, but that's a separate PR).
@hashhar Addressed comments.
Thank you for catching this. Was this caught by some existing test?
28038ec to cac3a48
hashhar left a comment
Looks good. Thanks.
I'll revisit type mapping and approve.
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
Probably we should also call this in createSchema to avoid creating schemas in BigQuery which differ only in case. Separate and pre-existing issue however.
It turns out this is not needed and would be the wrong thing to do. When creating schemas we already list them, and the listing will include the lowercase name, so Trino will see that a schema already exists. I've added a test to verify this, similar to what you added for DROP SCHEMA in #13812.
They are different types, so we shouldn't treat them as the same type.
* Rename the test method to testInt64 from testInteger
* Fix data provider and reorder the entries
cac3a48 to 8d17f02
Rebased on upstream to resolve conflicts.
hashhar left a comment
LGTM - a question for my understanding.
        "SELECT 1234567890, 1.23",
        "SELECT count(*) + 1 FROM customer");

// TODO: BigQuery throws table not found at BigQueryClient.insert if we reuse the same table name
Can you explain a bit more? Does this mean that when the DROP TABLE above runs and we then try to CREATE TABLE with the same name again here, it fails?
> Does this mean that when the DROP TABLE above runs and we then try to CREATE TABLE with the same name again here, it fails?
Yes. Also, the issue doesn't happen with empty tables. It seems there's a cache or something similar behind BigQuery.insertAll. I couldn't find relevant options in the library at a glance. I will look into the details later.
8d17f02 to 6633525
hashhar left a comment
LGTM. Thanks for working on this.
@Override
public ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional<ConnectorTableLayout> layout, RetryMode retryMode)
{
    return createTable(session, tableMetadata);
Does this also need a defensive check to verify query retries are not enabled?
I think so. Thanks for catching that. Do you want to send the PR?
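A defensive check of the kind discussed above might look roughly like the sketch below. It is self-contained and does not use the Trino SPI: RetryMode, NotSupportedException, and the String handle are hypothetical stand-ins for the real SPI types, and the enum values and error message are assumptions rather than the code that was eventually merged.

```java
// Hypothetical, self-contained sketch; not the connector's actual code.
public class RetryGuardSketch
{
    // Stand-in for the SPI retry mode enum (values are an assumption).
    public enum RetryMode { NO_RETRIES, RETRIES_ENABLED }

    // Stand-in for throwing TrinoException(NOT_SUPPORTED, ...).
    public static class NotSupportedException extends RuntimeException
    {
        public NotSupportedException(String message)
        {
            super(message);
        }
    }

    // Fail fast before any remote table is created when retries are requested.
    public static void verifyRetriesDisabled(RetryMode retryMode)
    {
        if (retryMode != RetryMode.NO_RETRIES) {
            throw new NotSupportedException("This connector does not support query retries");
        }
    }

    // Simplified beginCreateTable: the guard runs first, then the table is "created".
    public static String beginCreateTable(String tableName, RetryMode retryMode)
    {
        verifyRetriesDisabled(retryMode);
        return "handle:" + tableName;
    }

    public static void main(String[] args)
    {
        System.out.println(beginCreateTable("orders", RetryMode.NO_RETRIES));
    }
}
```

Placing the guard at the top of the method means a retried query is rejected before it can leave a partially written table behind.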
-----------

The connector provides read and write access to data and metadata in the
BigQuery database, though write access is limited. In addition to the
Description
Support CREATE TABLE AS SELECT and INSERT in BigQuery
Fixes #6868
Fixes #6869
Documentation
(x) Sufficient documentation is included in this PR.
Release notes
(x) Release notes entries required with the following suggested text: