Improve speed when listing columns in BigQuery#21920
Improve speed when listing columns in BigQuery#21920nineinchnick wants to merge 4 commits intotrinodb:masterfrom
Conversation
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/DecimalTypeInfo.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
these methods are not needed, remove toString, equals and hashCode. toString is only used in test but that's redundant - we only need a single roundtrip: String -> TypeInfo
There was a problem hiding this comment.
toString is used in a few exceptions. I'll remove others.
There was a problem hiding this comment.
There was a problem hiding this comment.
However, the IllegalArgumentException will be suppressed in convertToTrinoType, right?
There was a problem hiding this comment.
Yes. I can either remove it, or we could add some logging, to make it easier to debug why some columns are not available in Trino. LMK what's better.
d3fba84 to
f43ccd6
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/type/TestTypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
f43ccd6 to
6b34692
Compare
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryType.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
a2bbea8 to
3654c2e
Compare
|
@ebyhr PTAL |
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/TestBigQueryType.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/BaseBigQueryConnectorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Why do we need to build DatasetId and call getDataset method (when is it different from remoteSchemaName)?
There was a problem hiding this comment.
I added it for consistency. This connector has a local to remote names mapping.
There was a problem hiding this comment.
I understand it's for extension. Such change should go to the internal repository.
There was a problem hiding this comment.
Then we'd have to revert #19860. It's out of the scope of this PR.
There was a problem hiding this comment.
No need to revert the PR. Just removing this line is sufficient.
There was a problem hiding this comment.
I don't understand, this is required for consistency. We either allow this local-to-remote name mapping, or not.
There was a problem hiding this comment.
Please take a look at listRelationCommentMetadata() method. Also, can you write a test that doesn't pass without this line if you still want to keep it? If not, please remove it.
this local-to-remote name mapping
Please explain when schemaName gets a different value from remoteSchemaName in this repository:
String schemaName = client.toSchemaName(DatasetId.of(projectId, remoteSchemaName));
...
protected String toSchemaName(DatasetId datasetId)
{
return datasetId.getDataset();
}
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryTypeManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/PrimitiveTypeInfo.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
cc6a799 to
bf0cf78
Compare
|
/test-with-secrets sha=bf0cf785188ed0cca97f6b18f4f750df035ebe75 |
|
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9218809344 |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I understand it's for extension. Such change should go to the internal repository.
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/type/TypeInfoUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
However, the IllegalArgumentException will be suppressed in convertToTrinoType, right?
There was a problem hiding this comment.
This name looks little confusing because STRUCT type is supported except for parsing. How about renaming to ParsingException or something?
There was a problem hiding this comment.
It's not supported in the type package. ParsingException seems to be too broad, or I'd have to refactor the exception to use it instead of IllegalArgumentException and make the typeName parameter optional. I don't think it's worth doing.
There was a problem hiding this comment.
It's not supported in the type package
I know. However, the class is used from outside of the package (BigQueryTypeManager) either. The package is unclear when reading the code in the class.
There was a problem hiding this comment.
How can I make it more clear, using a fully qualified name, or with a comment?
There was a problem hiding this comment.
Why do we want to continue iteration even when table is empty?
There was a problem hiding this comment.
We only need the table to map STRUCT types. We should continue iteration to convert other types.
There was a problem hiding this comment.
I suppose table.isEmpty() means the table doesn't exist. No need to return columns in my opinion.
There was a problem hiding this comment.
It doesn't mean it doesn't exit, just it wasn't provided.
There was a problem hiding this comment.
No, the tableSupplier is called with getTable method. It returns an empty when the table doesn't exist:
plugin/trino-bigquery/src/test/java/io/trino/plugin/bigquery/type/TestTypeInfoUtils.java
Outdated
Show resolved
Hide resolved
bf0cf78 to
8c16327
Compare
| return RowType.from(fields); | ||
| } | ||
|
|
||
| public List<ColumnMetadata> convertToTrinoType(List<String> names, List<String> types) |
There was a problem hiding this comment.
This method is used only from tests. Remove.
| // ignore unsupported types | ||
| continue; | ||
| } | ||
| typeSignature = createRowType(table.get().getDefinition().getSchema().getFields().get(name)).getTypeSignature(); |
There was a problem hiding this comment.
Why we try to get typeSingature in case of unsupported type?
If such STRUCT type is valid maybe parseTypeString should be adapted to accept it?
| public final class BigDecimalTypeInfo | ||
| extends PrimitiveTypeInfo | ||
| { | ||
| private static final int MAX_PRECISION_MINUS_SCALE = 38; |
There was a problem hiding this comment.
Does it mean that max precision is 76?
Why not use just MAX_PRECISION constant?
| public final class DecimalTypeInfo | ||
| extends PrimitiveTypeInfo | ||
| { | ||
| private static final int MAX_PRECISION_MINUS_SCALE = 29; |
|
|
||
| private static class TypeInfoParser | ||
| { | ||
| public record Token(int position, String text, boolean type) |
There was a problem hiding this comment.
| public record Token(int position, String text, boolean type) | |
| public record Token(int position, String text, boolean charType) |
| int end = 1; | ||
| while (end <= typeInfoString.length()) { | ||
| if (end == typeInfoString.length() || | ||
| !isTypeChar(typeInfoString.charAt(end - 1)) || |
There was a problem hiding this comment.
If we check every char what is the condition that end - 1 will not be cached in previous loop iteration?
This comment was marked as outdated.
This comment was marked as outdated.
|
Hey @nineinchnick, will this be worked on in near future? |
|
@oskar-szwajkowski no, this has very low priority. I'll actually close this, to avoid further confusion. |
|
@oskar-szwajkowski Can you open a new PR? |
what would be motivation for that? I currently haven't spotted any downsides of getting column list in big query catalogs, that could impact workload that I looked at. Is this getting more priority and should be opened again? |
|
If you open a new PR based on this PR but from your end @oskar-szwajkowski, you can continue to drive updates yourself. @nineinchnick is not planning to work on this so if you are interested you can take over that way. At least I think that was the idea from @ebyhr ... |
Description
A continuation of #21830. Test coverage of the type parser is at 99% of lines and 95% of branches. Original description below.
Using mini parser because BigQuery Java SDK doesn't support translating string to BigQuery type as far as I asked Google engineers.
This PR improves listing columns. Test with 169 tables is improved from 22s to 1.5s.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: