Skip to content

Fix BigQuery predicate pushdown for varchars with backslashes or newlines#14254

Merged
findepi merged 6 commits intomasterfrom
findepi/bgtf
Sep 23, 2022
Merged

Fix BigQuery predicate pushdown for varchars with backslashes or newlines#14254
findepi merged 6 commits intomasterfrom
findepi/bgtf

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Sep 22, 2022

Fixes tests failing on master since #14125 (past 6 days)

Fixes #7900
Follows #7869

@findepi findepi requested a review from hashhar September 22, 2022 12:38
@cla-bot cla-bot bot added the cla-signed label Sep 22, 2022
Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

We don't have explicit tests for some escape sequences but looking at the table on https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#string_and_bytes_literals it seems like escaping \ properly and handling embedded newlines should cover most cases.

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 22, 2022

I don't think it's readonable to test all possible BigQuery sequences, because we actually don't generate them.
We escape the string so that there are no (or -- as few as possible) sequences.

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 22, 2022

CI

Error:  io.trino.plugin.bigquery.TestBigQueryConnectorTest.testInsertUnicode  Time elapsed: 1.282 s  <<< FAILURE!
io.trino.testing.QueryFailedException: The service is currently unavailable.
	at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:123)
	at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:480)
	at io.trino.testing.QueryAssertions.assertUpdate(QueryAssertions.java:71)
	at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:323)
	at io.trino.testing.AbstractTestQueryFramework.assertUpdate(AbstractTestQueryFramework.java:318)
	at io.trino.testing.BaseConnectorTest.testInsertUnicode(BaseConnectorTest.java:2911)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: java.lang.Exception: SQL: INSERT INTO test_insert_unicode_1fstiydkrw(test) VALUES 'aa', 'bé'
		at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:483)
		... 17 more
Caused by: com.google.cloud.bigquery.BigQueryException: The service is currently unavailable.
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.translate(HttpBigQueryRpc.java:115)
	at com.google.cloud.bigquery.spi.v2.HttpBigQueryRpc.insertAll(HttpBigQueryRpc.java:494)
	at com.google.cloud.bigquery.BigQueryImpl.insertAll(BigQueryImpl.java:1074)
	at io.trino.plugin.bigquery.BigQueryClient.insert(BigQueryClient.java:301)
	at io.trino.plugin.bigquery.BigQueryPageSink.appendPage(BigQueryPageSink.java:65)
	at io.trino.operator.TableWriterOperator.addInput(TableWriterOperator.java:257)
	at io.trino.operator.Driver.processInternal(Driver.java:416)
	at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
	at io.trino.operator.Driver.tryWithLock(Driver.java:706)
	at io.trino.operator.Driver.process(Driver.java:306)
	at io.trino.operator.Driver.processForDuration(Driver.java:277)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
	at io.trino.$gen.Trino_testversion____20220922_135336_1.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
POST https://www.googleapis.com/bigquery/v2/projects/sep-bq-cicd/datasets/tpch/tables/test_insert_unicode_1fstiydkrw/insertAll?prettyPrint=false
{
  "code" : 503,
  "errors" : [ {
    "domain" : "global",
    "message" : "The service is currently unavailable.",
    "reason" : "backendError"
  } ],
  "message" : "The service is currently unavailable.",
  "status" : "UNAVAILABLE"
}

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Sep 22, 2022

don't think it's readonable to test all possible BigQuery sequences, because we actually don't generate them.
We escape the string so that there are no (or -- as few as possible) sequences.

That's what I meant by we should be fine as long as we escape quotes and backslash properly.

Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 23, 2022

pt (hdp3, suite-azure, )

testing/bin/ptl failed to print the project version due to maven central communication issue

test (plugin/trino-iceberg)

looks like failure is related

Backslashes require escaping, since they are used to escape apostrophes
(and there are also other scape sequences).

This is test-covered by an existing test, that's currently failing on
master.
BigQuery string literals cannot contain new lines, require escape sequence.
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 23, 2022

(just rebased)

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 23, 2022

BaseIcebergConnectorTest#testSplitPruningForFilterOnNonPartitionColumn piggy-backs on testDataMappingSmokeTestDataProvider and requires values to be shorter than 16 characters, since ORC and Parquet predicates have min/max truncated in Iceberg.
I've reduce the length

cc @alexjo2144

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 23, 2022

redis failure seems unrelated

Error:  Failures: 
Error:    TestMinimalFunctionality.testStringValueWhereClauseHasData:88 Iterators differ at element [0]: [2] != [0] expected [[2]] but found [[0]]
Error:    TestMinimalFunctionality.testTableHasData:72 Iterators differ at element [0]: [1000] != [0] expected [[1000]] but found [[0]]

@findepi
Copy link
Copy Markdown
Member Author

findepi commented Sep 23, 2022

CI #14035

@findepi findepi merged commit 27ee892 into master Sep 23, 2022
@findepi findepi deleted the findepi/bgtf branch September 23, 2022 14:28
@findepi findepi mentioned this pull request Sep 23, 2022
@github-actions github-actions bot added this to the 398 milestone Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Add support for all the String and Bytes Literals in BQ Connector

2 participants