Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvalidProtocolBufferException with complex queries #31

Closed
goktugkose opened this issue Nov 24, 2023 · 1 comment · Fixed by #37
Closed

InvalidProtocolBufferException with complex queries #31

goktugkose opened this issue Nov 24, 2023 · 1 comment · Fixed by #37

Comments

@goktugkose
Copy link

goktugkose commented Nov 24, 2023

Hi,
We are trying to employ ZetaSQL Toolkit for our SQL-based lineage and decision-making processes. However, with some complex queries (around 20 temp tables and a CTAS script that uses all of the temp tables), we encounter an error that prevents us from properly parsing some of the queries that we use. The code lines below represent a simplified version of our implementation in Java. During execution, the program throws the exception below.

It seems that the next() function in the AnalyzedStatement class analyzes the query while creating a communication channel between the client and BigQuery itself. It also seems that the CodedInputStream class in the protobuf-java library configures the limits of this communication channel. According to the error message that we got, the default value of defaultRecursionLimit which is 100, is not sufficient for our case. Could you help us to deal with this issue?

ZetaSQLToolkitAnalyzer analyzer = new ZetaSQLToolkitAnalyzer(options);
Iterator<AnalyzedStatement> it = analyzer.analyzeStatements(query, catalog);
while(it.hasNext()){
    ASTNodes.ASTStatement s = it.next().getParsedStatement();
}
Caused by: com.google.zetasql.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
...
Caused by: com.google.zetasql.io.grpc.StatusRuntimeException: INTERNAL: Invalid protobuf byte sequence
...
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message had too many levels of nesting.  
May be malicious.  Use CodedInputStream.setRecursionLimit() to increase the depth limit.
ppaglilla added a commit that referenced this issue May 5, 2024
ZetaSQL's Java API uses a GRPC service to call into the actual C++ implementation of ZetaSQL. By default, the serialization logic of that communication allows for a nesting depth in protobuf messages of up to 100. However, long queries can exceed that level of nesting and as a result cannot be analyzed by default.

This implements a reflection-based patch that allows users to override
that limit to a greater number. This is brittle by design and should
be used with caution.

Fixes #31
ppaglilla added a commit that referenced this issue May 5, 2024
ZetaSQL's Java API uses a GRPC service to call into the actual C++ implementation of ZetaSQL. By default, the serialization logic of that communication allows for a nesting depth in protobuf messages of up to 100. However, long queries can exceed that level of nesting and as a result cannot be analyzed by default.

This implements a reflection-based patch that allows users to override
that limit to a greater number. This is brittle by design and should
be used with caution.

Fixes #31
ppaglilla added a commit that referenced this issue May 5, 2024
ZetaSQL's Java API uses a GRPC service to call into the actual C++ implementation of ZetaSQL. By default, the serialization logic of that communication allows for a nesting depth in protobuf messages of up to 100. However, long queries can exceed that level of nesting and as a result cannot be analyzed by default.

This implements a reflection-based patch that allows users to override
that limit to a greater number. This is brittle by design and should
be used with caution.

Fixes #31
@ppaglilla ppaglilla mentioned this issue May 6, 2024
@ppaglilla
Copy link
Collaborator

ppaglilla commented May 6, 2024

We implemented a workaround for this issue in release 0.5.0. TLDR, call the following method before performing any analysis using ZetaSQL or the ZetaSQL Toolkit. This example sets the max nesting depth limit to 1000 (the default being 100).

ZetaSQLPatcher.patchMaxProtobufNestingDepth(1000);

For context, this happens because ZetaSQL's Java API uses a GRPC service to call into the actual C++ implementation. By default, grpc-java sets the max nesting depth for protos to 100 levels. Support for increasing that limit is experimental in grpc-java (source, tacking issue) and thus ZetaSQL compiles its GRPC client with the default limit of 100.

This feature of the toolkit monkey patches the ZetaSQL's GRPC client and modifies that limit to the value provided by the caller. Setting the limit to 1000 seems like a reasonable target and should be able to parse and analyze queries much larger than before.

However, do use this with caution and only when needed. This uses reflection to modify the internals of ZetaSQL's GRPC client at runtime. The approach is brittle by design and could potentially break in a later release. Consider the feature as experimental. In the future, once increasing the limit isn't experimental in grpc-java, it might be increased to a more sensible value in ZetaSQL itself and we might be able to drop this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants