Prevent Kafka protobuf schema registry message parsing dead loop recursion #15312
mx123 wants to merge 24 commits into trinodb:master from
Conversation
Force-pushed 880535a to 56ffe83
Force-pushed 5420cd6 to 8291c7d
Can we have them in the protobuf directory, so that all the protobuf files are in one directory?
Unfortunately, the generator would then create objects for all proto files, while the existing protos are intended to be used with DynamicMessage, so those generated objects would be unneeded/unused. Moving the existing protos into a new folder that is unreachable for the generator would require refactoring many reference places across a few libraries, since the proto file location depends on the ProtobufUtils class package (it loads them via getResource; see that class for details). So this looks out of scope for the current issue.
This would be solved if we parsed the proto files and generated the DynamicMessage; then we could have all the proto files in one directory.
But we cannot, due to 0710dce#r1043458738.
Is it possible for us to create the DynamicMessage via a descriptor and use that to populate it?
The static Protobuf generator is used to work around an issue unrelated to this PR: parsing unsupported_nested.proto with ProtobufUtils throws:
```
com.google.protobuf.Descriptors$DescriptorValidationException: io.trino.protobuf.schema.nested_value_one: "NestedValue" is not an enum type.
at com.google.protobuf.Descriptors$FieldDescriptor.crossLink(Descriptors.java:1587)
at com.google.protobuf.Descriptors$FieldDescriptor.access$1400(Descriptors.java:1057)
at com.google.protobuf.Descriptors$Descriptor.crossLink(Descriptors.java:1000)
at com.google.protobuf.Descriptors$Descriptor.access$1200(Descriptors.java:648)
at com.google.protobuf.Descriptors$FileDescriptor.crossLink(Descriptors.java:600)
at com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:321)
at com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:289)
at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:128)
at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:82)
at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:75)
....
```
Now we have disabled this parsing so that we don't run into an infinite loop. Is it possible for us to fix this issue if we have a complicated schema like this?
Generally, the issue is that parsing a sophisticated schema requires an output data structure that can represent it. As far as I can see, Trino SQL data types have nothing similar that could describe this recursive object-reference format, so in the general case the issue cannot be resolved. But if we have a particular object structure known to us, we can describe it with some limitation on the nesting of self-referencing object inclusions. That would be similar to the google.protobuf.Timestamp parsing implementation.
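The bounded-nesting idea above can be sketched as follows. This is a minimal illustration, not Trino code: `describeType`, `MAX_DEPTH`, and the map-based schema model are all hypothetical stand-ins, assuming the parser unrolls a self-referencing message type up to a fixed depth and then treats the remainder as an opaque leaf.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: unroll a self-referencing message type to a fixed
// depth, similar to how a known structure can get special-cased handling.
public class BoundedExpansion
{
    static final int MAX_DEPTH = 2; // illustrative limit, not a Trino constant

    // fieldsByType maps a message type name to its message-typed field names.
    public static String describeType(String type, Map<String, List<String>> fieldsByType, int depth)
    {
        List<String> children = fieldsByType.getOrDefault(type, List.of());
        if (children.isEmpty() || depth >= MAX_DEPTH) {
            return type; // stop unrolling: treat as an opaque leaf
        }
        StringBuilder row = new StringBuilder("row(");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                row.append(", ");
            }
            row.append(describeType(children.get(i), fieldsByType, depth + 1));
        }
        return row.append(")").toString();
    }
}
```

With `MAX_DEPTH = 2`, a type `Node` containing a `Node` field expands finitely instead of looping forever.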
Force-pushed 91f48c2 to 6935b25
When a refresh token is retrieved for the UI, we currently send HTTP status 303, assuming that every client will just repeat the call against the Location header. While this works for GET/PUT verbs, it does not for non-idempotent ones like POST, since every JS HTTP client does a GET on the Location after a 303 on POST. Because of that, I changed it to 307, which should force every client to repeat exactly the same request, no matter the verb. Co-authored-by: s2lomon <s2lomon@gmail.com>
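The redirect semantics behind that commit can be summarized in a tiny sketch. Per the HTTP specification, 303 (See Other) tells the client to follow the Location with a GET, while 307 (Temporary Redirect) tells it to repeat the original request with the same method and body. `statusForRefresh` is a hypothetical helper, not the actual Trino UI code:

```java
// Illustration only: why 307 is the safe redirect for token refresh.
public class RedirectStatus
{
    public static final int SEE_OTHER = 303;          // follow-up is always a GET
    public static final int TEMPORARY_REDIRECT = 307; // follow-up repeats the verb

    public static int statusForRefresh()
    {
        // POST (and other non-idempotent verbs) must be replayed verbatim,
        // which only the temporary redirect guarantees for every client.
        return TEMPORARY_REDIRECT;
    }
}
```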
Actual work is done in the `pageProjectWork.process()` call, while `projection.project` only performs setup of the projection. So both `expressionProfiler` and `metrics.recordProjectionTime` need to wrap that method.
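The point of that commit, sketched generically: timing must surround the call that performs the work, not the call that merely sets it up. `Work` and `timeProcess` below are illustrative stand-ins, not the Trino operator API:

```java
// Sketch: profile the call that does the work, not the setup call.
public class TimedProjection
{
    public interface Work
    {
        boolean process();
    }

    // Returns elapsed nanoseconds when the work finished, -1 otherwise.
    public static long timeProcess(Work work)
    {
        long start = System.nanoTime();
        boolean finished = work.process(); // the actual projection work runs here
        long elapsed = System.nanoTime() - start;
        return finished ? elapsed : -1;
    }
}
```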
Removes outdated comments and unnecessary methods in the local exchange PartitioningExchanger, since the operator is no longer implemented in a way that attempts to be thread-safe.
- Change ColumnHandle to BigQueryColumnHandle in BigQueryTableHandle
- Extract buildColumnHandles in BigQueryClient
kokosing left a comment:
Looks good to me. Just a few comments.
plugin/trino-kafka/pom.xml
Why are we using this library? Why not a simple HashSet? Is it because we are not removing items from it when we are closing the stack?
Is it possible to have a test for a case where using this library is needed (closing the recursion)? Is it possible (and reasonable) to not use this dependency?
Please see my explanation: #15312 (comment)
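The HashSet question raised above hinges on a subtle point, sketched here with hypothetical names: a plain set works for cycle detection only if entries are removed as the recursion unwinds. Otherwise a diamond-shaped, perfectly acyclic schema, where two sibling fields share a type, is wrongly reported as recursive:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: track only the types on the *current* descent path.
public class VisitedSet
{
    public static boolean isCyclic(String type, Map<String, List<String>> fieldsByType, Set<String> onPath)
    {
        if (!onPath.add(type)) {
            return true; // type already on the current path: a real cycle
        }
        for (String child : fieldsByType.getOrDefault(type, List.of())) {
            if (isCyclic(child, fieldsByType, onPath)) {
                return true;
            }
        }
        onPath.remove(type); // unwind: sibling branches may legitimately reuse this type
        return false;
    }
}
```

With the `remove` call, a diamond schema (A has fields of types B and C, both of which contain D) is accepted; without it, the second reference to D would be misreported as recursion.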
In the general case, a Kafka protobuf registry schema cannot be translated to a plain Trino SQL structure when the included data types reference the same objects recursively.
Sending a message to Kafka throws an NPE if the proto file has an `import` directive.
Kafka protobuf schema registry parsing falls into an infinite loop if the schema includes references to the same object. The recursion loop needs to be aborted with an appropriate parsing error.
Force-pushed 6935b25 to 4cad364
@mx123 Can you please rebase?

Closed, since PR #15724 was opened.
Description
When the Kafka protobuf registry schema includes self-referencing object types, parsing fails with a stack overflow exception. To improve the user experience, schema parsing needs to stop with an unsupported-type error that is displayed to the user in a more meaningful manner.
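The fix described here can be sketched as a cycle check during the recursive schema walk, under the assumption that the parser descends through message types. Names below (`CycleCheck`, `walk`, the map-based schema model) are illustrative, not the actual ProtobufSchemaParser API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch: fail with a clear "unsupported" error on a recursive reference
// instead of letting the recursion overflow the stack.
public class CycleCheck
{
    public static void walk(String type, Map<String, List<String>> fieldsByType, Deque<String> path)
    {
        if (path.contains(type)) {
            throw new IllegalArgumentException("Unsupported recursive reference to type: " + type);
        }
        path.push(type);
        for (String child : fieldsByType.getOrDefault(type, List.of())) {
            walk(child, fieldsByType, path);
        }
        path.pop(); // leaving this type: it may appear again on other branches
    }
}
```

A schema where A references B and B references A back now produces a deterministic error message rather than a StackOverflowError.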