Prevent Kafka protobuf schema registry message parsing dead loop recursion#15312

Closed
mx123 wants to merge 24 commits into trinodb:master from mx123:kafka_protobuf_schema_registry_nested_objects

Conversation

@mx123
Contributor

@mx123 mx123 commented Dec 6, 2022

Description

When a Kafka protobuf registry schema includes self-referencing object types, parsing fails with a stack overflow exception. To improve the user experience, schema parsing needs to abort with an "unsupported type" error that is displayed to the user in a more meaningful manner.
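For context, a minimal schema of the shape being discussed (names invented for illustration; this is not a schema from the PR):

```proto
syntax = "proto3";

// Hypothetical self-referencing message type: each TreeNode may contain
// further TreeNodes, so a naive recursive translation of the schema into
// Trino row types never terminates.
message TreeNode {
  string value = 1;
  repeated TreeNode children = 2;
}
```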

@cla-bot cla-bot bot added the cla-signed label Dec 6, 2022
@mx123 mx123 changed the title Kafka protobuf schema registry nested objects Prevent Kafka protobuf schema registry message parsing dead loop recursion Dec 6, 2022
@mx123 mx123 requested review from Praveen2112 and kokosing December 6, 2022 09:42
@mx123 mx123 force-pushed the kafka_protobuf_schema_registry_nested_objects branch 4 times, most recently from 880535a to 56ffe83 Compare December 7, 2022 09:19
Member

@kokosing kokosing left a comment

Looks good, minor comments

@mx123 mx123 force-pushed the kafka_protobuf_schema_registry_nested_objects branch 2 times, most recently from 5420cd6 to 8291c7d Compare December 7, 2022 16:59
Member

Can we have them in the protobuf directory, so that all the protobuf files are in one directory?

Contributor Author

@mx123 mx123 Dec 8, 2022

Unfortunately, the generator would then create objects for all proto files, while the existing protos are intended to be used with DynamicMessage, so those generated objects would be unneeded/unused. Moving the existing protos to a new folder unreachable by the generator would require refactoring many reference places across a few libs, since the proto file location depends on the ProtobufUtils class package (it loads them via getResource; see for details). So that looks out of scope for the current issue.

Member

This would be solved if we parsed the proto files and generated the DynamicMessage ourselves; then we could have all the proto files in one directory.

Contributor Author

But we cannot, due to 0710dce#r1043458738.

Member

Is it possible for us to create the DynamicMessage via a descriptor and use that to populate it?

Contributor Author

The protobuf static generator is used to work around an issue unrelated to this PR: parsing unsupported_nested.proto with ProtobufUtils throws:

com.google.protobuf.Descriptors$DescriptorValidationException: io.trino.protobuf.schema.nested_value_one: "NestedValue" is not an enum type.

	at com.google.protobuf.Descriptors$FieldDescriptor.crossLink(Descriptors.java:1587)
	at com.google.protobuf.Descriptors$FieldDescriptor.access$1400(Descriptors.java:1057)
	at com.google.protobuf.Descriptors$Descriptor.crossLink(Descriptors.java:1000)
	at com.google.protobuf.Descriptors$Descriptor.access$1200(Descriptors.java:648)
	at com.google.protobuf.Descriptors$FileDescriptor.crossLink(Descriptors.java:600)
	at com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:321)
	at com.google.protobuf.Descriptors$FileDescriptor.buildFrom(Descriptors.java:289)
	at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:128)
	at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:82)
	at io.trino.decoder.protobuf.ProtobufUtils.getFileDescriptor(ProtobufUtils.java:75)
....

Member

Now we have disabled this parsing so that we don't run into an infinite loop. Is it possible for us to fix this issue if we have a complicated schema like this?

Contributor Author

Generally, addressing parsing of such a sophisticated schema requires an output data format capable of representing it. As far as I can see, Trino SQL data types have nothing that can describe this kind of recursive object relation, so in the general case the issue cannot be resolved. But if we have a particular object structure known to us, we can describe it with some limit on nested self-referencing inclusions, similar to the google.protobuf.Timestamp parsing implementation.
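The "bounded unrolling" idea mentioned above could look roughly like this: expand a known recursive structure only to a fixed depth and map anything deeper to an opaque placeholder. This is a hypothetical sketch, not Trino code; the hard-coded `TreeNode` shape and the `VARBINARY` fallback are invented for illustration.

```java
public class BoundedExpansion
{
    // Renders a hypothetical recursive type as a Trino-style type string,
    // but only to a fixed depth. At depth 0 the recursion is cut off and the
    // remainder is represented by an opaque placeholder type.
    public static String expand(String typeName, int depth)
    {
        if (depth == 0) {
            return "VARBINARY /* truncated recursive " + typeName + " */";
        }
        // For illustration: a type with a string value and one child of its own type.
        return "ROW(value VARCHAR, child " + expand(typeName, depth - 1) + ")";
    }

    public static void main(String[] args)
    {
        System.out.println(expand("TreeNode", 2));
        // ROW(value VARCHAR, child ROW(value VARCHAR, child VARBINARY /* truncated recursive TreeNode */))
    }
}
```

The trade-off is that rows deeper than the chosen limit lose structure, which is why such special-casing only makes sense for structures whose practical depth is known, as with the Timestamp handling mentioned above.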

@mx123 mx123 force-pushed the kafka_protobuf_schema_registry_nested_objects branch 3 times, most recently from 91f48c2 to 6935b25 Compare December 9, 2022 10:48
dain and others added 13 commits December 11, 2022 17:32
When a refresh token is retrieved for the UI, we were sending HTTP status 303, assuming that every request would simply be repeated against the Location header. While this works for the GET/PUT verbs, it does not for non-idempotent ones like POST, as every JS HTTP client is expected to do a GET on Location after a 303 on POST. Due to that, I changed it to 307, which should force every client to repeat exactly the same request, no matter the verb.

Co-authored-by: s2lomon <s2lomon@gmail.com>
The actual work is done in the `pageProjectWork.process()` call, while `projection.project` only sets up the projection. So both `expressionProfiler` and `metrics.recordProjectionTime` need to wrap that method.
Removes outdated comments and unnecessary methods in the local exchange PartitioningExchanger, since the operator is no longer implemented in a way that attempts to be thread-safe.
Member

@kokosing kokosing left a comment

Looks good to me. Just a few comments.

Member

Why are we using this library? Why not a simple HashSet? Is it because we are not removing items from it when we are closing the stack?

Is it possible to have a test for a case where using this library is needed (closing the recursion)? Is it possible (and reasonable) to not use this dependency?
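The mechanism under discussion can be sketched with no extra dependency: a plain HashSet of the types on the current traversal path, with removal on the way back up ("closing the stack"). This is an illustrative sketch, not the PR's actual code; `TypeNode` is an invented stand-in for a protobuf Descriptor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecursionGuard
{
    // Hypothetical stand-in for a protobuf message descriptor: a named type
    // whose fields may reference other message types, possibly itself.
    public static final class TypeNode
    {
        final String name;
        final List<TypeNode> fields = new ArrayList<>();

        public TypeNode(String name)
        {
            this.name = name;
        }
    }

    // Depth-first walk over the type graph. Types on the current traversal
    // path are tracked in a plain HashSet; re-adding one fails, meaning the
    // schema references itself, so we abort with a meaningful error instead
    // of recursing until StackOverflowError.
    public static void validate(TypeNode node, Set<String> onPath)
    {
        if (!onPath.add(node.name)) {
            throw new IllegalStateException("Unsupported recursive type: " + node.name);
        }
        for (TypeNode field : node.fields) {
            validate(field, onPath);
        }
        // Remove on the way back up ("closing the stack"), so the same type
        // may legitimately appear again on a sibling branch.
        onPath.remove(node.name);
    }

    public static void main(String[] args)
    {
        TypeNode tree = new TypeNode("TreeNode");
        tree.fields.add(tree); // self-reference, as in a tree/linked-list schema
        try {
            validate(tree, new HashSet<>());
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Unsupported recursive type: TreeNode
        }
    }
}
```

The removal on unwind is exactly why a visited-set alone is not enough: without it, a type that appears on two sibling branches would be misreported as recursive.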

Contributor Author

Please see my explanation in #15312 (comment).

A Kafka protobuf registry schema cannot be translated to a plain Trino SQL structure in the general case, when the included data types reference the same objects recursively.
Sending a message to Kafka throws an NPE if the proto file has an `import` directive.
Kafka protobuf schema registry parsing falls into an infinite loop if the schema includes references to the same object. The recursion loop must be aborted with an appropriate parsing error.
@mx123 mx123 force-pushed the kafka_protobuf_schema_registry_nested_objects branch from 6935b25 to 4cad364 Compare December 14, 2022 13:24
@kokosing
Member

kokosing commented Jan 4, 2023

@mx123 Can you please rebase?

@mx123
Contributor Author

mx123 commented Feb 9, 2023

Closed, since PR #15724 was opened instead.

@mx123 mx123 closed this Feb 9, 2023