-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-42406][PROTOBUF] Fix recursive depth setting for Protobuf functions #40011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@HeartSaVioR , @HyukjinKwon , @SandishKumarHN PTAL. I would like to get this fix asap to make it into 3.4.x in OSS and 12.2 RC cut in DBR. |
| throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString()) | ||
| } else if (existingRecordNames.contains(recordName) && | ||
| recursiveDepth > recursiveFieldMaxDepth) { | ||
| recursiveDepth >= recursiveFieldMaxDepth) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main bug fix:
We allowed recursive depth go over by one. This makes the setting of '0' does not work as documented (earlier documentation stated even the first occurrence would be removed it it was set to 0).
Removing the first occurrence does not make sense since we don't know if this field is recursive until the second occurrence.
| } | ||
|
|
||
| test("Handle recursive fields in Protobuf schema, A->B->A") { | ||
| val schemaA = ProtobufUtils.buildDescriptor(testFileDesc, "recursiveA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra code to create actual records is not required since we are testing an AnalysisException. It does not need real data.
| val df = Seq(employee.toByteArray).toDF("protoEvent") | ||
| val options = new java.util.HashMap[String, String]() | ||
| options.put("recursive.fields.max.depth", "1") | ||
| options.put("recursive.fields.max.depth", "2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to increase the setting because this PR fixes off by one with enforcement.
|
cc: @gengliangwang |
SandishKumarHN
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Outdated
Show resolved
Hide resolved
gengliangwang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for one comment
|
Thanks, merging to master/3.4 |
…tions ### What changes were proposed in this pull request? This fixes how setting for limiting recursive depth is handled in Protobuf functions (`recursive.fields.max.depth`). Original documentation says as setting it to '0' removes the recursive field. But we never did that. We allow at least once. E.g. schema for recursive message 'EventPerson' does not change between the settings '0' or '1'. This fixes it by requiring the max depth to be at least 1. It also fixes how the recursion enfored. Updated the tests and added an extra test with new protobuf 'EventPersonWrapper'. I will annotate the diff inline pointing to main fixes. ### Why are the changes needed? This fixes a bug with enforcing `recursive.fields.max.depth` and clarifies more in the documentation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Unit tests Closes #40011 from rangadi/recursive-depth. Authored-by: Raghu Angadi <[email protected]> Signed-off-by: Gengliang Wang <[email protected]> (cherry picked from commit 96516c8) Signed-off-by: Gengliang Wang <[email protected]>
…tions ### What changes were proposed in this pull request? This fixes how setting for limiting recursive depth is handled in Protobuf functions (`recursive.fields.max.depth`). Original documentation says as setting it to '0' removes the recursive field. But we never did that. We allow at least once. E.g. schema for recursive message 'EventPerson' does not change between the settings '0' or '1'. This fixes it by requiring the max depth to be at least 1. It also fixes how the recursion enfored. Updated the tests and added an extra test with new protobuf 'EventPersonWrapper'. I will annotate the diff inline pointing to main fixes. ### Why are the changes needed? This fixes a bug with enforcing `recursive.fields.max.depth` and clarifies more in the documentation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Unit tests Closes apache#40011 from rangadi/recursive-depth. Authored-by: Raghu Angadi <[email protected]> Signed-off-by: Gengliang Wang <[email protected]> (cherry picked from commit 96516c8) Signed-off-by: Gengliang Wang <[email protected]>
What changes were proposed in this pull request?
This fixes how setting for limiting recursive depth is handled in Protobuf functions (
recursive.fields.max.depth).Original documentation says as setting it to '0' removes the recursive field. But we never did that. We allow at least once. E.g. schema for recursive message 'EventPerson' does not change between the settings '0' or '1'.
This fixes it by requiring the max depth to be at least 1. It also fixes how the recursion enfored.
Updated the tests and added an extra test with new protobuf 'EventPersonWrapper'.
I will annotate the diff inline pointing to main fixes.
Why are the changes needed?
This fixes a bug with enforcing
recursive.fields.max.depthand clarifies more in the documentation.Does this PR introduce any user-facing change?
No.
How was this patch tested?