[SPARK-33940][BUILD] Upgrade univocity to 2.9.1 #31246
@HyukjinKwon filed a new PR to upgrade univocity
@HyukjinKwon would this be considered for the 3.1.1 release?
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #134234 has finished for PR 31246 at commit
Yes, I think we can since it contains a legitimate bug fix that affects Spark's workload.
Can you run
I think you need to run
(my comment was a bit late..)
thanks! @HyukjinKwon @maropu , just updated the deps
Looks good, pending tests.
Kubernetes integration test starting
Kubernetes integration test status success
Merged to master and branch-3.1.
### What changes were proposed in this pull request?

Upgrade univocity-parsers to 2.9.1.

### Why are the changes needed?

The CSV writer has an implicit limit on column name length due to univocity-parsers 2.9.0. When a writer is initialized (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211), it calls toIdentifierGroupArray, which eventually calls valueOf in NormalizedString.java (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209). That valueOf goes through stringCache.get, which has a maxStringLength cap (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104) of 1024 by default.

More details at #30972 and uniVocity/univocity-parsers#438.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #31246 from CodingCat/upgrade_univocity.

Authored-by: CodingCat <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 7f3e952)
Signed-off-by: HyukjinKwon <[email protected]>
Test build #134243 has finished for PR 31246 at commit
What changes were proposed in this pull request?
Upgrade univocity-parsers to 2.9.1.
Why are the changes needed?
The CSV writer has an implicit limit on column name length due to univocity-parsers 2.9.0.
When a writer is initialized (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211), it calls toIdentifierGroupArray, which eventually calls valueOf in NormalizedString.java (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209).
That valueOf goes through stringCache.get, which has a maxStringLength cap (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104) of 1024 by default.
More details at #30972 and uniVocity/univocity-parsers#438.
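To make the mechanism above concrete, here is a minimal, self-contained toy model of a string cache with a length cap. It is not univocity's code: the class `CappedCacheSketch` and its methods are hypothetical, and the assumption that the cache returns `null` for over-length input is mine rather than something stated in this PR. Only the 1024 default comes from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model of a length-capped string cache. Hypothetical, NOT univocity's implementation. */
public class CappedCacheSketch {
    // Default cap quoted in the PR description.
    public static final int MAX_STRING_LENGTH = 1024;

    private static final Map<String, String> CACHE = new HashMap<>();

    // ASSUMPTION: the cache declines over-length input by returning null.
    public static String cacheGet(String input) {
        if (input == null || input.length() > MAX_STRING_LENGTH) {
            return null;
        }
        return CACHE.computeIfAbsent(input, s -> s.toLowerCase(Locale.ROOT));
    }

    // A caller that trusts the cache: over-length column names come back as null.
    public static String normalizeTrusting(String name) {
        return cacheGet(name);
    }

    // A caller that normalizes directly when the cache declines the input.
    public static String normalizeWithFallback(String name) {
        String cached = cacheGet(name);
        return (cached != null || name == null) ? cached : name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char[] chars = new char[MAX_STRING_LENGTH + 1];
        Arrays.fill(chars, 'C');
        String longName = new String(chars); // 1025 characters, one past the cap

        System.out.println(normalizeTrusting("Col"));                 // col
        System.out.println(normalizeTrusting(longName) == null);      // true
        System.out.println(normalizeWithFallback(longName).length()); // 1025
    }
}
```

The fallback variant shows one way a caller could tolerate the cap. Whether 2.9.1 resolves the issue this way is not stated in this thread; see uniVocity/univocity-parsers#438 for the actual change.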
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.