implicit limitation on max column name length? #438
Comments
cc @jbax
ping @jbax
Fixed. I will release a 2.9.1-SNAPSHOT version soon, which you can use to test and confirm it's working.
Awesome!
HyukjinKwon pushed a commit to apache/spark that referenced this issue on Jan 20, 2021
### What changes were proposed in this pull request?

Upgrade univocity.

### Why are the changes needed?

The CSV writer has an implicit limit on column name length due to univocity-parsers 2.9.0. When we initialize a writer (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211), it calls toIdentifierGroupArray, which eventually calls valueOf in NormalizedString.java (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209). The stringCache.get call there is subject to a maxStringLength cap (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104), which is 1024 by default.

More details at #30972 and uniVocity/univocity-parsers#438.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT

Closes #31246 from CodingCat/upgrade_univocity.

Authored-by: CodingCat <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
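The failure mode described in this commit message can be illustrated with a small self-contained model. This is only a sketch, not univocity's code: the class `CappedCacheSketch`, the nested `CappedCache`, and both helper methods are hypothetical names, and the sketch assumes that a lookup for a key longer than the cap returns null, which an unguarded caller then dereferences.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Simplified model of a length-capped string cache; NOT univocity's real implementation. */
public class CappedCacheSketch {

    /** Hypothetical cache that declines keys longer than maxStringLength. */
    static final class CappedCache {
        private final Map<String, String> entries = new HashMap<>();
        private final int maxStringLength;

        CappedCache(int maxStringLength) {
            this.maxStringLength = maxStringLength;
        }

        /** Returns the cached normalized value, or null when the key exceeds the cap. */
        String get(String key) {
            if (key.length() > maxStringLength) {
                return null; // assumption: over-long keys are never cached
            }
            return entries.computeIfAbsent(key, k -> k.trim().toLowerCase(Locale.ROOT));
        }
    }

    /** Unguarded caller: assumes the cache always returns a value. */
    static int unsafeNormalizedLength(CappedCache cache, String header) {
        return cache.get(header).length(); // throws NullPointerException if get() returned null
    }

    /** Guarded caller: normalizes directly when the cache declines the key. */
    static int safeNormalizedLength(CappedCache cache, String header) {
        String cached = cache.get(header);
        String normalized = (cached != null) ? cached : header.trim().toLowerCase(Locale.ROOT);
        return normalized.length();
    }

    /** Builds a string of n copies of c (works on Java 8, which lacks String.repeat). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        CappedCache cache = new CappedCache(1024); // 1024 mirrors the default cap named above
        String longHeader = repeat('a', 1025);

        System.out.println("guarded length: " + safeNormalizedLength(cache, longHeader));
        try {
            unsafeNormalizedLength(cache, longHeader);
        } catch (NullPointerException e) {
            System.out.println("unguarded caller hit an NPE for a 1025-character header");
        }
    }
}
```

In this model a 1024-character header goes through the cache normally, while a 1025-character one only works for the guarded caller. Whether the actual fix in 2.9.1 takes this shape is not stated in the thread.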
HyukjinKwon pushed a commit to apache/spark that referenced this issue on Jan 20, 2021 (cherry picked from commit 7f3e952; same commit message as the commit above)
skestle pushed a commit to skestle/spark that referenced this issue on Feb 3, 2021 (same commit message as the commit above, with references written as apache#30972 and apache#31246)
Hi, we recently noticed a bug in Spark 3.x, which depends on the latest version of univocity-parsers. We found that univocity has an implicit limit on column name length (1024 characters by default). If you add a header longer than that limit, you get an NPE (see the detailed analysis in the Spark PR).
In the univocity code base, you can add the following unit test to reproduce the NPE mentioned in the Spark PR:
NPE:
Our question is: was this limit added intentionally, or is it a bug?
cc @HyukjinKwon @viirya