Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33940][BUILD] Upgrade univocity to 2.9.1 #31246

Closed
wants to merge 2 commits into from

Conversation

CodingCat
Copy link
Contributor

What changes were proposed in this pull request?

upgrade univocity

Why are the changes needed?

csv writer actually has an implicit limit on column name length due to univocity-parser 2.9.0,

when we initialize a writer https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211, it calls toIdentifierGroupArray which calls valueOf in NormalizedString.java eventually (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209)

in that stringCache.get, it has a maxStringLength cap https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104 which is 1024 by default

more details at #30972 and uniVocity/univocity-parsers#438

Does this PR introduce any user-facing change?

No

How was this patch tested?

existing UT

@github-actions github-actions bot added the BUILD label Jan 19, 2021
@CodingCat
Copy link
Contributor Author

@HyukjinKwon filed a new PR to upgrade univocity

@CodingCat
Copy link
Contributor Author

@HyukjinKwon would this be considered for 3.1.1 release?

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38819/

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38819/

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Test build #134234 has finished for PR 31246 at commit 782cfd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Yes, I think we can since it contains a legitimate bug fix that affects Spark's workload.

@HyukjinKwon HyukjinKwon changed the title [SPARK-33940][SQL] upgrade univocity [SPARK-33940][SQL] Upgrade univocity to 2.9.1 Jan 20, 2021
@HyukjinKwon
Copy link
Member

Can you run ./dev/test-dependencies.sh script and update dependent jars listed in https://github.com/apache/spark/tree/master/dev/deps?

@maropu
Copy link
Member

maropu commented Jan 20, 2021

I think you need to run ./dev/test-dependencies.sh --replace-manifest to update the dependency files.

@maropu
Copy link
Member

maropu commented Jan 20, 2021

(my comment was a bit late..)

@maropu maropu changed the title [SPARK-33940][SQL] Upgrade univocity to 2.9.1 [SPARK-33940][BUILD] Upgrade univocity to 2.9.1 Jan 20, 2021
@CodingCat
Copy link
Contributor Author

thanks! @HyukjinKwon @maropu , just updated the deps

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, pending tests.

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38828/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38828/

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.1.

HyukjinKwon pushed a commit that referenced this pull request Jan 20, 2021
### What changes were proposed in this pull request?

upgrade univocity

### Why are the changes needed?

csv writer actually has an implicit limit on column name length due to univocity-parser 2.9.0,

when we initialize a writer https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211, it calls toIdentifierGroupArray which calls valueOf in NormalizedString.java eventually (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209)

in that stringCache.get, it has a maxStringLength cap https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104 which is 1024 by default

more details at #30972 and uniVocity/univocity-parsers#438

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
existing UT

Closes #31246 from CodingCat/upgrade_univocity.

Authored-by: CodingCat <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 7f3e952)
Signed-off-by: HyukjinKwon <[email protected]>
@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134243 has finished for PR 31246 at commit ddce09c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@CodingCat CodingCat deleted the upgrade_univocity branch January 22, 2021 19:49
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
### What changes were proposed in this pull request?

upgrade univocity

### Why are the changes needed?

csv writer actually has an implicit limit on column name length due to univocity-parser 2.9.0,

when we initialize a writer https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/AbstractWriter.java#L211, it calls toIdentifierGroupArray which calls valueOf in NormalizedString.java eventually (https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/NormalizedString.java#L205-L209)

in that stringCache.get, it has a maxStringLength cap https://github.com/uniVocity/univocity-parsers/blob/e09114c6879fa6c2c15e7365abc02cda3e193ff7/src/main/java/com/univocity/parsers/common/StringCache.java#L104 which is 1024 by default

more details at apache#30972 and uniVocity/univocity-parsers#438

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
existing UT

Closes apache#31246 from CodingCat/upgrade_univocity.

Authored-by: CodingCat <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants