Added support for duplicate header entries #34
@garydgregory, why don't you take a look at this pull request?

Instead of this complication, I would recommend using For an example, see

Please see the new methods in git master and 1.7-SNAPSHOT builds:
Does that work for your use case?

Hi,

Please provide a separate PR if you find a problem, as we just released 1.7 but the site has not been updated yet.
Could you take a minute to look at that code? Is it normal that there is

It would be best if you could provide a PR with a unit test that shows what is wrong...
I can't push a branch to this repository; I'm missing some authorization...

That's not how it works. Please read https://help.github.com/en/articles/about-pull-requests
Are you following the fork and pull model? "In the fork and pull model, anyone can fork an existing repository and push changes to their personal fork without needing access to the source repository."

Well, sorry for the disturbance and thanks for the help. It's my first time using GitHub; I finally opened a PR.

Closing, implemented by another PR, see https://issues.apache.org/jira/browse/CSV-264 and #114 |
Support for duplicate header entries makes it possible to process a CSV file without worrying about whether it contains duplicate headers.
It is enough to call `CSVFormat.DEFAULT.withIgnoreDuplicateHeaderEntries()`, which must come first in the chain of calls that builds the `CSVFormat`.

What is the need for this?
Here are two examples from real life.
First, there is a well-known set of columns from which to extract data, but there is no information about whether a document contains other columns (possibly duplicates) or in what order the columns appear.
Using this feature avoids exceptions such as
`java.lang.IllegalArgumentException: The header contains a duplicate name` when the contents of the document are not fully known and values must be retrieved by column name.

Example:
Well-known columns set: [A, B, D].
Actual document columns set: [Z, A, B, C, D, C]
Updated header structure: Z->[0], A->[1], B->[2], C->[3, 5], D->[4]
Summary: this approach avoids exceptions caused by columns that do not even take part in processing, while preserving the ability to get values by name.
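The header structure from this first example can be sketched in plain Java with no Commons CSV dependency. This is a minimal illustration, not the PR's code or the released Commons CSV API: the class `DuplicateHeaderIndex` and its methods `build`, `buildStrict` and `firstIndex` are hypothetical names chosen for the sketch. The lenient mapping stores every index of a name in column order, so the known columns [A, B, D] can be read by name even though C is duplicated; the strict mapping shows the kind of one-index-per-name check that produces a duplicate-name exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the "updated header structure"; not part of Commons CSV.
class DuplicateHeaderIndex {

    // Strict mapping: one index per name, duplicates rejected
    // (similar in spirit to the exception quoted above; the message here is illustrative).
    static Map<String, Integer> buildStrict(String[] header) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            if (map.containsKey(header[i])) {
                throw new IllegalArgumentException(
                        "The header contains a duplicate name: \"" + header[i] + "\"");
            }
            map.put(header[i], i);
        }
        return map;
    }

    // Lenient mapping: each name maps to all of its column indices, in column order.
    static Map<String, List<Integer>> build(String[] header) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            map.computeIfAbsent(header[i], k -> new ArrayList<>()).add(i);
        }
        return map;
    }

    // Returns the first index recorded for a name, or -1 if the name is absent.
    static int firstIndex(Map<String, List<Integer>> index, String name) {
        List<Integer> positions = index.getOrDefault(name, Collections.emptyList());
        return positions.isEmpty() ? -1 : positions.get(0);
    }

    public static void main(String[] args) {
        String[] header = {"Z", "A", "B", "C", "D", "C"};
        String[] row = {"z", "a", "b", "c1", "d", "c2"};

        Map<String, List<Integer>> index = build(header);
        System.out.println(index); // {Z=[0], A=[1], B=[2], C=[3, 5], D=[4]}

        // Only the well-known columns are read; the duplicated C never causes an error.
        for (String name : new String[] {"A", "B", "D"}) {
            System.out.println(name + " = " + row[firstIndex(index, name)]);
        }

        // The strict variant rejects the same header.
        try {
            buildStrict(header);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

Keeping a list per name (rather than a single index) is what lets lookups by name keep working while still recording where every duplicate sits.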
Second, there is a pivot table that aggregates other tables whose column names partially overlap, and an aggregate function must be applied across the same-named columns.
Example:
Table1: [A, B, C]
Table2: [B, C]
Pivot table: [A, B, C, B, C]
Task: perform an XOR across the duplicate columns.
Updated header structure: A->[0], B->[1, 3], C->[2, 4].
Summary: this approach stores the indices of duplicates as an ordered set, which makes it possible to compute xor(B[1], B[3]) and xor(C[2], C[4]).
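The pivot-table task can be sketched the same way. This is a self-contained illustration under stated assumptions, not the PR's implementation: it assumes the cell values are integers and interprets "XOR for duplicate columns" as a bitwise XOR folded over every position of a repeated name (for two columns this is exactly xor(B[1], B[3])). The class `PivotXor` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: XOR the values of same-named columns within one row.
class PivotXor {

    // Maps each header name to its column indices, in column order.
    static Map<String, List<Integer>> indexHeader(String[] header) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            map.computeIfAbsent(header[i], k -> new ArrayList<>()).add(i);
        }
        return map;
    }

    // For every name that occurs more than once, XORs the row values at all of its indices.
    static Map<String, Integer> xorDuplicates(String[] header, int[] row) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : indexHeader(header).entrySet()) {
            List<Integer> positions = e.getValue();
            if (positions.size() < 2) {
                continue; // e.g. A appears once, so there is nothing to combine
            }
            int acc = 0;
            for (int i : positions) {
                acc ^= row[i];
            }
            result.put(e.getKey(), acc);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"A", "B", "C", "B", "C"}; // Table1 [A, B, C] followed by Table2 [B, C]
        int[] row = {7, 0b1100, 0b1010, 0b0110, 0b1010};
        System.out.println(xorDuplicates(header, row)); // {B=10, C=0}
    }
}
```

Because the indices are kept in column order, the same structure would also support order-sensitive aggregates, not only commutative ones like XOR.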