Added support duplicate header entries #34

oxaoo · 2018-10-27T13:14:53Z

Support for duplicate header entries makes it possible to process a CSV file without worrying about duplicate headers.
Just call CSVFormat.DEFAULT.withIgnoreDuplicateHeaderEntries(), which must come first in the chain of calls that builds the CSVFormat.

Why is this needed?
Here are two real-life examples.

First, there is a well-known set of columns from which data must be extracted, but no information about which other columns (possibly duplicates) the document may contain, or in what order.
This feature avoids exceptions such as java.lang.IllegalArgumentException: The header contains a duplicate name when the contents of the document are not fully known and values still need to be retrieved by name.
Example:
Known columns: [A, B, D]
Actual document columns: [Z, A, B, C, D, C]
Updated header structure: Z->[0], A->[1], B->[2], C->[3, 5], D->[4]
Summary: this approach avoids exceptions caused by columns that do not even take part in processing, while still allowing values to be retrieved by name.
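The "updated header structure" above can be sketched in plain Java as a map from each header name to the ordered list of column indexes where that name appears. This is only an illustration under assumptions: the class DuplicateHeaderIndex and the method buildHeaderIndex are hypothetical, not the code from this pull request and not Commons CSV API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateHeaderIndex {

    // Hypothetical helper: map each header name to the ordered list of column
    // indexes where it occurs. A duplicate name accumulates several indexes
    // instead of raising an IllegalArgumentException.
    static Map<String, List<Integer>> buildHeaderIndex(String[] header) {
        // LinkedHashMap keeps names in order of first appearance.
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            index.computeIfAbsent(header[i], k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        // Actual document columns from the example above.
        String[] header = {"Z", "A", "B", "C", "D", "C"};
        Map<String, List<Integer>> index = buildHeaderIndex(header);
        System.out.println(index); // {Z=[0], A=[1], B=[2], C=[3, 5], D=[4]}

        // The well-known columns [A, B, D] are still reachable by name.
        for (String name : new String[] {"A", "B", "D"}) {
            System.out.println(name + " -> column " + index.get(name).get(0));
        }
    }
}
```

Only the columns the application cares about are ever looked up, so the duplicated C column never causes a failure.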
Second, there is a pivot table that aggregates other tables with partially identical column names, and an aggregate function must be applied to the columns that share a name.
Example:
Table1: [A, B, C]
Table2: [B, C]
Pivot table: [A, B, C, B, C]
Task: perform an XOR over the duplicate columns.
Updated header structure: A->[0], B->[1, 3], C->[2, 4]
Summary: this approach stores duplicates as an ordered set of indexes, which makes it possible to compute xor(B[1], B[3]) & xor(C[2], C[4]).
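A minimal sketch of the pivot-table task, assuming the same name-to-indexes structure and assuming the cells hold integers. PivotXor, buildHeaderIndex and xorDuplicates are hypothetical names used only for illustration; the pull request itself does not define them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotXor {

    // Hypothetical helper: header name -> ordered list of column indexes.
    static Map<String, List<Integer>> buildHeaderIndex(String[] header) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            index.computeIfAbsent(header[i], k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // XOR together every cell in the row whose column carries the given name.
    // Assumes the name exists in the header and the cells parse as integers.
    static int xorDuplicates(Map<String, List<Integer>> index, String[] row, String name) {
        int result = 0;
        for (int col : index.get(name)) {
            result ^= Integer.parseInt(row[col]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"A", "B", "C", "B", "C"}; // pivot table header
        String[] row = {"7", "1", "0", "1", "1"};    // made-up sample row
        Map<String, List<Integer>> index = buildHeaderIndex(header);

        int b = xorDuplicates(index, row, "B"); // xor(B[1], B[3])
        int c = xorDuplicates(index, row, "C"); // xor(C[2], C[4])
        System.out.println("xor(B)=" + b + ", xor(C)=" + c + ", and=" + (b & c));
    }
}
```

Because the indexes are kept in column order, the function sees B[1] before B[3], which matters for aggregates that are not commutative.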

coveralls · 2018-10-27T13:19:36Z

Coverage increased (+0.2%) to 95.294% when pulling 284035a on oxaoo:master into 0ab2b08 on apache:master.

oxaoo · 2018-12-08T18:13:21Z

@garydgregory, could you take a look at this pull request?

garydgregory · 2018-12-09T00:44:58Z

Instead of this complication, I would recommend using withSkipHeaderRecord and withHeader(String...). This lets you skip the first row, where your duplicate headers are a problem, and set the headers to values that make sense to your application.

For an example, see org.apache.commons.csv.CSVParserTest.testSkipHeaderOverrideDuplicateHeaders().
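The suggested workaround ignores the file's own first row and maps columns to names chosen by the application. Below is a minimal, dependency-free sketch of that idea; OverrideHeader, headerFromNames and skipHeaderRecord are hypothetical stand-ins, not Commons CSV API. In the library itself the comment points to withHeader(String...) and withSkipHeaderRecord, and the test named above shows the real usage.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OverrideHeader {

    // Hypothetical helper: map caller-supplied (unique) names to column indexes,
    // rejecting duplicates so that lookups by name stay unambiguous.
    static Map<String, Integer> headerFromNames(String... names) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (index.put(names[i], i) != null) {
                throw new IllegalArgumentException("Duplicate name: " + names[i]);
            }
        }
        return index;
    }

    // Hypothetical helper: drop the first record, i.e. the file's own header row.
    static List<String[]> skipHeaderRecord(List<String[]> records) {
        return records.subList(1, records.size());
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
                new String[] {"A", "B", "C", "B", "C"}, // original header with duplicates
                new String[] {"7", "1", "0", "1", "1"});
        // Application-chosen names that disambiguate the duplicated columns.
        Map<String, Integer> header = headerFromNames("A", "B1", "C1", "B2", "C2");
        for (String[] row : skipHeaderRecord(records)) {
            System.out.println("B1=" + row[header.get("B1")] + ", B2=" + row[header.get("B2")]);
        }
    }
}
```

The trade-off is that the caller must know the column layout in advance, which is exactly what the first use case in the description cannot assume.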

garydgregory · 2019-06-01T19:18:41Z

Please see the new methods in git master and 1.7-SNAPSHOT builds:

org.apache.commons.csv.CSVFormat.withAllowDuplicateHeaderNames()
org.apache.commons.csv.CSVFormat.withAllowDuplicateHeaderNames(boolean)

Does that work for your use case?

LuckyIlam · 2019-06-05T14:18:43Z

Hi,
The pull request seems to be incomplete: CSVFormat.validate:1671 still does not allow duplicate header names. Is that intended?

garydgregory · 2019-06-05T14:28:08Z

Please provide a separate PR if you find a problem as we just released 1.7 but the site has not been updated yet.

LuckyIlam · 2019-06-05T14:36:37Z

> Please provide a separate PR if you find a problem as we just released 1.7 but the site has not been updated yet.

Could you take a minute to look at this code? Is it expected that it reads:

```java
// validate header
if (header != null) {
    final Set<String> dupCheck = new HashSet<>();
    for (final String hdr : header) {
        if (!dupCheck.add(hdr)) {
            throw new IllegalArgumentException(
                    "The header contains a duplicate entry: '" + hdr + "' in " + Arrays.toString(header));
        }
    }
}
```

rather than:

```java
// validate header
if (header != null && !allowDuplicateHeaderNames) {
    final Set<String> dupCheck = new HashSet<>();
    for (final String hdr : header) {
        if (!dupCheck.add(hdr)) {
            throw new IllegalArgumentException(
                    "The header contains a duplicate entry: '" + hdr + "' in " + Arrays.toString(header));
        }
    }
}
```
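A runnable way to see the difference between the two versions is to lift the proposed check into a standalone method and call it with the flag on and off, which is roughly what a unit test for this bug would assert. HeaderValidation and validateHeader(String[], boolean) are hypothetical names for this sketch; the real check lives inside CSVFormat.validate and reads the flag from the format instance.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HeaderValidation {

    // Hypothetical standalone copy of the proposed check: duplicates are
    // rejected only when allowDuplicateHeaderNames is false.
    static void validateHeader(String[] header, boolean allowDuplicateHeaderNames) {
        if (header != null && !allowDuplicateHeaderNames) {
            final Set<String> dupCheck = new HashSet<>();
            for (final String hdr : header) {
                if (!dupCheck.add(hdr)) {
                    throw new IllegalArgumentException(
                            "The header contains a duplicate entry: '" + hdr + "' in " + Arrays.toString(header));
                }
            }
        }
    }

    public static void main(String[] args) {
        String[] header = {"A", "B", "C", "B", "C"};
        validateHeader(header, true); // accepted: duplicates are allowed
        try {
            validateHeader(header, false); // rejected at the second "B"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```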

garydgregory · 2019-06-05T17:30:52Z

It would be best if you could provide a PR with a unit test that shows what is wrong...

LuckyIlam · 2019-06-05T19:58:06Z

> It would be best if you could provide a PR with a unit test that shows what is wrong...

I can't push a branch to this repository; I seem to be missing some permissions...

garydgregory · 2019-06-05T20:07:44Z

That's not how it works. Please read https://help.github.com/en/articles/about-pull-requests

LuckyIlam · 2019-06-05T20:28:00Z

> That's not how it works. Please read https://help.github.com/en/articles/about-pull-requests

Does this repository follow the "fork and pull model"? The page says:

> In the fork and pull model, anyone can fork an existing repository and push changes to their personal fork without needing access to the source repository.

LuckyIlam · 2019-06-05T20:45:58Z

Sorry for the disturbance, and thanks for the help. It's my first time using GitHub; I have finally opened a PR.

garydgregory · 2022-02-19T17:41:02Z

Closing, implemented by another PR, see https://issues.apache.org/jira/browse/CSV-264 and #114

oxaoo force-pushed the master branch from c499aee to 8637a9c on November 1, 2018 14:37

Added support duplicate header entries

284035a

oxaoo force-pushed the master branch from 8637a9c to 284035a on November 1, 2018 14:43

garydgregory closed this Feb 19, 2022


4 participants
