-
Notifications
You must be signed in to change notification settings - Fork 289
CSV-264: Added DuplicateHeaderMode for flexibility with header strictness. #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the PR. You have removed some public API methods. This is not allowed and would fail binary compatibility tests. I have enabled these on the CI environments. If you rebase on master the errors should appear in the checks. You can run: on the project to see the full checks on the code. The |
|
Sorry about that, I rebased master and applied the changes requested. I usually use |
|
You can force push to your branch all you want; -) we usually do not rewrite history for long lived branches like master and release. |
|
Not sure if you guys are interested in merging this PR, but figured I'd fix the merge conflicts that appeared from recent commits. |
|
@SethiPandi |
3268d4e to
a22ace9
Compare
|
Just a heads up that this is rebased with |
|
@SethFalco I feel there is something important missing here:
For example, if duplicate headers are allowed, does the first column win? Should it? Does the last column win? Certainly, it should not be random or undocumented. Also, would you mind rebasing on master please? I've recently deprecated the |
71801b9 to
ef1529a
Compare
|
It's worth noting this change doesn't impact how data is resolved internally or how headers are referenced, but just how if duplicate empty headers are allowed. I can peek into the current behavior anyway and try to document it if it's not already. Meanwhile, I've rebased with master. I also updated an exception to no longer recommend using one of the deprecated methods. Edit: Sorry, forgot to add @ Deprecated to the getter. |
d74d1cc to
1be64a6
Compare
|
I didn't see it documented, figured the easiest way to find out would be to just test it.
List<CSVRecord> records = parser.getRecords();
for (CSVRecord record : records)
System.out.println(record.get("A"));
// 2
// 4
List<CSVRecord> records = parser.getRecords();
for (CSVRecord record : records) {
Map<String, String> map = record.toMap();
System.out.println(map);
}
// {A=2, B=6}
// {A=4, B=8}I think we could add a docstring on It might be a good idea to add test cases to cover this scenario, if it'll become documented behavior. Mind if I do this in a separate PR? |
garydgregory
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SethFalco
Thank you for the update. Please see my comments scattered throughout.
Gary
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
src/test/java/org/apache/commons/csv/issues/JiraCsv264Test.java
Outdated
Show resolved
Hide resolved
b6b801b to
adf02b2
Compare
|
Sorry about the imports, I'm guessing that due to a setting in IntelliJ setting back when I was using that. |
|
@SethFalco |
|
Just rebased this to resolve merge conflicts.
Seems no one wants to comment on this. ^-^' Any chance this could be merged soon, or still waiting on feedback? (Note, I'm in no rush.) |
kinow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like a good fix for the issue described in JIRA. Only adds one more setting, giving more flexibility for users.
+1 and thanks for the contribution and patience @SethFalco !
-Bruno
Instead of only having a boolean
true/falsefor how duplicate header values are handled, this uses an enum instead.Previously the only possibilities were:
true: To allow duplicates.false: To disallow duplicates, except empty cells which were allowed to be duplicates.This pull request makes an enum with three options:
ALLOW_ALL: To always allow duplicates. (Same astruepreviously.)ALLOW_EMPTY: To allow duplicates only if they're empty cells. (Same asfalsepreviously.)DISALLOW: To disallow all duplicates.This provides a little more flexibility in the strictness for the parser, and makes what the options do clearer.
Jira Issue: https://issues.apache.org/jira/browse/CSV-264