-
Notifications
You must be signed in to change notification settings - Fork 289
Document duplicate header behavior #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #309 +/- ##
=========================================
Coverage 97.87% 97.87%
Complexity 549 549
=========================================
Files 11 11
Lines 1178 1178
Branches 204 204
=========================================
Hits 1153 1153
Misses 13 13
Partials 12 12 ☔ View full report in Codecov by Sentry. |
kinow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, tricky. I think documenting the behavior sounds about right for now.
My go-to tool for parsing CSV ATM is Python. The stdlib csv module has the same behavior documented here, displaying the last occurrence.
>>> import csv
>>>
>>> with open('/tmp/test.csv') as f:
... reader = csv.reader(f, delimiter=',')
... for row in reader:
... print(', '.join(row))
...
A, A, B, B
1, 2, 5, 6
>>>
>>> with open('/tmp/test.csv') as f:
... reader = csv.DictReader(f)
... dict_from_csv = dict(list(reader)[0])
... print(', '.join(list(dict_from_csv.keys())))
...
A, B
>>>
>>> dict_from_csv
{'A': '2', 'B': '6'}But whenever I need some serious CSV processing, then I use Pandas. With Pandas it automatically deduplicates the column names. Maybe that's a feature that we could have in Commons CSV too?
>>> import pandas as pd
>>>
>>> df = pd.read_csv('/tmp/test.csv')
>>>
>>> ','.join(list(df.columns))
'A,A.1,B,B.1'
>>>
>>> df['A']
0 1
Name: A, dtype: int64
>>> df['A.1']
0 2
Name: A.1, dtype: int64
>>> Cheers,
-Bruno
|
I'm up for anything. I'll still give my thoughts on the methods above, though.
I think just doing the last occurrence is very likely to lead to scenarios where:
I think throwing an exception makes more sense, so developers can skip to step 3. 🤔
This crossed my mind, but I was hesitant to modify data passed to the lib, at least not implicitly. I'm also wary that this may screw up existing projects that depend on allowing/disallowing duplicates. i.e. want to allow duplicates and handle things through indexes / iteration, so this didn't cause a problem for them and want to preserve header names, and so don't need the headers deduplicated. If we want this in the library, maybe one of the following ideas would work, though?
|
|
Hi All, I'll start a thread on the ML for #309 (comment) and #309 (review) |
Bumps [org.apache.commons:commons-csv](https://github.com/apache/commons-csv) from 1.10.0 to 1.11.0. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/apache/commons-csv/blob/master/RELEASE-NOTES.txt">org.apache.commons:commons-csv's changelog</a>.</em></p> <blockquote> <p>Apache Commons CSV Version 1.11.0 Release Notes</p> <p>This document contains the release notes for the 1.11.0 version of Apache Commons CSV. Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p> <p>Commons CSV requires at least Java 8.</p> <p>The Apache Commons CSV library provides a simple interface for reading and writing CSV files of various types.</p> <p>Feature and bug fix release (Java 8 or above)</p> <p>Changes in this version include:</p> <h2>New Features</h2> <ul> <li>CSV-308: [Javadoc] Add example to CSVFormat#setHeaderComments() <a href="https://github.com/apache/commons-csv/issues/344">#344</a>. Thanks to Buddhi De Silva, Gary Gregory.</li> <li> <pre><code> Add and use CSVFormat#setTrailingData(boolean) in CSVFormat.EXCEL for Excel compatibility [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> <li> <pre><code> Add and use CSVFormat#setLenientEof(boolean) in CSVFormat.EXCEL for Excel compatibility [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> </ul> <h2>Fixed Bugs</h2> <ul> <li>CSV-306: Replace deprecated method in user guide, update external link <a href="https://github.com/apache/commons-csv/issues/324">#324</a>, <a href="https://github.com/apache/commons-csv/issues/325">#325</a>. Thanks to Sam Ng, Bruno P. Kinoshita.</li> <li> <pre><code> Document duplicate header behavior [#309](apache/commons-csv#309). Thanks to Seth Falco, Bruno P. Kinoshita. </code></pre> </li> <li> <pre><code> Add missing docs [#328](apache/commons-csv#328). Thanks to jkbkupczyk. </code></pre> </li> <li> <pre><code> [StepSecurity] CI: Harden GitHub Actions [#329](apache/commons-csv#329), [#330](apache/commons-csv#330). Thanks to step-security-bot. </code></pre> </li> <li>CSV-147: Better error message during faulty CSV record read <a href="https://github.com/apache/commons-csv/issues/347">#347</a>. Thanks to Steven Peterson, Benedikt Ritter, Gary Gregory, Joerg Schaible, Buddhi De Silva, Elliotte Rusty Harold.</li> <li>CSV-310: Misleading error message when QuoteMode set to None <a href="https://github.com/apache/commons-csv/issues/352">#352</a>. Thanks to Buddhi De Silva.</li> <li>CSV-311: OutOfMemory for very long rows despite using column value of type Reader. Thanks to Christian Feuersaenger, Gary Gregory.</li> <li> <pre><code> Use try-with-resources to manage JDBC Clob in CSVPrinter.printRecords(ResultSet). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> JDBC Blob columns are now output as Base64 instead of Object#toString(), which usually is InputStream#toString(). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Support unusual Excel use cases: Add support for trailing data after the closing quote, and EOF without a final closing quote [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> <li> <pre><code> MongoDB CSV empty first column parsing fix [#412](apache/commons-csv#412). Thanks to Igor Kamyshnikov, Gary Gregory. </code></pre> </li> </ul> <h2>Changes</h2> <ul> <li> <pre><code> Bump commons-io:commons-io: from 2.11.0 to 2.16.1 [#408](apache/commons-csv#408), [#413](apache/commons-csv#413). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Bump commons-parent from 57 to 69 [#410](apache/commons-csv#410). Thanks to Gary Gregory, Dependabot. </code></pre> </li> <li> <pre><code> Bump h2 from 2.1.214 to 2.2.224 [#333](apache/commons-csv#333), [#349](apache/commons-csv#349), [#359](apache/commons-csv#359). Thanks to Dependabot. </code></pre> </li> <li> <pre><code> Bump commons-lang3 from 3.12.0 to 3.14.0. Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Update exception message in CSVRecord#getNextRecord() [#348](apache/commons-csv#348). Thanks to Buddhi De Silva, Michael Osipov, Gary Gregory. </code></pre> </li> <li> <pre><code> Bump tests using com.opencsv:opencsv from 5.8 to 5.9 [#373](apache/commons-csv#373). Thanks to Dependabot. </code></pre> </li> </ul> <p>Historical list of changes: <a href="https://commons.apache.org/proper/commons-csv/changes-report.html">https://commons.apache.org/proper/commons-csv/changes-report.html</a></p> <p>For complete information on Apache Commons CSV, including instructions on how to submit bug reports,</p> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/apache/commons-csv/commit/74e12741b24e724bb2e60109daa0c834fd75a68a"><code>74e1274</code></a> Prepare for the next release candidate</li> <li><a href="https://github.com/apache/commons-csv/commit/89cbc7bb3f7f840045ee1fa17863830110e8aebe"><code>89cbc7b</code></a> Prepare for the next release candidate</li> <li><a href="https://github.com/apache/commons-csv/commit/447682ec4a4bba7ea3c4edf89a87c63ff5bf718e"><code>447682e</code></a> Match version to POM</li> <li><a href="https://github.com/apache/commons-csv/commit/4c186f27f7b340aa7d78dc68d380200bcb49bb46"><code>4c186f2</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/420">#420</a> from apache/dependabot/github_actions/actions/checkou...</li> <li><a href="https://github.com/apache/commons-csv/commit/8af37f7992e3e7fb37e0f2a5a9b02f27b9cb5e84"><code>8af37f7</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/418">#418</a> from apache/dependabot/github_actions/github/codeql-a...</li> <li><a href="https://github.com/apache/commons-csv/commit/2238314ef83214142a4b6304c3cc36a20749b953"><code>2238314</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/419">#419</a> from apache/dependabot/github_actions/actions/upload-...</li> <li><a href="https://github.com/apache/commons-csv/commit/2ccf6686364c9183a03ab52c944f63695abc2843"><code>2ccf668</code></a> Bump actions/checkout from 4.1.2 to 4.1.4</li> <li><a href="https://github.com/apache/commons-csv/commit/26cf90ecbffaf0243dd01cdf941d0c13fb875a88"><code>26cf90e</code></a> Bump actions/upload-artifact from 4.3.2 to 4.3.3</li> <li><a href="https://github.com/apache/commons-csv/commit/586310afbc7f93c356ede7602706b3a2a5a6b916"><code>586310a</code></a> Bump github/codeql-action from 3.25.1 to 3.25.3</li> <li><a href="https://github.com/apache/commons-csv/commit/bea505a55b6ab3c4eca27f395b9d6fa6787d496a"><code>bea505a</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/416">#416</a> from apache/dependabot/github_actions/actions/upload-...</li> <li>Additional commits viewable in <a href="https://github.com/apache/commons-csv/compare/rel/commons-csv-1.10.0...rel/commons-csv-1.11.0">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `dependabot rebase` will rebase this PR - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `dependabot merge` will merge this PR after your CI passes on it - `dependabot squash and merge` will squash and merge this PR after your CI passes on it - `dependabot cancel merge` will cancel a previously requested merge and block automerging - `dependabot reopen` will reopen this PR if it is closed - `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Closes #1923 from dependabot[bot]/dependabot/maven/java/org.apache.commons-commons-csv-1.11.0. Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: William Hyun <[email protected]>
Bumps [org.apache.commons:commons-csv](https://github.com/apache/commons-csv) from 1.10.0 to 1.11.0. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/apache/commons-csv/blob/master/RELEASE-NOTES.txt">org.apache.commons:commons-csv's changelog</a>.</em></p> <blockquote> <p>Apache Commons CSV Version 1.11.0 Release Notes</p> <p>This document contains the release notes for the 1.11.0 version of Apache Commons CSV. Commons CSV reads and writes files in variations of the Comma Separated Value (CSV) format.</p> <p>Commons CSV requires at least Java 8.</p> <p>The Apache Commons CSV library provides a simple interface for reading and writing CSV files of various types.</p> <p>Feature and bug fix release (Java 8 or above)</p> <p>Changes in this version include:</p> <h2>New Features</h2> <ul> <li>CSV-308: [Javadoc] Add example to CSVFormat#setHeaderComments() <a href="https://github.com/apache/commons-csv/issues/344">#344</a>. Thanks to Buddhi De Silva, Gary Gregory.</li> <li> <pre><code> Add and use CSVFormat#setTrailingData(boolean) in CSVFormat.EXCEL for Excel compatibility [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> <li> <pre><code> Add and use CSVFormat#setLenientEof(boolean) in CSVFormat.EXCEL for Excel compatibility [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> </ul> <h2>Fixed Bugs</h2> <ul> <li>CSV-306: Replace deprecated method in user guide, update external link <a href="https://github.com/apache/commons-csv/issues/324">#324</a>, <a href="https://github.com/apache/commons-csv/issues/325">#325</a>. Thanks to Sam Ng, Bruno P. Kinoshita.</li> <li> <pre><code> Document duplicate header behavior [#309](apache/commons-csv#309). Thanks to Seth Falco, Bruno P. Kinoshita. </code></pre> </li> <li> <pre><code> Add missing docs [#328](apache/commons-csv#328). Thanks to jkbkupczyk. </code></pre> </li> <li> <pre><code> [StepSecurity] CI: Harden GitHub Actions [#329](apache/commons-csv#329), [#330](apache/commons-csv#330). Thanks to step-security-bot. </code></pre> </li> <li>CSV-147: Better error message during faulty CSV record read <a href="https://github.com/apache/commons-csv/issues/347">#347</a>. Thanks to Steven Peterson, Benedikt Ritter, Gary Gregory, Joerg Schaible, Buddhi De Silva, Elliotte Rusty Harold.</li> <li>CSV-310: Misleading error message when QuoteMode set to None <a href="https://github.com/apache/commons-csv/issues/352">#352</a>. Thanks to Buddhi De Silva.</li> <li>CSV-311: OutOfMemory for very long rows despite using column value of type Reader. Thanks to Christian Feuersaenger, Gary Gregory.</li> <li> <pre><code> Use try-with-resources to manage JDBC Clob in CSVPrinter.printRecords(ResultSet). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> JDBC Blob columns are now output as Base64 instead of Object#toString(), which usually is InputStream#toString(). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Support unusual Excel use cases: Add support for trailing data after the closing quote, and EOF without a final closing quote [#303](apache/commons-csv#303). Thanks to DamjanJovanovic, Gary Gregory. </code></pre> </li> <li> <pre><code> MongoDB CSV empty first column parsing fix [#412](apache/commons-csv#412). Thanks to Igor Kamyshnikov, Gary Gregory. </code></pre> </li> </ul> <h2>Changes</h2> <ul> <li> <pre><code> Bump commons-io:commons-io: from 2.11.0 to 2.16.1 [#408](apache/commons-csv#408), [#413](apache/commons-csv#413). Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Bump commons-parent from 57 to 69 [#410](apache/commons-csv#410). Thanks to Gary Gregory, Dependabot. </code></pre> </li> <li> <pre><code> Bump h2 from 2.1.214 to 2.2.224 [#333](apache/commons-csv#333), [#349](apache/commons-csv#349), [#359](apache/commons-csv#359). Thanks to Dependabot. </code></pre> </li> <li> <pre><code> Bump commons-lang3 from 3.12.0 to 3.14.0. Thanks to Gary Gregory. </code></pre> </li> <li> <pre><code> Update exception message in CSVRecord#getNextRecord() [#348](apache/commons-csv#348). Thanks to Buddhi De Silva, Michael Osipov, Gary Gregory. </code></pre> </li> <li> <pre><code> Bump tests using com.opencsv:opencsv from 5.8 to 5.9 [#373](apache/commons-csv#373). Thanks to Dependabot. </code></pre> </li> </ul> <p>Historical list of changes: <a href="https://commons.apache.org/proper/commons-csv/changes-report.html">https://commons.apache.org/proper/commons-csv/changes-report.html</a></p> <p>For complete information on Apache Commons CSV, including instructions on how to submit bug reports,</p> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/apache/commons-csv/commit/74e12741b24e724bb2e60109daa0c834fd75a68a"><code>74e1274</code></a> Prepare for the next release candidate</li> <li><a href="https://github.com/apache/commons-csv/commit/89cbc7bb3f7f840045ee1fa17863830110e8aebe"><code>89cbc7b</code></a> Prepare for the next release candidate</li> <li><a href="https://github.com/apache/commons-csv/commit/447682ec4a4bba7ea3c4edf89a87c63ff5bf718e"><code>447682e</code></a> Match version to POM</li> <li><a href="https://github.com/apache/commons-csv/commit/4c186f27f7b340aa7d78dc68d380200bcb49bb46"><code>4c186f2</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/420">#420</a> from apache/dependabot/github_actions/actions/checkou...</li> <li><a href="https://github.com/apache/commons-csv/commit/8af37f7992e3e7fb37e0f2a5a9b02f27b9cb5e84"><code>8af37f7</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/418">#418</a> from apache/dependabot/github_actions/github/codeql-a...</li> <li><a href="https://github.com/apache/commons-csv/commit/2238314ef83214142a4b6304c3cc36a20749b953"><code>2238314</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/419">#419</a> from apache/dependabot/github_actions/actions/upload-...</li> <li><a href="https://github.com/apache/commons-csv/commit/2ccf6686364c9183a03ab52c944f63695abc2843"><code>2ccf668</code></a> Bump actions/checkout from 4.1.2 to 4.1.4</li> <li><a href="https://github.com/apache/commons-csv/commit/26cf90ecbffaf0243dd01cdf941d0c13fb875a88"><code>26cf90e</code></a> Bump actions/upload-artifact from 4.3.2 to 4.3.3</li> <li><a href="https://github.com/apache/commons-csv/commit/586310afbc7f93c356ede7602706b3a2a5a6b916"><code>586310a</code></a> Bump github/codeql-action from 3.25.1 to 3.25.3</li> <li><a href="https://github.com/apache/commons-csv/commit/bea505a55b6ab3c4eca27f395b9d6fa6787d496a"><code>bea505a</code></a> Merge pull request <a href="https://github.com/apache/commons-csv/issues/416">#416</a> from apache/dependabot/github_actions/actions/upload-...</li> <li>Additional commits viewable in <a href="https://github.com/apache/commons-csv/compare/rel/commons-csv-1.10.0...rel/commons-csv-1.11.0">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `dependabot rebase` will rebase this PR - `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `dependabot merge` will merge this PR after your CI passes on it - `dependabot squash and merge` will squash and merge this PR after your CI passes on it - `dependabot cancel merge` will cancel a previously requested merge and block automerging - `dependabot reopen` will reopen this PR if it is closed - `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Closes #1923 from dependabot[bot]/dependabot/maven/java/org.apache.commons-commons-csv-1.11.0. Authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: William Hyun <[email protected]>
A while back we discussed duplicate header behavior, and in the comments thought we should define what happens when one tries to get from a
CSVRecord/Map<String, String>when this occurs.Relevant comments:
I reviewed the code and manually tested to discover the current behavior:
CSVRecord#getandCSVRecord#toMapto state how this is handled.It may be that instead of this, we may want to instead define a different behavior before we add documentation and tests to cover this case.
I'm honestly not sure what the expected behavior would be for this. I'm tempted to say that it should throw an exception in this instance. 🤔
I'm not aware of the use-case for having duplicate headings in the first place, but I assume there is a reason for it. But I definitely don't see why one would use methods like
#getor#toMapin cases that they know they have duplicate headers.Maybe if the first and last index of the given header name differs, throw an exception?