Unfinished quotes (also in skipped lines) result in an empty CSV with read_csv() #12440

Closed
2 tasks done
Rmulet opened this issue Nov 14, 2023 · 2 comments · Fixed by #20306
Labels: A-io-csv (Area: reading/writing CSV files), accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Rmulet commented Nov 14, 2023

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
pl.read_csv(b'#"Comment\nColA\tColB\n1\t2',separator='\t',comment_char='#')

Log output

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/research/modules/installs/python/python-3.10.2/lib/python3.10/site-packages/polars/io/csv/functions.py", line 366, in read_csv
    df = pl.DataFrame._read_csv(
  File "/research/modules/installs/python/python-3.10.2/lib/python3.10/site-packages/polars/dataframe/frame.py", line 775, in _read_csv
    self._df = PyDataFrame.read_csv(
polars.exceptions.NoDataError: empty CSV

Issue description

Whenever a file contains an unfinished (unclosed) quote, polars.read_csv() raises NoDataError and reports that the CSV is empty. This error is uninformative, since the file is not actually empty.

This can be particularly misleading when the unfinished quotes are in a commented/skipped line (see example above), since one would expect those to be completely ignored.

I managed to find a workaround by setting quote_char to another character that was not used anywhere else (%), but that's probably not the optimal way to deal with this. Setting quote_char to None did not work.
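A different workaround, sketched here under assumptions, is to drop the commented lines before the bytes ever reach the parser, so quote handling never sees them. `strip_comment_lines` is a hypothetical helper, not part of Polars, and it assumes that no multi-line quoted field contains a line beginning with the comment prefix.

```python
def strip_comment_lines(data: bytes, comment_prefix: bytes = b"#") -> bytes:
    """Drop every line that starts with comment_prefix (hypothetical helper)."""
    kept = [
        line
        for line in data.splitlines(keepends=True)  # keep original line endings
        if not line.startswith(comment_prefix)
    ]
    return b"".join(kept)


raw = b'#"Comment\nColA\tColB\n1\t2'
cleaned = strip_comment_lines(raw)
print(cleaned)  # b'ColA\tColB\n1\t2'
```

The cleaned bytes could then be passed to read_csv with separator='\t' and no comment option at all, which should sidestep the problem without repurposing quote_char.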

Expected behavior

The expected behaviour would be that:

a) Unfinished quotes are completely ignored in commented or skipped rows
b) The error message triggered by unfinished quotes elsewhere is informative
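To make both points concrete, here is a minimal sketch of the behaviour I have in mind. It is plain Python and not how Polars actually parses CSV; `find_unclosed_quote` and its parameters are hypothetical names. Comment lines are skipped before any quote handling, and an unclosed quote elsewhere is reported with the line where it starts.

```python
from typing import Optional


def find_unclosed_quote(
    text: str, quote_char: str = '"', comment_prefix: str = "#"
) -> Optional[int]:
    """Return the 1-based line where an unclosed quote starts, or None."""
    in_quotes = False
    start_line = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        # (a) A comment line outside a quoted field is ignored entirely.
        if not in_quotes and line.startswith(comment_prefix):
            continue
        i = 0
        while i < len(line):
            if line[i] == quote_char:
                # A doubled quote inside a quoted field is an escaped quote.
                if in_quotes and line[i + 1 : i + 2] == quote_char:
                    i += 2
                    continue
                in_quotes = not in_quotes
                if in_quotes:
                    start_line = lineno
            i += 1
    return start_line if in_quotes else None


def validate(text: str) -> None:
    """(b) Raise an informative error instead of reporting an empty CSV."""
    line = find_unclosed_quote(text)
    if line is not None:
        raise ValueError(f"unclosed quote starting on line {line}")


validate('#"Comment\nColA\tColB\n1\t2')  # passes: the quote is in a comment
```

With this check, the example from this issue passes because the stray quote sits in a comment line, while input such as 'ColA\tColB\n"1\t2' fails with a message that points at line 2.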

Installed versions

--------Version info---------
Polars:              0.19.13
Index type:          UInt32
Platform:            Linux-5.4.0-163-generic-x86_64-with-glibc2.31
Python:              3.10.2 (main, Feb 23 2022, 22:06:35) [GCC 9.3.0]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         2.2.0
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.10.0
gevent:              <not installed>
matplotlib:          1.5.3
numpy:               1.23.1
openpyxl:            <not installed>
pandas:              2.1.3
pyarrow:             14.0.1
pydantic:            <not installed>
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          1.4.32
xlsx2csv:            <not installed>
xlsxwriter:          3.0.3

Rmulet added the bug and python labels Nov 14, 2023
ritchie46 (Member) commented

It should provide a better error. We should not read the file, as this is an ill-defined CSV file.

stinodego added the needs triage label Jan 13, 2024
stinodego added the A-io-csv label Jan 21, 2024
ritchie46 (Member) commented

Ah, I now see the quote is in the comment line. Comment lines should not escape fields.
