Support `newlines_in_values` CSV option #11533
Merged
Commits on Jul 18, 2024
- 5321e25 feat!: support `newlines_in_values` CSV option

This significantly simplifies the UX when dealing with large CSV files that must support newlines in (quoted) values.

By default, large CSV files will be repartitioned into multiple parallel range scans. This is great for performance in the common case, but when large CSVs contain newlines in values the parallel scan will fail due to splitting on newlines within quotes rather than actual line terminators. With the current implementation, this behaviour can be controlled by the session-level `datafusion.optimizer.repartition_file_scans` and `datafusion.optimizer.repartition_file_min_size` settings.

This commit introduces a `newlines_in_values` option to `CsvOptions` and plumbs it through to `CsvExec`, which includes it in the test for whether parallel execution is supported. This provides a convenient and searchable way to disable file scan repartitioning on a per-CSV basis.

BREAKING CHANGE: This adds new public fields to types with all public fields, which is a breaking change.
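The failure mode described in this commit message can be illustrated with a small standalone sketch. It is not DataFusion's implementation: the sample data, the `split_at_next_newline` helper, and the use of Python's `csv` module are all assumptions made for illustration. It shows that a range boundary aligned to the next newline can land inside a quoted value, while a quote-aware reader over the whole input recovers the intended records.

```python
import csv
import io

# Hypothetical sample: the first data record has a newline inside a quoted value.
DATA = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def split_at_next_newline(text, offset):
    """Cut the text just after the first newline at or after `offset`.

    This mimics a range scan that aligns a partition boundary to the
    next newline without tracking whether it is inside quotes.
    """
    cut = text.index("\n", offset) + 1
    return text[:cut], text[cut:]


def quote_aware_rows(text):
    """Parse the whole input with a reader that tracks quoting."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Pick a boundary that happens to fall inside the quoted value.
head, tail = split_at_next_newline(DATA, DATA.index("first"))

# The first partition now ends with an unterminated quoted field.
head_has_open_quote = head.count('"') % 2 == 1

# Reading the input as a single unit keeps the embedded newline in the value.
rows = quote_aware_rows(DATA)
```

Here `head` ends in the middle of the quoted field and `tail` begins with the remainder of the same value, which is why an option that disables range splitting for such files is useful.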
Commits on Jul 19, 2024
- e05ca0e
- 9ca9065
- 34dcdb0
- 8c2d98d
- ed0075d
Commits on Jul 20, 2024
- 356f46b
- b9cc96b fix: always checkout `*.slt` with LF line endings

This is a bit of a stab in the dark, but it might fix multiline tests on Windows.
- 4d06432
- 35198b6 fix: always checkout `newlines_in_values.csv` with LF line endings

The default git behaviour of converting line endings for checked-out files causes the `csv_files.slt` test to fail when testing `newlines_in_values`. This appears to be due to the quoted newlines being converted to CRLF, which are not then normalised when the CSV is read. Assuming that the sqllogictests do normalise line endings in the expected output, this could then lead to a "spurious" diff from the actual output.
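The suspected cause can be sketched in isolation. The example below is an illustration only: it uses Python's `csv` module as a stand-in for the CSV reader under test, and the file contents are hypothetical. It shows that when a checkout rewrites every LF as CRLF, the newline inside a quoted value changes too, and a reader that preserves quoted content returns a different value.

```python
import csv
import io

# Hypothetical file contents as committed, with LF line endings.
LF_FILE = 'id,comment\n1,"first\nsecond"\n'

# The same file after a checkout that converts every LF to CRLF.
CRLF_FILE = LF_FILE.replace("\n", "\r\n")


def read_rows(text):
    """Parse CSV text without any newline translation by the stream."""
    return list(csv.reader(io.StringIO(text, newline="")))


lf_value = read_rows(LF_FILE)[1][1]
crlf_value = read_rows(CRLF_FILE)[1][1]

# Record terminators are consumed either way, but the quoted value differs.
values_differ = lf_value != crlf_value
```

If the expected output is compared after line endings are normalised, the extra carriage return in `crlf_value` would show up as a difference, which is consistent with the explanation in the commit message.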