csvsort seems to hang on large files/sort jobs #157

Closed
eads opened this issue Mar 26, 2012 · 4 comments

Comments

@eads

eads commented Mar 26, 2012

I have a rather large (42m) CSV. csvsort has been running for over 10 minutes trying to sort the file, and I'm still sitting here waiting. The same sort in Excel or LibreOffice is slow, but still takes less than a minute.

@onyxfish
Collaborator

Yep. It's the type inference: it has to buffer the whole file into memory, iterate over it to figure out each column's type, iterate over it again to coerce the values to that type, and then finally sort all that data. It's crazy slow. Not sure what to do about it.
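
For readers following along, the passes described above can be sketched roughly as follows. This is only an illustration of the general pattern (buffer, infer, coerce, sort), not csvkit's actual code; the function names `infer_column_type` and `sort_with_inference` are hypothetical, and the inference only distinguishes int, float, and text.

```python
import csv
import io


def infer_column_type(values):
    """Return the narrowest of int, float, str that every value parses as."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def sort_with_inference(text, column):
    """Hypothetical sketch: buffer everything, infer, coerce, then sort."""
    # Buffer the whole file into memory.
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]

    # Pass 1: look at every value in every column to pick a type.
    types = [
        infer_column_type([row[i] for row in body])
        for i in range(len(header))
    ]

    # Pass 2: coerce every value to its column's inferred type.
    typed = [[t(v) for t, v in zip(types, row)] for row in body]

    # Finally, sort the coerced rows on the requested column.
    key_index = header.index(column)
    typed.sort(key=lambda row: row[key_index])
    return header, typed


header, rows = sort_with_inference("name,n\na,10\nb,9\nc,100\n", "n")
print(rows)
```

Every value is touched at least twice before the sort even starts, which is where the extra time and memory go on a large file.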

@eads
Author

eads commented Mar 26, 2012

Bummer. I wonder how CSVFix handles it, and how the performance is.

@gpoulter

gpoulter commented Jul 2, 2012

How about an option to skip type coercion? This would be similar to specifying that all columns are "text" when opening in LibreOffice.
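
A minimal sketch of what skipping coercion could look like, assuming every column is treated as text and rows are compared as raw strings. The function name `sort_as_text` is hypothetical and this is not how csvkit implements anything; it only shows the trade-off.

```python
import csv
import io


def sort_as_text(text, column):
    """Hypothetical sketch: sort rows on one column without any coercion."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    key_index = header.index(column)
    # Single pass to read, then a plain string sort; no inference passes.
    body = sorted(reader, key=lambda row: row[key_index])
    return header, body


header, rows = sort_as_text("name,n\na,10\nb,9\nc,100\n", "n")
print(rows)
```

The catch is that numeric columns sort lexicographically when treated as text, so "100" lands before "9". That is the same behaviour you get from marking a column as "text" in a spreadsheet import.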

@onyxfish
Collaborator

You can now pass --no-inference to disable type inference on csvsql. This option still needs to be extended to csvsort, but we've got a ticket for it in #222, so I'm closing this one as a dupe.
