Allow disabling/forcing type inference for certain columns only #151
A CSVT might be a little exotic, but it might be the most robust solution if you do the same tasks over and over. The example I had so far was that csvsql turned a column that should have been a varchar into a date field, and I couldn't really get it to do what I wanted. Could you just specify which fields not to guess, defaulting to varchar? CSVT would also make you specify all columns, right? That would be daunting on a big dataset and would probably defeat the purpose. I think I like the "Don't guess on this column" option most. |
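To make the "don't guess on this column" idea concrete, here is a minimal Python sketch. It is not csvkit's implementation: the function names, the `no_inference` parameter, and the sample data are all hypothetical, and it only distinguishes int, float, and text. It shows how a column listed as "do not guess" falls back to text while every other column is still inferred.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of int, float, str that fits every non-empty value."""
    for candidate in (int, float):
        try:
            for value in values:
                if value != "":
                    candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def infer_column_types(text, no_inference=()):
    """Infer a type per column; columns named in no_inference (hypothetical) stay text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    types = {}
    for name in reader.fieldnames or []:
        if name in no_inference:
            types[name] = str  # skip guessing, default to text
        else:
            types[name] = infer_type([row[name] for row in rows])
    return types


# Hypothetical data: a zip code column that looks numeric but should stay text.
sample = "zip,population\n02134,1500\n10001,21000\n"
print(infer_column_types(sample))
print(infer_column_types(sample, no_inference={"zip"}))
```

Without the exclusion, the zip column is inferred as an integer and would lose its leading zero; with it, only that one column is forced to text and nothing else has to be declared.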
The latter is certainly a possibility although I'm inclined to implement a more general solution if one exists. I like .csvt because 1) it's still CSV and 2) it's an existing (albeit, as you say, exotic) convention. The somewhat annoying thing about it is that I'll be mandating a pretty specific list of supported Python types, which aren't going to match any other type system out there in the world. Internally csvkit normalizes to: NoneType, bool, int, float, datetime.datetime, datetime.time, datetime.date and unicode |
Am I right that you'd have to specify all columns if you went the .csvt route? |
That's true, that is definitely a downside. Maybe a
syntax is better after all. It's also worth keeping in mind that for type coercion things can really only be cast "down", i.e. int -> unicode. If you were to try to use a csvt to specify a more granular type the thing would just blow up anyway. |
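The "cast down" point can be illustrated with a small Python sketch. It uses Python 3's `str` in place of the `unicode` type mentioned above, and the `can_cast` helper is hypothetical, written only for this illustration.

```python
def can_cast(value, target):
    """Report whether target(value) succeeds (hypothetical helper)."""
    try:
        target(value)
        return True
    except (TypeError, ValueError):
        return False


print(can_cast(42, str))     # an int can always be turned into text
print(can_cast("N/A", int))  # arbitrary text cannot be turned into an int
```

Going toward text always succeeds, while going toward a narrower type fails as soon as a single value does not fit.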
Yeah, I think that's OK -- it's more important to me that something fails over to generic rather than specific. So if I have to CAST (blah) AS INTEGER, that's no big deal. Supporting .CSVT might be a nice feature as well, but would not really solve my particular problem, which I think I'd come across more often if my main use is to quickly start playing with some data. In any case, csvsql is really cool. Navicat is obviously good at CSV imports, but still requires some configuration guesswork. csvsql is a huge timesaver and probably nearly eliminates the need for certain types of users in our organization to even use Navicat, which would definitely save us some money. |
That's wonderful to hear. I'll look at hacking in a way to force values to strings sometime soon (possibly tonight, though I'm down other rabbit holes at the moment). Thanks for the feedback! |
Great, thanks! No rush from me, just wanted to say something while I was thinking of it. |
Noting that there's some discussion of possible solutions in the referenced issues above. |
So I think the simplest satisfactory solution for the reported feature request is to allow --no-inference to accept column names, e.g.:
|
is there a realistic plan to do this? |
There is no time planned to work on this issue. It remains open. |
A mapping?
--types int,varchar
A .csvt?