Allow disabling/forcing type inference for certain columns only #151

Open
onyxfish opened this issue Mar 1, 2012 · 11 comments

Comments

@onyxfish
Collaborator

onyxfish commented Mar 1, 2012

A mapping?

--types int,varchar

A .csvt?

@mikejcorey

A CSVT might be a little exotic, but it might be the most robust solution if you do the same tasks over and over. The example I had so far was that csvsql inferred a date type for a field that was really a varchar, and I couldn't get it to do what I wanted.

Could you just specify which fields to not guess, defaulting to varchar?

CSVT would also make you specify all columns, right? That would be daunting on a big dataset and would probably defeat the purpose.

I think I like the "Don't guess on this column" option most.

@onyxfish
Collaborator Author

onyxfish commented Mar 1, 2012

The latter is certainly a possibility although I'm inclined to implement a more general solution if one exists. I like .csvt because 1) it's still CSV and 2) it's an existing (albeit, as you say, exotic) convention. The somewhat annoying thing about it is that I'll be mandating a pretty specific list of supported Python types, which aren't going to match any other type system out there in the world.

Internally csvkit normalizes to:

NoneType, bool, int, float, datetime.datetime, datetime.time, datetime.date and unicode
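
For illustration, here is a minimal sketch of how a .csvt sidecar could be mapped onto Python types like those listed above. It assumes the convention used by the GDAL/OGR CSV driver, where the .csvt is a single CSV line of type names such as "Integer","Real","String", one per column. The `CSVT_TO_PYTHON` table and the `read_csvt` helper are hypothetical names made up for this example and are not part of csvkit.

```python
import csv
import datetime
import io

# Hypothetical mapping from OGR-style .csvt type names to Python types.
# csvkit itself defines no such table; this is an assumption for the sketch.
CSVT_TO_PYTHON = {
    "Integer": int,
    "Real": float,
    "String": str,
    "Date": datetime.date,
    "Time": datetime.time,
    "DateTime": datetime.datetime,
}


def read_csvt(csvt_text, header):
    """Return {column name: Python type} from the one-line .csvt content."""
    type_names = next(csv.reader(io.StringIO(csvt_text)))
    # A .csvt lists one type per column, so every column must be covered.
    if len(type_names) != len(header):
        raise ValueError(
            "expected %d types, found %d" % (len(header), len(type_names))
        )
    # Drop an optional width/precision suffix, e.g. "Integer(10)" -> "Integer".
    return {
        column: CSVT_TO_PYTHON[name.split("(")[0].strip()]
        for column, name in zip(header, type_names)
    }


types = read_csvt('"Integer","String(15)","Date"\n', ["id", "name", "born"])
print(types)
```

The length check makes the trade-off visible: under this convention a type has to be supplied for every column, even when only one of them is inferred wrongly.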

@mikejcorey

Am I right that you'd have to specify all columns if you went the .csvt route?

@onyxfish
Collaborator Author

onyxfish commented Mar 1, 2012

That's true, that is def. a downside. Maybe a

--no-infer a,b,c

syntax is better after all.

It's also worth keeping in mind that for type coercion things can really only be cast "down", i.e. int -> unicode. If you were to try to use a csvt to specify a more granular type the thing would just blow up anyway.
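
A small sketch of what casting "down" versus "up" means here, written for Python 3 (where `str` plays the role of `unicode`). The helper names `cast_down` and `cast_up` are hypothetical and do not come from csvkit.

```python
import datetime


def cast_down(value):
    """Cast an already-typed value 'down' to text; None becomes an empty string."""
    return "" if value is None else str(value)


def cast_up(text, target):
    """Try to cast text 'up' to a narrower type; raises ValueError if it does not fit."""
    return target(text)


# Casting down always succeeds, whatever the inferred type was.
print(cast_down(42))
print(cast_down(datetime.date(2012, 3, 1)))

# Casting up can fail as soon as one value does not fit the requested type.
try:
    cast_up("N/A", int)
except ValueError as exc:
    print("cannot cast up:", exc)
```

This is why forcing a column to text is the safe direction, while asking for a more specific type than the data supports fails on the first value that does not parse.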

@mikejcorey

Yeah, I think that's OK -- it's more important to me that something fails over to generic rather than specific. So if I have to CAST (blah) AS INTEGER, that's no big deal.

Supporting .CSVT might be a nice feature as well, but it would not really solve my particular problem, which I think I'd come across more often if my main use is to quickly start playing with some data.

In any case, csvsql is really cool. Navicat is obviously good at CSV imports, but still requires some configuration guesswork. csvsql is a huge timesaver and probably nearly eliminates the need for certain types of users in our organization to even use Navicat, which would definitely save us some money.

@onyxfish
Collaborator Author

onyxfish commented Mar 1, 2012

That's wonderful to hear. I'll look at hacking in a way to force values to strings sometime soon (possibly tonight, though I'm down other rabbit holes at the moment). Thanks for the feedback!

@mikejcorey

Great, thanks! No rush from me, just wanted to say something while I was thinking of it.

@jpmckinney
Member

Noting that there's some discussion of possible solutions in the referenced issues above.

@onyxfish onyxfish changed the title from "Allow overriding field types in csvsql, csvstat" to "Allow disabling/forcing type inference for certain columns only" on Dec 29, 2016

@jpmckinney
Member

So I think the simplest satisfactory solution for the reported feature request is to allow --no-inference to accept column names, e.g.:

--no-inference a,b,c
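
A minimal sketch of the behaviour this proposal describes, using only the standard library. Nothing here is csvkit's implementation: `infer` is a deliberately tiny stand-in for real type inference, and `read_rows` with its `no_inference` parameter is a hypothetical helper that mirrors the proposed option.

```python
import csv
import io


def infer(value):
    """Tiny stand-in for type inference: try int, then float, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_rows(text, no_inference=()):
    """Parse CSV text, leaving the columns named in no_inference as plain text."""
    skip = set(no_inference)
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: (value if name in skip else infer(value)) for name, value in row.items()}
        for row in reader
    ]


data = "a,b,c\n01234,5,6.5\n"
rows = read_rows(data, no_inference=["a"])
print(rows)
```

With column `a` excluded, a value such as `01234` keeps its leading zero as text, while `b` and `c` are still inferred as numbers. Only the problem columns need to be named, which is the advantage over listing a type for every column.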

@mingfang

mingfang commented Jun 27, 2022

Is there a realistic plan to do this?

@jpmckinney
Member

There is no time planned to work on this issue. It remains open.
