Allow disabling/forcing type inference for certain columns only #151

onyxfish · 2012-03-01T01:17:18Z

A mapping?

--types int,varchar

A .csvt?

mikejcorey · 2012-03-01T01:20:32Z

A CSVT might be a little exotic, but it might be the most robust solution if you do the same tasks over and over. The example I had so far was that csvsql turned a column that should have been a varchar into a date field, and I couldn't really get it to do what I wanted.

Could you just specify which fields to not guess, defaulting to varchar?

CSVT would also make you specify all columns, right? That would be daunting on a big dataset, probably defeat the purpose.

I think I like the "Don't guess on this column" option most.

onyxfish · 2012-03-01T01:24:01Z

The latter is certainly a possibility although I'm inclined to implement a more general solution if one exists. I like .csvt because 1) it's still CSV and 2) it's an existing (albeit, as you say, exotic) convention. The somewhat annoying thing about it is that I'll be mandating a pretty specific list of supported Python types, which aren't going to match any other type system out there in the world.

Internally csvkit normalizes to:

NoneType, bool, int, float, datetime.datetime, datetime.time, datetime.date and unicode
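To make the .csvt trade-off concrete, here is a minimal sketch, assuming a .csvt is a single CSV row holding one type name per column. The type names, the `CASTERS` table, and `read_with_csvt` are hypothetical and only mirror the Python types listed above (with `str` standing in for `unicode`); none of this is csvkit's actual code.

```python
import csv
import datetime
import io

# Hypothetical type names for a .csvt row (an assumption of this sketch),
# chosen to mirror the normalized Python types listed above.
CASTERS = {
    "bool": lambda v: v.strip().lower() in ("true", "1", "yes"),
    "int": int,
    "float": float,
    "date": datetime.date.fromisoformat,
    "datetime": datetime.datetime.fromisoformat,
    "time": datetime.time.fromisoformat,
    "unicode": str,
}


def read_with_csvt(data_text, csvt_text):
    """Cast every column of a CSV using a one-row list of type names."""
    type_names = next(csv.reader(io.StringIO(csvt_text)))
    reader = csv.reader(io.StringIO(data_text))
    header = next(reader)
    # The sidecar row has to cover every column, which is the burden
    # on wide datasets.
    if len(type_names) != len(header):
        raise ValueError("the .csvt row must name a type for every column")
    rows = []
    for row in reader:
        # Empty cells become None; everything else goes through its caster.
        rows.append([
            None if cell == "" else CASTERS[name](cell)
            for name, cell in zip(type_names, row)
        ])
    return header, rows
```

For example, `read_with_csvt("id,zip\n1,02134\n", "int,unicode\n")` keeps `zip` as the string `"02134"`, but only because a type was spelled out for both columns.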

mikejcorey · 2012-03-01T01:25:47Z

Am I right that you'd have to specify all columns if you went the .csvt route?

onyxfish · 2012-03-01T01:27:39Z

That's true, that is def. a downside. Maybe a

--no-infer a,b,c

syntax is better after all.

It's also worth keeping in mind that for type coercion things can really only be cast "down", i.e. int -> unicode. If you were to try to use a csvt to specify a more granular type the thing would just blow up anyway.
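A rough sketch of what a per-column opt-out could look like, under the assumption that inference tries `int`, then `float`, and otherwise falls back to text. The `infer_type` and `load` helpers and the `no_infer` parameter are hypothetical names for illustration, not csvkit internals. Columns named in `no_infer` skip guessing and stay as strings, which is the "cast down" direction that always succeeds.

```python
import csv
import io


def infer_type(values):
    """Return the most specific of int, float, str that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return str  # nothing to guess from, so stay generic
    for caster in (int, float):
        try:
            for v in non_empty:
                caster(v)
            return caster
        except ValueError:
            continue
    return str


def load(text, no_infer=()):
    """Parse CSV text, guessing types except for columns named in no_infer."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    columns = [[row[i] for row in rows] for i in range(len(header))]
    types = [
        str if name in no_infer else infer_type(col)
        for name, col in zip(header, columns)
    ]
    typed = [
        [None if cell == "" else t(cell) for t, cell in zip(types, row)]
        for row in rows
    ]
    return header, types, typed
```

With this sketch, a `zip` column containing `02134` is guessed as `int` and loses its leading zero, while `load(text, no_infer={"zip"})` leaves it as the original string. Going the other way (forcing a text column such as `N/A` to `int`) would raise, which is why only the generic direction is safe to force.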

mikejcorey · 2012-03-01T01:31:44Z

Yeah, I think that's OK -- it's more important to me that something fails over to generic rather than specific. So if I have to CAST (blah) AS INTEGER, that's no big deal.
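As an illustration of that workflow, here is a small sketch using Python's built-in `sqlite3` module, assuming the columns were loaded as generic text. The `listings` table and its values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: both columns were loaded as text rather than guessed.
conn.execute("CREATE TABLE listings (zip VARCHAR, price VARCHAR)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [("02134", "1200"), ("10001", "950")],
)

# Cast only where a number is actually needed, at query time.
total = conn.execute(
    "SELECT SUM(CAST(price AS INTEGER)) FROM listings"
).fetchone()[0]

# The zip codes keep their leading zeros because they were never converted.
zips = [row[0] for row in conn.execute("SELECT zip FROM listings ORDER BY zip")]
```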

Supporting .CSVT might be a nice feature as well, but it would not really solve my particular problem, which I think I'd come across more often, since my main use is to quickly start playing with some data.

In any case, csvsql is really cool. Navicat is obviously good at CSV imports, but still requires some configuration guesswork. It's a huge timesaver and probably nearly eliminates the need for certain types of users in our organization to even use Navicat, which would definitely save us some money.

onyxfish · 2012-03-01T01:36:59Z

That's wonderful to hear. I'll look at hacking in a way to force values to strings sometime soon (possibly tonight, though I'm down other rabbit holes at the moment). Thanks for the feedback!

mikejcorey · 2012-03-01T02:13:33Z

Great, thanks! No rush from me, just wanted to say something while I was thinking of it.

jpmckinney · 2016-01-25T17:20:28Z

Noting that there's some discussion of possible solutions in the referenced issues above.

jpmckinney · 2017-01-28T18:39:18Z

So I think the simplest satisfactory solution for the reported feature request is to allow --no-inference to accept column names, e.g.:

--no-inference a,b,c

. Closes wireservice#151.
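One way such an option could be parsed, sketched with `argparse` and assuming the flag keeps working with no value (disable inference everywhere) while also accepting a comma-separated list. The parser, `column_list`, and `columns_to_skip` are hypothetical and do not describe csvkit's current behavior.

```python
import argparse


def column_list(value):
    """Split 'a,b,c' into ['a', 'b', 'c'], ignoring empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Hypothetical parser: nargs="?" lets the flag appear with or without a value.
parser = argparse.ArgumentParser(prog="csvsketch")
parser.add_argument(
    "--no-inference",
    dest="no_inference",
    nargs="?",
    const=True,      # flag given with no value: skip inference for all columns
    default=False,   # flag absent: infer every column
    type=column_list,
)


def columns_to_skip(option, header):
    """Resolve the parsed option into the set of columns left as text."""
    if option is True:
        return set(header)
    if option is False:
        return set()
    return set(option)
```

One design caveat with an optional value: if a positional argument such as a file name directly follows a bare `--no-inference`, the parser would take it as the column list, so a real implementation would need to resolve that ambiguity.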

mingfang · 2022-06-27T16:18:33Z

Is there a realistic plan to do this?

jpmckinney · 2022-06-28T14:24:12Z

There is no time planned to work on this issue. It remains open.

ethanpooley mentioned this issue Dec 13, 2015

csvsql recognizing different column/variable names as different data types across a set of files #470

Closed

jpmckinney changed the title ~~Allow overriding field types in csvsql~~ Allow overriding field types in csvsql, csvstat Jan 25, 2016

This was referenced Jan 25, 2016

Feature request: provide csvstat with an option to disable or tune data type guessing #231

Closed

csvsql: automatic setting of boolean datatype by cell content doesnt always make sense...switch option? #233

Closed

jpmckinney mentioned this issue Jan 25, 2016

csvsql: Type inference desired for some columns, not others. #309

Closed

onyxfish changed the title ~~Allow overriding field types in csvsql, csvstat~~ Allow disabling/forcing type inference for certain columns only Dec 29, 2016

jpmckinney mentioned this issue Jan 15, 2018

Document using --datetime-format to avoid over-aggressive date inference #917

Closed

jpmckinney added this to the 1.0.4 milestone May 21, 2018

jpmckinney mentioned this issue Feb 10, 2019

Loading columns with leading 0's drops data, misinterprets data type #977

Closed

jpmckinney mentioned this issue Jul 7, 2020

csvsql: Opt-in to use INTEGER instead of DECIMAL #1070

Open

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020

Percent and indexed change examples for cookbook. Closes wireservice#150. Closes wireservice#151.

0d4204c

jpmckinney mentioned this issue Jun 11, 2021

Numerical csvsort with --no-inference #1125

Closed

jpmckinney added framework and removed Low Priority labels Oct 17, 2023

jpmckinney modified the milestones: Next version, Priority Oct 17, 2023

jpmckinney mentioned this issue Nov 13, 2023

Would like to be able to force the type of columns in csv file when using csvsql #1220

Closed
