DataArrays lead to TypeError in the Table constructor #87

sylvaticus · 2017-10-16T14:47:55Z

Hello,
I am trying to convert a DataFrame with NA values to an IndexedTable, e.g.:

using DataFrames
df = DataFrame(
  param  = ["price","price","price","price","waterContent","waterContent"],
  item   = ["banana","banana","apple","apple","banana", "apple"],
  region = ["FR","UK","FR","UK","",""],
  value  = [3.2,2.9,1.2,0.8,0.2,0.8]
)
df[5,:region] = NA
df[6,:region] = NA

6×4 DataFrames.DataFrame
│ Row │ param          │ item     │ region │ value │
├─────┼────────────────┼──────────┼────────┼───────┤
│ 1   │ "price"        │ "banana" │ "FR"   │ 3.2   │
│ 2   │ "price"        │ "banana" │ "UK"   │ 2.9   │
│ 3   │ "price"        │ "apple"  │ "FR"   │ 1.2   │
│ 4   │ "price"        │ "apple"  │ "UK"   │ 0.8   │
│ 5   │ "waterContent" │ "banana" │ NA     │ 0.2   │
│ 6   │ "waterContent" │ "apple"  │ NA     │ 0.8   │

I tried to construct the Table from vectors that contain NA values, but the constructor fails:

using DataFrames, IndexedTables, IndexedTables.Table
a = Union{String,DataArrays.NAtype}["aa","bb","aa","bb",NA,NA]
b = Union{String,DataArrays.NAtype}["c","c","d","d","c","d"]
v = Float64[1.0,2.0,3.0,4.0,5.0,6.0]
t = IndexedTables.Table(Columns(a=a,b=b),v)
TypeError: non-boolean (DataArrays.NAtype) used in boolean context

However, if I add NA values once the IndexedTable is created, it works great.
So, in spite of possible performance problems, I thought of first creating an empty IndexedTable and then appending to it, but it seems that an IndexedTable constructed empty misbehaves:

t = Table(Columns(a=Int64[],b=String[]),Float64[])
t[1,"wewe"] = 1.0
t[2,"wewe"] = 2
Error showing value of type IndexedTables.IndexedTable{Float64,Tuple{Int64,String},IndexedTables.Columns{NamedTuples._NT_a_b{Int64,String},NamedTuples._NT_a_b{Array{Int64,1},Array{String,1}}},Array{Float64,1}}:
ERROR: BoundsError: attempt to access 0-element Array{Int64,1} at index [0]

So, what is the preferred way to deal with IndexedTables when some dimensions may contain NA values?


sylvaticus · 2017-10-20T06:52:12Z

Note that the problem concerns only the first column, i.e. this works:

a = Union{String,DataArrays.NAtype}["aa","bb","aa","bb",NA,NA]
b = Union{String,DataArrays.NAtype}["c","c","d","d","c","d"]
v = Float64[1.0,2.0,3.0,4.0,5.0,6.0]
t = IndexedTables.Table(Columns(a=b,b=a),v)

YongHee-Kim · 2017-12-12T02:29:24Z

There is some odd behavior with NA values:

using DataArrays, JuliaDB

a = @data([NA,1,2,3])
t1 = table(a, names=[:a])
t1[1] # MethodError
rows(t1) # Error displaying

But it works fine if the column name is not defined:

t2 = table(a)
t2[1] 
rows(t2)

shashi · 2017-12-12T07:10:19Z

This is because of:

julia> eltype(DataArrays.DataArray{Int64, 1})
Int64

It should really be Union{Int64, DataArrays.NAtype}...
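
To illustrate the point about element types, here is a minimal sketch. It assumes Julia 1.x, where `missing` lives in Base, rather than the `NA` from DataArrays on Julia 0.6 that this thread is about. An array that declares its element type honestly reports the `Union`, so generic code inspecting `eltype` can tell that missing values may occur:

```julia
# Assumption: Julia 1.x with `missing` in Base; no packages are needed.
a = Union{Int64, Missing}[missing, 1, 2, 3]

# The declared element type includes Missing, so it does not "lie".
println(eltype(a))

# Generic code can branch on this information.
can_hold_missing = Missing <: eltype(a)
println(can_hold_missing)
```

This only shows the `eltype` contract; it does not exercise IndexedTables or JuliaDB.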

shashi · 2017-12-12T07:13:18Z

Some options here:

- Convert the columns into a DataValueArray; see https://github.com/davidanthoff/DataValueArrays.jl#constructors (this type is now exposed by DataValues.jl -- using DataValues should help).
- Construct the table manually using columns.

I do wish this just worked.

YongHee-Kim · 2017-12-12T07:34:56Z

So NA will be deprecated in Julia; I guess I should learn to deal with the new null type.
Since missing is used by DataFrames 0.11 and will be in Base, I guess it's better to use Missings instead of DataValues.
Converting the columns into Union{Missing, T} seems to work. I will try this approach.

Thanks for your help!
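
The conversion mentioned above can be sketched roughly as follows. This assumes Julia 1.x with `missing` in Base (on 0.6 it came from the Missings.jl package); the names `region` and `col` are illustrative only, and the sketch stops short of building a table:

```julia
# Assumption: Julia 1.x; `region` and `col` are illustrative names.
region = ["FR", "UK", "FR", "UK", "", ""]

# Widen the element type so the column may hold missing values.
col = convert(Vector{Union{Missing, String}}, region)

# Mark the empty entries as missing.
col[5] = missing
col[6] = missing

# `==` and `<` propagate missing instead of returning a Bool,
# so test for missing entries with `ismissing`.
n_missing = count(ismissing, col)

# `isless` orders missing after every other value, so sorting works.
sorted = sort(col)
```

Because the column's element type now states that it can hold `missing`, code that relies on `eltype` sees the full picture.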

shashi · 2017-12-12T08:21:22Z

Yeah, that will work too. As long as the array type doesn't lie about its element type, we're in business. We will be switching JuliaDB over to missing only in the 0.7-compatible release once 0.7 alpha is released. This is because performance would suffer on 0.6.

andreasnoack · 2017-12-12T09:28:25Z

Just a minor comment here. So actually the issue with eltype for DataArrays has been fixed in version 0.7.0 but unfortunately, it will probably be difficult to ever install that version because a lot of upper bounds have been added to other packages.

andreasnoack · 2017-12-12T10:56:42Z

@YongHee-Kim You might be able to see why DataArrays isn't updated by executing Pkg.update("DataArrays").

YongHee-Kim · 2017-12-13T01:05:49Z

@andreasnoack Thank you for the tip! It seems ExcelReaders is holding back the DataArrays update.
I've created an issue on ExcelReaders.

davidanthoff · 2017-12-16T17:42:59Z

This should work smoothly by just loading IterableTables.jl and then doing something like this:

using DataFrames, IndexedTables, IterableTables

df = DataFrame(
  param  = ["price","price","price","price","waterContent","waterContent"],
  item   = ["banana","banana","apple","apple","banana", "apple"],
  region = ["FR","UK","FR","UK","",""],
  value  = [3.2,2.9,1.2,0.8,0.2,0.8]
)
df[5,:region] = NA
df[6,:region] = NA

it = table(df)

IterableTables.jl will handle all the various different representations of missing data that float around. In general when things get converted via that route I just respect what a given container sees as its default representation for missing data, and then convert accordingly.

Only caveat is that I haven't merged the support for DataFrames.jl v0.11 yet, but that should happen relatively soon (and then it will work with both the old and new DataFrames at the same time).

sylvaticus changed the title ~~How to deal with NA values is some dimensions?~~ How to deal with NA values in some dimensions? Oct 16, 2017

sylvaticus changed the title ~~How to deal with NA values in some dimensions?~~ How to deal with null values in some dimensions? Oct 16, 2017

sylvaticus changed the title ~~How to deal with null values in some dimensions?~~ Null values in the first column lead to TypeError in the Table constructor Oct 20, 2017

shashi changed the title ~~Null values in the first column lead to TypeError in the Table constructor~~ DataArrays lead to TypeError in the Table constructor Dec 12, 2017
