Operators for PooledDataArray? #67
Comments
I'm on board with using custom implementations of every function for both DAs and PDAs, since I don't see any new AbstractDataArray types coming up soon.
Should e.g. …
That's a really good question. I'd say those things should raise errors: once you assert your data is categorical, we turn arithmetic off.
I can see a legitimate use case for …
What's the legitimate use case for …
Well, for ordinal data in some cases this can make sense (think of a Likert scale). But I think it would be safer to require people to convert to integers explicitly. If you start implementing …
Yes, I think this boils down to a question of what PooledDataArrays represent. Here are four possible answers:
We should not worry about the cost of supporting operators for PooledDataArrays, since we have to support all operators for DataArrays anyway, and metaprogramming makes supporting all operators for PooledDataArrays almost as easy as supporting one.
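To illustrate the metaprogramming point, here is a Python sketch rather than the Julia DataArrays code; in Julia the analogous idiom is a `for f in (...) @eval ... end` loop over operator symbols. Each kind of operator is written once as a generic "apply to the pool" wrapper, and a loop then installs it for every operator in a table. The `Pooled` class and the `_lift_*` helpers are hypothetical names chosen for this example.

```python
import operator
from typing import Callable, List, Sequence


class Pooled:
    """Hypothetical pooled vector: element i is pool[refs[i]]."""

    def __init__(self, pool: Sequence, refs: Sequence[int]):
        self.pool = list(pool)
        self.refs = list(refs)

    def to_list(self) -> List:
        return [self.pool[r] for r in self.refs]


def _lift_unary(fn: Callable) -> Callable:
    # Apply fn once per pool entry and reuse the integer codes unchanged.
    # The new pool may contain duplicates (e.g. abs(-2) == abs(2)); that is
    # harmless for correctness, just less compact.
    def method(self: Pooled) -> Pooled:
        return Pooled([fn(v) for v in self.pool], self.refs)
    return method


def _lift_scalar(fn: Callable) -> Callable:
    # Same idea for `pooled <op> scalar`.
    def method(self: Pooled, scalar) -> Pooled:
        return Pooled([fn(v, scalar) for v in self.pool], self.refs)
    return method


# One loop per operator kind generates all the methods.
for _name, _fn in [("__neg__", operator.neg), ("__abs__", operator.abs)]:
    setattr(Pooled, _name, _lift_unary(_fn))
for _name, _fn in [("__add__", operator.add), ("__sub__", operator.sub),
                   ("__mul__", operator.mul), ("__truediv__", operator.truediv)]:
    setattr(Pooled, _name, _lift_scalar(_fn))
```

Adding another operator is then a one-entry change to a table, which is the sense in which supporting all operators costs about as much as supporting one.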
Yeah, we really need to decide what PDAs are. Regarding ordinal scales, computing Spearman's rank correlation coefficient does not imply you are able to attribute a precise numeric value to a level: just that you know their order. So … Anyway, I was not suggesting the problem was the cost of implementing operators, rather that you need to draw all the logical implications of adding …
As I said in another issue, I now quite strongly think that we should use an Enum-like type for processing categorical and ordinal data, then store those values in DataArrays. PDAs are an interesting idea, but not very valuable in the long run because of the existence of true scalars in Julia. The analogies with R that inspired PDAs were helpful, but inexact, and shouldn't be part of our long-term strategy.
So that means PDAs would become an enum for ordinal/nominal variables, and so …
Would this mean that what are now PooledDataArrays could become DataArrays of Enums, and we could kill AbstractDataArrays entirely? I'd be pretty happy with that, and I think it could simplify many things. Another question this discussion brings up is whether we want a separate wrapper for ordinal types. My point about Spearman's rho was that regardless of whether the original data was ordinal or interval-scaled, the ranks are ordinal, but we would need to be able to perform …
PDAs will just go away in my ideal world. For any specific categorical variable, there would be a custom type, which could be stored in Arrays or DataArrays. I would oppose implementing … If you want to use integers, but only a small number of them, use …
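As a rough illustration of "a custom type per categorical variable", here is a Python `enum` sketch, not a Julia design. It assumes two hypothetical base classes, `NominalVariable` and `OrdinalVariable` (names borrowed from the thread). Nominal values support only equality, and ordinal values also compare by declaration order. Neither supports arithmetic, so arithmetic stays "turned off" unless you convert to integers explicitly with the hypothetical `code()` method.

```python
from enum import Enum
from functools import total_ordering


class NominalVariable(Enum):
    """Hypothetical base for unordered categories: only equality is defined."""


@total_ordering
class OrdinalVariable(Enum):
    """Hypothetical base for ordered categories: order follows declaration order."""

    def code(self) -> int:
        # Explicit conversion to an integer: 0-based position among the members.
        return list(type(self)).index(self)

    def __lt__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return self.code() < other.code()


class Likert(OrdinalVariable):
    DISAGREE = "disagree"
    NEUTRAL = "neutral"
    AGREE = "agree"


class Color(NominalVariable):
    RED = "red"
    GREEN = "green"


# Values of these types live in ordinary lists; None stands in for a missing value here.
responses = [Likert.AGREE, None, Likert.DISAGREE, Likert.NEUTRAL]
```

`Likert.AGREE + 1` and `Color.RED < Color.GREEN` both raise `TypeError`, while `sorted` works on non-missing `Likert` values.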
Sounds good to me.
Yes, we'd only have DataArrays of Enums. I think the Enums we'd want would be descendants of either NominalVariable or OrdinalVariable. For calculating Spearman's rho, you map the ordered elements to the integers and then do arithmetic on the integers. So we don't need to implement anything more than the map from elements to integers.
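The Spearman recipe can be sketched in Python, independent of the Julia API discussed here. The sketch assumes ordinal values arrive as strings with an explicit ordered list of levels. It maps each value to its level position, ranks those integers with tied values sharing their average rank (the usual convention for Spearman's rho), and takes the Pearson correlation of the ranks. All function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def level_codes(values: Sequence[str], levels: Sequence[str]) -> List[int]:
    """Map each ordinal value to its 0-based position in the ordered level list."""
    position = {level: i for i, level in enumerate(levels)}
    return [position[v] for v in values]


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def spearman(x: Sequence[str], y: Sequence[str],
             levels_x: Sequence[str], levels_y: Sequence[str]) -> float:
    # Only the element-to-integer map is type-specific; the rest is integer arithmetic.
    rx = average_ranks(level_codes(x, levels_x))
    ry = average_ranks(level_codes(y, levels_y))
    return pearson(rx, ry)
```

The ordinal type only has to provide the element-to-integer map. Everything after `level_codes` operates on plain numbers.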
If we're getting rid of PooledDataArrays, we obviously don't need operators for them, so I'm going to close this and open a new issue.
For unary operators, binary operators with scalar arguments, and some others (e.g. transpose, which I'm working on now), we could make specialized versions that operate substantially faster on PooledDataArray than the current implementations for AbstractDataArray. My questions are: …
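Why transpose can be specialized for a pooled array is easiest to see in a small Python sketch (not the DataArrays implementation). It uses a hypothetical `PooledMatrix` that stores a row-major grid of integer codes into a pool of distinct values. Transposing rearranges only the codes, and the pool entries are neither recomputed nor compared.

```python
from typing import Hashable, List, Sequence


class PooledMatrix:
    """Hypothetical pooled matrix: element (i, j) is pool[refs[i][j]]."""

    def __init__(self, pool: Sequence[Hashable], refs: Sequence[Sequence[int]]):
        self.pool = list(pool)
        self.refs = [list(row) for row in refs]

    @classmethod
    def from_rows(cls, rows: Sequence[Sequence[Hashable]]) -> "PooledMatrix":
        # Assign each distinct value a code in order of first appearance.
        pool: List[Hashable] = []
        index = {}
        refs = []
        for row in rows:
            codes = []
            for v in row:
                if v not in index:
                    index[v] = len(pool)
                    pool.append(v)
                codes.append(index[v])
            refs.append(codes)
        return cls(pool, refs)

    def transpose(self) -> "PooledMatrix":
        # Only the integer codes are rearranged; the pool is reused as-is.
        return PooledMatrix(self.pool, [list(col) for col in zip(*self.refs)])

    def to_rows(self) -> List[List[Hashable]]:
        return [[self.pool[r] for r in row] for row in self.refs]
```

For example, `PooledMatrix.from_rows([["a", "b", "a"], ["b", "b", "c"]]).transpose().to_rows()` gives `[["a", "b"], ["b", "b"], ["a", "c"]]`, with the pool `["a", "b", "c"]` carried over unchanged.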