DataFrameColumns support for sorting #3139
Comments
DataFrames.jl does not define
Yes, I think we could. It requires consideration of what you mention. How "permutation vector" is defined is part of the issue. The other part is which functions we need to define for consistency (e.g. CC @nalimilan for opinion
Interestingly, it seems that
This is the point: adding methods from Base Julia should not be taken lightly and requires careful design to make sure all is consistent. This is exactly the reason why
Upon thinking about it more, I also noticed that
If there were sorting support in Base for This could definitely be nice in other contexts in Base as well, like trying to return a sorted iterator over key, value pairs for a standard If you think this is a good idea, I could remove this issue and create a new one over in Base to discuss adding support for sorting pairs more generally.
I suspect a So we will likely have to define our own sorting methods for That said, it's clear we have an inconsistency in the API. Apart from
Maybe not a big deal in practice though. We have a similar issue as NamedArrays.jl/AxisArrays.jl/etc.,
I don't think it would be quite identical to I will say I would find it much more ergonomic to be able to get the top few keys of a DataFrame without having to index into them with a vector of indices; if you think that
To derail a bit from the main issue: no one previously requested this functionality. What are the use cases/examples where this is useful in practice? Thank you!
Point taken :) My use case is that I am implementing some voting algorithms; a column header would be the name of a political candidate or party, and each row is one ballot. To give you an example, you could check out my (probably very rough) Julia implementation here https://electowiki.org/wiki/Threshold_Equal_Approval. Notice my uses of
It is not a problem. We want to support various use cases. It is just that this one is so rare that it was not requested before. Also note that in general you can e.g. do
and you have a data frame ordered the way you want (if I understand your use case correctly).
…/`argmax` (#46705) This is the sentence used for `find*` functions, introduced by #25577. Also change "the domain of `f`" to "`domain`" as the domain of `f` can be a superset of the passed `domain`. (Spotted at JuliaData/DataFrames.jl#3139.) --------- Co-authored-by: Jameson Nash <[email protected]> Co-authored-by: Lilith Orion Hafner <[email protected]>
I found myself wanting to find the keys of the DataFrame, ordered by some comparator of their columns. As it stands, it seems that the most straightforward way to do this is
I am wondering if we could define `sortperm` (and I suppose all related sorting functions) directly on a `DataFrameColumns` object? It seems we might then be able to write more simply `sortperm(eachcol(df))`.
Now a question arises: since `DataFrameColumns` is in the uncommon scenario of being accessible by both indices and keys, should `sortperm` return a permutation of indices or of keys? In Base Julia, it seems sorting functions are defined on `AbstractVector`, so it is usually assumed that this will return indices.

However, looking at the docstring for e.g. `sortperm`, the promise is that `sortperm` returns "a permutation vector `I` that puts `v[I]` in sorted order."

Say `keys(eachcol(df))` is `[:a, :b, :c]`. Then, since `eachcol(df)[:b, :c, :a]` returns a DataFrameColumns object with the columns in that order of `[:b, :c, :a]`, I think we can consider that a permutation vector `I`. Furthermore, since both `findmax(eachcol(df))` and `argmax(eachcol(df))`, the latter of which is basically an alias for the former, return a key (the underlying implementation will sort over `pairs(eachcol(df))`), I think that is another strong reason that `sortperm` should also return a vector of keys, to retain the parity that `argmax(eachcol(df)) == first(sortperm(eachcol(df)))`.
All that is to say, I would love to see added functionality as small as the following:
sortperm(dfc::DataFrameColumns) = keys(dfc)[sortperm(collect(dfc))]
Syntactic sugar for the existing pattern I am using.
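As a hedged sketch, the one-liner can be tried out today under a separate name, without adding a method to `Base.sortperm` for a type owned by DataFrames.jl. The name `sortperm_keys` and the ballot data are hypothetical and not part of any package; keyword arguments are simply forwarded to `sortperm`.

```julia
using DataFrames

# Hypothetical helper, not a DataFrames.jl API: column names ordered by
# comparing the columns themselves; kwargs (by, lt, rev) are passed to sortperm.
sortperm_keys(dfc; kwargs...) = keys(dfc)[sortperm(collect(dfc); kwargs...)]

# Made-up approval ballots: one column per candidate, one row per ballot.
ballots = DataFrame(alice = [1, 0, 1], bob = [0, 0, 1], carol = [1, 1, 1])

# Rank candidates by total approvals, highest first.
ranking = sortperm_keys(eachcol(ballots); by = sum, rev = true)  # [:carol, :alice, :bob]
top_two = ranking[1:2]                                           # [:carol, :alice]
```

Returning keys here means the "top few" candidates can be read off directly, with no extra lookup from indices back to names.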