Feature request: Group data frames to accept string or symbol as index when grouping is by a single attribute #3470

Open
alex-s-gardner opened this issue Sep 28, 2024 · 5 comments
@alex-s-gardner

# Make example grouped data frame

using DataFrames
df = DataFrame(city=["Paris", "London", "Paris", "Berlin", "London", "Berlin", "Berlin"],
               date=["10-1$k-2021" for k in 3:9],
               v=38 .+ 100 * rand(7))
gdf = DataFrames.groupby(df, :city)

This is how a group in the grouped data frame currently has to be accessed:

gdf[("Berlin", )]
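For context, a short sketch of the key-based lookups that DataFrames.jl already accepts for a `GroupedDataFrame` (a tuple, a `NamedTuple`, or a dictionary of grouping values). The small `df` here is made up for illustration and is not the one from the issue.

```julia
using DataFrames

# Illustrative data, smaller than the example in the issue
df = DataFrame(city=["Paris", "London", "Paris", "Berlin"],
               v=[1.0, 2.0, 3.0, 4.0])
gdf = groupby(df, :city)

by_tuple = gdf[("Berlin",)]              # tuple of key values
by_named = gdf[(city="Berlin",)]         # NamedTuple keyed by grouping column
by_dict  = gdf[Dict(:city => "Berlin")]  # dictionary keyed by grouping column

# Check that a key exists before indexing
has_berlin = haskey(gdf, ("Berlin",))
has_rome   = haskey(gdf, ("Rome",))
```

All three lookups return the same group; only the spelling of the key differs.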

For a data frame grouped by a single attribute, it would be more intuitive to simply index in the same way as a data frame column:

gdf["Berlin"] == gdf[:Berlin] ==  gdf[("Berlin", )]
@bkamins bkamins added this to the 1.x milestone Sep 29, 2024
@bkamins
Member

bkamins commented Sep 29, 2024

The reason why this is not allowed is that gdf[1] would be ambiguous as it could mean:

  1. Selecting the first group.
  2. Selecting the group with key equal to 1.
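A minimal sketch of this ambiguity, using made-up data with an integer grouping column. It assumes `sort=false`, which the DataFrames.jl documentation describes as producing groups in order of first appearance, so the first group here is the one with `id == 3`.

```julia
using DataFrames

# Illustrative data: integer keys whose order of appearance differs from their values
df = DataFrame(id=[3, 1, 3, 2, 1], v=[10, 20, 30, 40, 50])
gdf = groupby(df, :id; sort=false)

first_group = gdf[1]     # positional: the first group (id == 3 under the assumption above)
key_one     = gdf[(1,)]  # by key: the group whose id equals 1
```

If `gdf[1]` also meant "the group with key 1", the two readings would return different groups for this data.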

In the past we discussed allowing gdf(1) (and by extension e.g. gdf(1,2,3) for multiple grouping keys) to make this case easier, but it did not get much support. But maybe we can reconsider it.
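A rough sketch of what the call syntax could look like. This method is hypothetical and not part of DataFrames.jl; defining it outside the package is type piracy, so it is shown only to illustrate the idea.

```julia
using DataFrames

# Hypothetical: make a GroupedDataFrame callable with the key values as arguments.
# Not a DataFrames.jl API; it simply forwards to the existing tuple lookup.
(gdf::GroupedDataFrame)(key...) = gdf[key]

df = DataFrame(a=[1, 1, 2], b=["x", "y", "x"], v=1:3)
g1 = groupby(df, :a)        # one grouping key
g2 = groupby(df, [:a, :b])  # two grouping keys

single = g1(1)       # same group as g1[(1,)]
multi  = g2(1, "y")  # same group as g2[(1, "y")]
```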

CC @nalimilan @pdeffebach @kdpsingh

@alex-s-gardner
Author

Ahh, I can see the challenge. DimensionalData.jl uses At(1), such that gdf[1] is the first group and gdf[At(1)] is the group with a key value of 1, though I'm not sure how simpatico that is with DataFrames.
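A minimal sketch of how such a wrapper could behave for a single grouping key. The `At` type below is a hypothetical local definition for illustration (it is not DimensionalData.jl's `At`), and the `getindex` method is not part of DataFrames.jl.

```julia
using DataFrames

# Hypothetical wrapper marking a value as a group key rather than a position
struct At{T}
    value::T
end

# Forward At-wrapped keys to the existing one-element tuple lookup
Base.getindex(gdf::GroupedDataFrame, k::At) = gdf[(k.value,)]

df = DataFrame(id=[3, 1, 3, 2], v=[10, 20, 30, 40])
gdf = groupby(df, :id; sort=false)

by_position = gdf[1]      # first group, as today
by_key      = gdf[At(1)]  # group whose id equals 1
```

Because the wrapper has its own type, integer indexing keeps its positional meaning and the key lookup stays explicit.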

@bkamins
Member

bkamins commented Sep 29, 2024

We could use At (or another such wrapper), but in this case At(1) is longer to write than (1,). This is one reason why we have not introduced it yet.

@alex-s-gardner
Author

At(1) is longer but more intuitive than (1,), but I certainly see your point. gdf(1) seems like a good compromise

@alex-s-gardner
Author

alex-s-gardner commented Oct 21, 2024

Thinking about this more, gdf[At("Berlin")] produces more readable and understandable code, as it retains the indexing brackets [] and then specifies what is being requested. gdf("Berlin") might be a bit ambiguous, as it suggests a function call.
