-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] pivot #2183
[FEAT] pivot #2183
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2183 +/- ##
==========================================
+ Coverage 85.10% 85.56% +0.45%
==========================================
Files 68 71 +3
Lines 7391 7549 +158
==========================================
+ Hits 6290 6459 +169
+ Misses 1101 1090 -11
|
daft/dataframe/dataframe.py
Outdated
|
||
Example: | ||
>>> df = daft.from_pydict( | ||
{"group": ["A", "A", "B", "B"], "pivot": [1, 1, 1, 2], "value": [1, 2, 3, 4]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a better motivating example? Something simple maybe like:
"version": [...],
"benchmark_name": [...],
"walltime": [...],
df.pivot("version", "benchmark_name", "walltime", "min")
None, | ||
)) | ||
} | ||
_ => unreachable!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a message here, otherwise when we panic we won't be able to figure out what happened without a backtrace.
@@ -48,22 +48,6 @@ impl Aggregate { | |||
}) | |||
} | |||
|
|||
pub(crate) fn schema(&self) -> SchemaRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema for an agg op is already created in the constructor , so this function is redundant. I believe it is leftover code from a previous change, and I'm removing this as a drive by. (I found this cuz I was modeling the pivot logical op after this)
tests/dataframe/test_pivot.py
Outdated
import pytest | ||
|
||
|
||
@pytest.mark.parametrize("repartition_nparts", [1, 2, 4]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try [1, 2, 5]
because the 5
will also test empty partitions
src/daft-table/src/ops/pivot.rs
Outdated
}; | ||
let pivot_names_as_series = pivot_keys_series.to_str_values()?; | ||
let pivot_names = pivot_names_as_series.utf8()?.as_arrow(); | ||
if pivot_names.len() > names.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need a strong guarantee here that Set(pivot_names).in(Set(names))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, for example in an empty partition, pivot_names will be empty, but we still need to create empty columns for these names otherwise the schema won't match with the other non-empty partitions.
src/daft-table/src/ops/pivot.rs
Outdated
))); | ||
} | ||
|
||
let mut pivot_name_to_pivot_key_idx = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably make this a HashMap or something
@@ -744,6 +700,139 @@ pub(super) fn translate_single_logical_node( | |||
} | |||
} | |||
|
|||
fn populate_aggregation_stages( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add function docstrings to explain what it does and remove mut
let pivoted = empty_table.pivot(group_by, pivot_col, values_col, names)?; | ||
Ok(MicroPartition::new_loaded( | ||
pivoted.schema.clone(), | ||
vec![pivoted].into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just vec![]
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smart
Pivot with single groupby (aka index column), pivot column, and values column.
Todo: