Another attempt at an astable flag #298

Merged
merged 29 commits into from
Sep 24, 2021
Changes from 14 commits
Commits
a8701c8
initial attempt
pdeffebach Sep 14, 2021
9b997a6
finally working
pdeffebach Sep 15, 2021
d639560
start adding tests
pdeffebach Sep 15, 2021
b77e8ca
more tests
pdeffebach Sep 16, 2021
3cdf0d5
more tests
pdeffebach Sep 16, 2021
b878fbb
add docstring
pdeffebach Sep 16, 2021
2344a2e
tests pass
pdeffebach Sep 16, 2021
6557def
add ByRow in docstring
pdeffebach Sep 16, 2021
6002def
add type annotation
pdeffebach Sep 21, 2021
08a1c4b
better docs
pdeffebach Sep 21, 2021
581b2cf
more docs fixes
pdeffebach Sep 21, 2021
7cc8947
update index.md
pdeffebach Sep 21, 2021
0eca67d
Apply suggestions from code review
pdeffebach Sep 21, 2021
a4ab9a6
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 21, 2021
ab9bae4
clean named tuple creation
pdeffebach Sep 22, 2021
495f08a
add example with string
pdeffebach Sep 22, 2021
01cb5e7
grouping tests
pdeffebach Sep 22, 2021
01fb3b7
Update src/macros.jl
pdeffebach Sep 22, 2021
915191c
changes
pdeffebach Sep 23, 2021
a331fc2
Merge branch 'astable_2' of https://github.com/pdeffebach/DataFramesM…
pdeffebach Sep 23, 2021
2ce4d9e
fix some errors
pdeffebach Sep 23, 2021
57b4051
add macro check
pdeffebach Sep 23, 2021
da7674d
add errors for bad flag combo
pdeffebach Sep 23, 2021
285e3ac
better grouping tests
pdeffebach Sep 23, 2021
713eaf0
Update src/parsing_astable.jl
pdeffebach Sep 23, 2021
4e01c4a
add snipper to transform, select, combine, by
pdeffebach Sep 23, 2021
09c692a
add mutating tests
pdeffebach Sep 23, 2021
ae26da8
get rid of debugging printin
pdeffebach Sep 24, 2021
a7fd1a2
Apply suggestions from code review
pdeffebach Sep 24, 2021
3 changes: 2 additions & 1 deletion Project.toml
@@ -6,14 +6,15 @@ version = "0.9.1"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

[compat]
Chain = "0.4"
DataFrames = "1"
MacroTools = "0.5"
Reexport = "0.2, 1"
julia = "1"
Chain = "0.4"

[extras]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
31 changes: 29 additions & 2 deletions docs/src/index.md
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides
convenient syntax.
* `@byrow` for applying functions to each row of a data frame (only supported inside other macros).
* `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations.
* `@astable` to create multiple columns within a single transformation.
* `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s
`%>%` in R.

@@ -396,11 +397,37 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str)
3 │ missing missing
```

## Creating multiple columns at once with `@astable`

Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple
new variables in the same operation, yet have them share
information.

In a single block, all assignments of the form `:y = f(:x)`
or `$y = f(:x)` at the top-level are generate new columns.
Member

Suggested change
or `$y = f(:x)` at the top-level are generate new columns.
or `$y = f(:x)` at the top-level generate new columns.

Member

maybe add what $y has to resolve to (I understand it has to be Symbol, or strings are also accepted?)

Collaborator Author

Good catch. Turns out I was allowing unexpected behavior and patched the code.


```
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);

julia> @transform df @astable begin
           ex = extrema(:b)
           :b_first = :b .- first(ex)
           :b_last = :b .- last(ex)
       end
3×4 DataFrame
 Row │ a      b      b_first  b_last
     │ Int64  Int64  Int64    Int64
─────┼───────────────────────────────
   1 │     1    400        0    -200
   2 │     2    500      100    -100
   3 │     3    600      200       0
```
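
As a rough sketch of what the block above amounts to, the same result can be written by hand with the DataFrames.jl `AsTable` destination. The helper name `b_offsets` is made up for illustration, and this is not the code the macro actually generates.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [400, 500, 600])

# Hypothetical helper: takes the whole :b column and returns a NamedTuple
# of vectors, one entry per new column.
function b_offsets(b)
    ex = extrema(b)  # computed once, shared by both new columns
    return (b_first = b .- first(ex), b_last = b .- last(ex))
end

# AsTable expands the NamedTuple fields into the columns :b_first and :b_last.
result = transform(df, :b => b_offsets => AsTable)
```

Both new columns reuse `ex`, so `extrema` runs only once, which is the sharing that `@astable` is meant to make convenient.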


## [Working with column names programmatically with `$`](@id dollar)

DataFramesMeta provides the special syntax `$` for referring to
-columns in a data frame via a `Symbol`, string, or column position as either
-a literal or a variable.
+columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable.
Member

While we are at it given our recent discussion on Discourse, I think it is essential to mention when the $ reference is resolved.
Also maybe add an example when macros are used within a function? I think these are cases not trivial. This can be another PR of course

Collaborator Author

I will do this as another PR. In summary, you can't use other macros which use $. I will try and sort out if I can carve out an exception.

Member

To be clear why I stress it so much. With DataFrames.jl my answer to users is: if you learn Julia Base then you will know exactly how DataFrames.jl works. With DataFramesMeta.jl unfortunately this is not the case as it is a DSL so we need to be very precise how things work in documentation.


```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])
5 changes: 4 additions & 1 deletion src/DataFramesMeta.jl
@@ -4,6 +4,8 @@ using Reexport

using MacroTools

using OrderedCollections: OrderedCollections

@reexport using DataFrames

@reexport using Chain
@@ -16,12 +18,13 @@ export @with,
        @transform, @select, @transform!, @select!,
        @rtransform, @rselect, @rtransform!, @rselect!,
        @eachrow, @eachrow!,
-       @byrow, @passmissing,
+       @byrow, @passmissing, @astable,
        @based_on, @where # deprecated

 const DOLLAR = raw"$"

 include("parsing.jl")
+include("parsing_astable.jl")
 include("macros.jl")
 include("linqmacro.jl")
 include("eachrow.jl")
135 changes: 114 additions & 21 deletions src/macros.jl
@@ -350,6 +350,120 @@ macro passmissing(args...)
    throw(ArgumentError("@passmissing only works inside DataFramesMeta macros."))
end

"""
    @astable(args...)

Return a `NamedTuple` from a single transformation inside DataFramesMeta.jl macros.

`@astable` acts on a single block. It works through all top-level expressions
and collects all such expressions of the form `:y = ...`, i.e. assignments to a
`Symbol`, which is a syntax error outside of DataFramesMeta.jl macros. At the end of the
expression, all assignments are collected into a `NamedTuple` to be used
with the `AsTable` destination in the DataFrames.jl transformation
mini-language.

Concretely, the expressions

```
df = DataFrame(a = 1)

@rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end
```

become the pair

```
function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a

    (; x = x_t, z = z_t)
end

transform(df, [:a] => ByRow(f) => AsTable)
```

`@astable` has two major advantages at the cost of increasing complexity.
First, `@astable` makes it easy to create multiple columns from a single
transformation, which share a scope. For example, `@astable` allows
for the following

```
@transform df @astable begin
    m = mean(:x)
    :x_demeaned = :x .- m
    :x2_demeaned = :x2 .- m
end
```

The creation of `:x_demeaned` and `:x2_demeaned` both share the variable `m`,
which does not need to be calculated twice.

Second, `@astable` is useful when performing intermediate calculations
and storing their results in new columns. For example, the following fails.

```
@rtransform df begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```

This is because DataFrames.jl does not guarantee sequential evaluation of
transformations, so `:new_col_1` is not available when `:new_col_2` is
computed. `@astable` solves this problem:

    @rtransform df @astable begin
        :new_col_1 = :x + :y
        :new_col_2 = :new_col_1 + :z
    end
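
For comparison, here is a hand-written sketch of roughly what the block above does, following the expansion pattern shown earlier in this docstring. The helper name `new_cols` and the sample data are assumptions made for illustration.

```julia
using DataFrames

df = DataFrame(x = [1, 2], y = [10, 20], z = [100, 200])

# Hypothetical helper: both new values are produced in a single call,
# so the second can reuse the first without a second pass over the data.
function new_cols(x, y, z)
    new_col_1 = x + y
    new_col_2 = new_col_1 + z
    return (new_col_1 = new_col_1, new_col_2 = new_col_2)
end

# ByRow applies the helper to each row; AsTable turns the NamedTuple
# fields into the columns :new_col_1 and :new_col_2.
result = transform(df, [:x, :y, :z] => ByRow(new_cols) => AsTable)
```

Because both columns come from one transformation, no ordering between separate transformations is involved.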

Column assignment in `@astable` follows the same rules as
column assignment more generally. Construct a new column
from a string by escaping it with `$DOLLAR`.
Member

can you add an example of this?

Collaborator Author

added.
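
For reference, a sketch of what such an example could look like, assuming a DataFramesMeta.jl release that includes `@astable`. The variable `new_col` and the column names are made up for illustration and are not taken from the PR.

```julia
using DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [400, 500, 600])
new_col = "b_shifted"  # hypothetical column name, held in a string

result = @transform df @astable begin
    lo = minimum(:b)        # plain local variable, not a column
    $new_col = :b .- lo     # column named by the string in `new_col`
    :b_scaled = :b ./ lo    # column named by a literal Symbol
end
```

Here `lo` stays local to the block, while the `$new_col` and `:b_scaled` assignments become columns of the result.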


### Examples

```
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> d = @rtransform df @astable begin
           :x = 1
           y = 5
           :z = :x + y
       end
3×4 DataFrame
 Row │ a      b      x      z
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      6
   2 │     2      5      1      6
   3 │     3      6      1      6

julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
           ex = extrema(:b)
           :min_b = first(ex)
           :max_b = last(ex)
       end
2×3 DataFrame
 Row │ a      min_b  max_b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      6
   2 │     2     70     80
```

"""
macro astable(args...)
    throw(ArgumentError("@astable only works inside DataFramesMeta macros."))
end

##############################################################################
##
## @with
@@ -1546,17 +1660,6 @@ function combine_helper(x, args...; deprecation_warning = false)

     exprs, outer_flags = create_args_vector(args...)

-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end
-
     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

     quote
@@ -1666,16 +1769,6 @@ end
 function by_helper(x, what, args...)
     # Only allow one argument when returning a Table object
     exprs, outer_flags = create_args_vector(args...)
-    fe = first(exprs)
-    if length(exprs) == 1 &&
-        get_column_expr(fe) === nothing &&
-        !(fe.head == :(=) || fe.head == :kw)
-
-        @warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS."
-
-        lhs = Expr(:$, :AsTable)
-        exprs = ((:($lhs = $fe)),)
-    end

     t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

13 changes: 10 additions & 3 deletions src/parsing.jl
@@ -91,7 +91,8 @@ is_macro_head(ex::Expr, name) = ex.head == :macrocall && ex.args[1] == Symbol(na

 const BYROW_SYM = Symbol("@byrow")
 const PASSMISSING_SYM = Symbol("@passmissing")
-const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false))
+const ASTABLE_SYM = Symbol("@astable")
+const DEFAULT_FLAGS = (;BYROW_SYM => Ref(false), PASSMISSING_SYM => Ref(false), ASTABLE_SYM => Ref(false))

 extract_macro_flags(ex, exprflags = deepcopy(DEFAULT_FLAGS)) = (ex, exprflags)
 function extract_macro_flags(ex::Expr, exprflags = deepcopy(DEFAULT_FLAGS))
@@ -269,7 +270,13 @@ function fun_to_vec(ex::Expr;
         return ex_col
     end

-    if no_dest
+    if final_flags[ASTABLE_SYM][]
+        src, fun = get_source_fun_astable(ex; exprflags = final_flags)
+
+        return :($src => $fun => AsTable)
+    end
+
+    if no_dest # subset and with
         src, fun = get_source_fun(ex, exprflags = final_flags)
         return quote
             $src => $fun
@@ -359,7 +366,7 @@ function create_args_vector(arg; wrap_byrow::Bool=false)
         outer_flags[BYROW_SYM][] = true
     end

-    if arg isa Expr && arg.head == :block
+    if arg isa Expr && arg.head == :block && !outer_flags[ASTABLE_SYM][]
         x = MacroTools.rmlines(arg).args
     else
         x = Any[arg]
102 changes: 102 additions & 0 deletions src/parsing_astable.jl
@@ -0,0 +1,102 @@
function conditionally_add_symbols!(inputs_to_function::AbstractDict,
                                    lhs_assignments::OrderedCollections.OrderedDict, col)
    # if it's already been assigned at top-level,
    # don't add it to the inputs
    if haskey(lhs_assignments, col)
        return lhs_assignments[col]
    else
        return addkey!(inputs_to_function, col)
    end
end

replace_syms_astable!(inputs_to_function::AbstractDict,
                      lhs_assignments::OrderedCollections.OrderedDict, x) = x
replace_syms_astable!(inputs_to_function::AbstractDict,
                      lhs_assignments::OrderedCollections.OrderedDict, q::QuoteNode) =
    conditionally_add_symbols!(inputs_to_function, lhs_assignments, q)

function replace_syms_astable!(inputs_to_function::AbstractDict,
                               lhs_assignments::OrderedCollections.OrderedDict, e::Expr)
    if onearg(e, :^)
        return e.args[2]
    end

    col = get_column_expr(e)
    if col !== nothing
        return conditionally_add_symbols!(inputs_to_function, lhs_assignments, col)
    elseif e.head == :.
        return replace_dotted_astable!(inputs_to_function, lhs_assignments, e)
    else
        return mapexpr(x -> replace_syms_astable!(inputs_to_function, lhs_assignments, x), e)
    end
end

protect_replace_syms_astable!(inputs_to_function::AbstractDict,
                              lhs_assignments::OrderedCollections.OrderedDict, e) = e
# Recurse with the three-argument @astable replacer; `replace_syms!` is not
# called with these three arguments anywhere else in this diff.
protect_replace_syms_astable!(inputs_to_function::AbstractDict,
                              lhs_assignments::OrderedCollections.OrderedDict, e::Expr) =
    replace_syms_astable!(inputs_to_function, lhs_assignments, e)

function replace_dotted_astable!(inputs_to_function::AbstractDict,
                                 lhs_assignments::OrderedCollections.OrderedDict, e)
    x_new = replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[1])
    y_new = protect_replace_syms_astable!(inputs_to_function, lhs_assignments, e.args[2])
    Expr(:., x_new, y_new)
end

is_column_assigment(ex) = false
function is_column_assigment(ex::Expr)
    ex.head == :(=) && (get_column_expr(ex.args[1]) !== nothing)
end

# Taken from MacroTools.jl
# No docstring so assumed unstable
block(ex) = isexpr(ex, :block) ? ex : :($ex;)

function get_source_fun_astable(ex; exprflags = deepcopy(DEFAULT_FLAGS))
    inputs_to_function = Dict{Any, Symbol}()
    lhs_assignments = OrderedCollections.OrderedDict{Any, Symbol}()

    # Make sure all top-level assignments are
    # in the args vector
    ex = block(MacroTools.flatten(ex))
    exprs = map(ex.args) do arg
        if is_column_assigment(arg)
            lhs = get_column_expr(arg.args[1])
            rhs = arg.args[2]
            new_ex = replace_syms_astable!(inputs_to_function, lhs_assignments, arg.args[2])
            if haskey(inputs_to_function, lhs)
                new_lhs = inputs_to_function[lhs]
                lhs_assignments[lhs] = new_lhs
            else
                new_lhs = addkey!(lhs_assignments, lhs)
            end

            Expr(:(=), new_lhs, new_ex)
        else
            replace_syms_astable!(inputs_to_function, lhs_assignments, arg)
        end
    end
    source = :(DataFramesMeta.make_source_concrete($(Expr(:vect, keys(inputs_to_function)...))))

    inputargs = Expr(:tuple, values(inputs_to_function)...)
    nt_iterator = (:(Symbol($k) => $v) for (k, v) in lhs_assignments)
    nt_expr = Expr(:tuple, Expr(:parameters, nt_iterator...))
    body = Expr(:block, Expr(:block, exprs...), nt_expr)

    fun = quote
        $inputargs -> begin
            $body
        end
    end

    # TODO: Add passmissing support by
    # checking if any input arguments missing,
    # and if-so, making a named tuple with
    # missing values
    if exprflags[BYROW_SYM][]
        fun = :(ByRow($fun))
    end

    return source, fun
end