Another attempt at an astable flag (#298)
* initial attempt

* finally working

* start adding tests

* more tests

* more tests

* add docstring

* tests pass

* add ByRow in docstring

* add type annotation

* better docs

* more docs fixes

* update index.md

* Apply suggestions from code review

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* clean named tuple creation

* add example with string

* grouping tests

* Update src/macros.jl

Co-authored-by: Bogumił Kamiński <[email protected]>

* changes

* fix some errors

* add macro check

* add errors for bad flag combo

* better grouping tests

* Update src/parsing_astable.jl

Co-authored-by: Milan Bouchet-Valat <[email protected]>

* add snippet to transform, select, combine, by

* add mutating tests

* get rid of debugging println

* Apply suggestions from code review

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Bogumił Kamiński <[email protected]>
3 people authored Sep 24, 2021
1 parent 6ba85a7 commit cc066df
Showing 11 changed files with 539 additions and 43 deletions.
3 changes: 2 additions & 1 deletion Project.toml
@@ -6,14 +6,15 @@ version = "0.9.1"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"

[compat]
Chain = "0.4"
DataFrames = "1"
MacroTools = "0.5"
Reexport = "0.2, 1"
julia = "1"
Chain = "0.4"

[extras]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
32 changes: 30 additions & 2 deletions docs/src/index.md
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides
convenient syntax.
* `@byrow` for applying functions to each row of a data frame (only supported inside other macros).
* `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations.
* `@astable` to create multiple columns within a single transformation.
* `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s
`%>%` in R.

@@ -396,11 +397,38 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str)
3missing missing
```

## Creating multiple columns at once with `@astable`

Often, several new variables depend on the same intermediate calculations. `@astable` makes it easy to create multiple
new variables in a single operation while letting them share
those intermediate results.

Within an `@astable` block, every top-level assignment of the form `:y = f(:x)`
or `$y = f(:x)` generates a new column. In the second form, `y`
must be a string or `Symbol`.

```
julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]);

julia> @transform df @astable begin
           ex = extrema(:b)
           :b_first = :b .- first(ex)
           :b_last = :b .- last(ex)
       end
3×4 DataFrame
 Row │ a      b      b_first  b_last
     │ Int64  Int64  Int64    Int64
─────┼───────────────────────────────
   1 │     1    400        0    -200
   2 │     2    500      100    -100
   3 │     3    600      200       0
```
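To see what the block computes, it can help to write the same steps out in plain Julia. The sketch below is only an illustration under that assumption, not the code `@astable` actually generates: the vector `b` stands in for the column `:b`, `extrema` is evaluated once and reused, and both new columns are returned together as a `NamedTuple`.

```julia
# Plain-Julia sketch of what the `@astable` block above computes,
# with an ordinary vector standing in for the column `:b`.
b = [400, 500, 600]

ex = extrema(b)              # shared intermediate, computed only once: (400, 600)
b_first = b .- first(ex)     # [0, 100, 200]
b_last = b .- last(ex)       # [-200, -100, 0]

# Like the block, return the new columns together as a NamedTuple
result = (b_first = b_first, b_last = b_last)
```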


## [Working with column names programmatically with `$`](@id dollar)

DataFramesMeta provides the special syntax `$` for referring to
columns in a data frame via a `Symbol`, string, or column position as either
a literal or a variable.
columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable.

```julia
df = DataFrame(A = 1:3, B = [2, 1, 2])
5 changes: 4 additions & 1 deletion src/DataFramesMeta.jl
@@ -4,6 +4,8 @@ using Reexport

using MacroTools

using OrderedCollections: OrderedCollections

@reexport using DataFrames

@reexport using Chain
@@ -16,12 +18,13 @@ export @with,
@transform, @select, @transform!, @select!,
@rtransform, @rselect, @rtransform!, @rselect!,
@eachrow, @eachrow!,
@byrow, @passmissing,
@byrow, @passmissing, @astable,
@based_on, @where # deprecated

const DOLLAR = raw"$"

include("parsing.jl")
include("parsing_astable.jl")
include("macros.jl")
include("linqmacro.jl")
include("eachrow.jl")
188 changes: 164 additions & 24 deletions src/macros.jl
@@ -282,11 +282,10 @@ macro byrow(args...)
throw(ArgumentError("@byrow is deprecated outside of DataFramesMeta macros."))
end


"""
passmissing(args...)
@passmissing(args...)
Propograte missing values inside DataFramesMeta.jl macros.
Propagate missing values inside DataFramesMeta.jl macros.
`@passmissing` is not a "real" Julia macro but rather serves as a "flag"
@@ -350,6 +349,156 @@ macro passmissing(args...)
throw(ArgumentError("@passmissing only works inside DataFramesMeta macros."))
end

const astable_docstring_snippet = """
Transformations can also use the macro-flag [`@astable`](@ref) for creating multiple
new columns at once and letting transformations share the same namespace.
See `? @astable` for more details.
"""

"""
@astable(args...)
Return a `NamedTuple` from a single transformation inside the DataFramesMeta.jl
macros `@select`, `@transform`, `@combine`, `@by`, and their mutating and row-wise equivalents.
`@astable` acts on a single block. It works through all top-level expressions
and collects all such expressions of the form `:y = ...` or `$(DOLLAR)y = ...`, i.e. assignments to a
`Symbol` or an escaped column identifier, which is a syntax error outside of
DataFramesMeta.jl macros. At the end of the expression, all assignments are collected
into a `NamedTuple` to be used with the `AsTable` destination in the DataFrames.jl
transformation mini-language.
Concretely, the expressions
```
df = DataFrame(a = 1)
@rtransform df @astable begin
    :x = 1
    y = 50
    :z = :x + y + :a
end
```
are transformed into the equivalent of
```
function f(a)
    x_t = 1
    y = 50
    z_t = x_t + y + a
    (; x = x_t, z = z_t)
end
transform(df, [:a] => ByRow(f) => AsTable)
```
`@astable` has two major advantages at the cost of increasing complexity.
First, `@astable` makes it easy to create multiple columns from a single
transformation, and those columns share a scope. For example, `@astable` allows
for the following (where `:x` and `:x2` exist in the data frame already).
```
@transform df @astable begin
    m = mean(:x)
    :x_demeaned = :x .- m
    :x2_demeaned = :x2 .- m
end
```
The computations of `:x_demeaned` and `:x2_demeaned` both use the variable `m`,
which therefore only needs to be calculated once.
Second, `@astable` is useful when performing intermediate calculations
and storing their results in new columns. For example, the following fails.
```
@rtransform df begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```
This is because DataFrames.jl does not guarantee sequential evaluation of
transformations: `:new_col_1` does not yet exist in `df` when `:new_col_2`
is computed. `@astable` solves this problem:
```
@rtransform df @astable begin
    :new_col_1 = :x + :y
    :new_col_2 = :new_col_1 + :z
end
```
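Inside the `@astable` block, each column assignment behaves like an ordinary local variable, so a later line can read the value assigned on an earlier one. The plain-Julia sketch below writes out a row-wise equivalent of the `@astable` block above by hand. It is an illustration only, not the code DataFramesMeta.jl generates: `row_fun`, `xs`, `ys`, and `zs` are hypothetical names, and `map` stands in for `ByRow`.

```julia
# Hand-written, row-wise equivalent of the `@astable` block above.
# Each `:col = ...` line becomes a local variable, so `new_col_2`
# can use `new_col_1` before both are returned as a NamedTuple.
function row_fun(x, y, z)
    new_col_1 = x + y
    new_col_2 = new_col_1 + z
    return (new_col_1 = new_col_1, new_col_2 = new_col_2)
end

# Hypothetical column data standing in for `:x`, `:y`, and `:z`
xs = [1, 2]
ys = [10, 20]
zs = [100, 200]

# Apply `row_fun` to each row, giving one NamedTuple per row
rows = map(row_fun, xs, ys, zs)

# Split the NamedTuples into columns, as the `AsTable` destination does
new_col_1 = [r.new_col_1 for r in rows]   # [11, 22]
new_col_2 = [r.new_col_2 for r in rows]   # [111, 222]
```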
Column assignment in `@astable` follows similar rules to
column assignment in other DataFramesMeta.jl macros. The
left-hand side of a column assignment can be either a `Symbol` or an
expression escaped with `$(DOLLAR)` which evaluates to a `Symbol` or `AbstractString`.
For example, `:y = ...` and `$(DOLLAR)y = ...` are both valid ways of assigning a new column.
However, unlike other DataFramesMeta.jl macros, multi-column assignments via
`AsTable` are disallowed. The following will fail.
```
@transform df @astable begin
    $(DOLLAR)AsTable = :x
end
```
References to existing columns also follow the same
rules as other DataFramesMeta.jl macros.
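Because `$(DOLLAR)y = ...` may refer to a variable holding a string, the name of a new column is sometimes only known at run time. As a rough plain-Julia sketch, not the actual DataFramesMeta.jl implementation, a `NamedTuple` with such a field name can be built by converting the string to a `Symbol`. The names `new_col` and `value` are hypothetical.

```julia
# Column name stored in a variable, as in the "New Column" example below
new_col = "New Column"
value = 7

# Convert the string to a Symbol and use it as the single field name
nt = NamedTuple{(Symbol(new_col),)}((value,))

nt[Symbol(new_col)]   # 7
```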
### Examples
```
julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]);

julia> d = @rtransform df @astable begin
           :x = 1
           y = 5
           :z = :x + y
       end
3×4 DataFrame
 Row │ a      b      x      z
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      1      6
   2 │     2      5      1      6
   3 │     3      6      1      6

julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]);

julia> @by df :a @astable begin
           ex = extrema(:b)
           :min_b = first(ex)
           :max_b = last(ex)
       end
2×3 DataFrame
 Row │ a      min_b  max_b
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      5      6
   2 │     2     70     80

julia> new_col = "New Column";

julia> @rtransform df @astable begin
           f_a = first(:a)
           $(DOLLAR)new_col = :a + :b + f_a
           :y = :a * :b
       end
4×4 DataFrame
 Row │ a      b      New Column  y
     │ Int64  Int64  Int64       Int64
─────┼─────────────────────────────────
   1 │     1      5           7      5
   2 │     1      6           8      6
   3 │     2     70          74    140
   4 │     2     80          84    160
"""
macro astable(args...)
throw(ArgumentError("@astable only works inside DataFramesMeta macros."))
end

##############################################################################
##
## @with
@@ -1097,6 +1246,8 @@ transformations by row, `@transform` allows `@byrow` at the
beginning of a block of transformations (i.e. `@byrow begin... end`).
All transformations in the block will operate by row.
$astable_docstring_snippet
### Examples
```jldoctest
@@ -1233,6 +1384,8 @@ transform!ations by row, `@transform!` allows `@byrow` at the
beginning of a block of transformations (i.e. `@byrow begin... end`).
All transformations in the block will operate by row.
$astable_docstring_snippet
### Examples
```jldoctest
@@ -1345,6 +1498,8 @@ transformations by row, `@select` allows `@byrow` at the
beginning of a block of transformations (i.e. `@byrow begin... end`).
All transformations in the block will operate by row.
$astable_docstring_snippet
### Examples
```jldoctest
@@ -1465,6 +1620,8 @@ transformations by row, `@select!` allows `@byrow` at the
beginning of a block of transformations (i.e. `@byrow begin... end`).
All transformations in the block will operate by row.
$astable_docstring_snippet
### Examples
```jldoctest
@@ -1546,17 +1703,6 @@ function combine_helper(x, args...; deprecation_warning = false)

exprs, outer_flags = create_args_vector(args...)

fe = first(exprs)
if length(exprs) == 1 &&
get_column_expr(fe) === nothing &&
!(fe.head == :(=) || fe.head == :kw)

@warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS."

lhs = Expr(:$, :AsTable)
exprs = ((:($lhs = $fe)),)
end

t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

quote
@@ -1592,6 +1738,8 @@ and
@combine(df, :mx = mean(:x), :sx = std(:x))
```
$astable_docstring_snippet
### Examples
```julia
@@ -1666,16 +1814,6 @@ end
function by_helper(x, what, args...)
# Only allow one argument when returning a Table object
exprs, outer_flags = create_args_vector(args...)
fe = first(exprs)
if length(exprs) == 1 &&
get_column_expr(fe) === nothing &&
!(fe.head == :(=) || fe.head == :kw)

@warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS."

lhs = Expr(:$, :AsTable)
exprs = ((:($lhs = $fe)),)
end

t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs)

@@ -1718,6 +1856,8 @@ and
@by(df, :g, mx = mean(:x), sx = std(:x))
```
$astable_docstring_snippet
### Examples
```julia