This repository has been archived by the owner on May 5, 2019. It is now read-only.

Enhance joining and grouping #17

Merged 33 commits on Mar 6, 2017
Commits
dd68a65
merge alyst groupby to DataTables
cjprybol Feb 20, 2017
a5fd472
passing tests
cjprybol Feb 20, 2017
9424201
Merge branch 'master' into cjp/alyst-groupby
cjprybol Feb 20, 2017
a652768
revert seemingly unrelated changes
cjprybol Feb 20, 2017
0cdf755
revert unnecessary changes for variable name and spacing
cjprybol Feb 21, 2017
d292dd3
fix indentation issue
cjprybol Feb 21, 2017
53774f5
add nonunique()
cjprybol Feb 22, 2017
2adc883
commit join.jl merge for Alyst to debug
cjprybol Feb 22, 2017
d52c791
make the easy changes requested during review
cjprybol Feb 22, 2017
5e9664a
add docstrings to row permutation functions
cjprybol Feb 23, 2017
e1b4d0e
clarify error message
cjprybol Feb 23, 2017
74c36d1
remove unused function
cjprybol Feb 23, 2017
de09a5c
update function to use isequal(a::Nullable, b::Nullable) from base
cjprybol Feb 23, 2017
160be5c
frame -> table
cjprybol Feb 23, 2017
7f28a14
update merge based on helpful diff
cjprybol Feb 23, 2017
61bf607
pass all tests that don't use Categorical
cjprybol Feb 23, 2017
6147d0c
added back commented out functions
cjprybol Feb 23, 2017
bab097f
minor cleanup
cjprybol Feb 23, 2017
cdac010
Merge branch 'master' into cjp/alyst-groupby
cjprybol Feb 23, 2017
8cf4a67
more changes suggested during review
cjprybol Feb 23, 2017
199f96b
use explicit vcat, indendation, parentheses
cjprybol Feb 23, 2017
f3b06a3
more indentation
cjprybol Feb 23, 2017
8308879
fix test/join.jl errors using `resize!` in
cjprybol Feb 24, 2017
1c842dc
passing all tests!
cjprybol Feb 25, 2017
637b8cf
update categorical Arrays version
cjprybol Feb 27, 2017
49d6328
incorporate edits suggested during review
cjprybol Mar 1, 2017
b6c1f98
more fixes
cjprybol Mar 3, 2017
839c558
added tests and trimmed unneccessary functions
cjprybol Mar 4, 2017
46aaae2
update new function name
cjprybol Mar 4, 2017
01b3ce8
revert code deletions and address most comments
cjprybol Mar 6, 2017
7b9b8e2
revert bad edit, function is untested
cjprybol Mar 6, 2017
7fe0389
revert some changes, clean up tests
cjprybol Mar 6, 2017
cf0486a
change docstring to comment
cjprybol Mar 6, 2017
2 changes: 1 addition & 1 deletion REQUIRE
@@ -1,6 +1,6 @@
 julia 0.5
 NullableArrays 0.1.0
-CategoricalArrays 0.0.6
+CategoricalArrays 0.1.2
 StatsBase 0.11.0
 GZip
 SortingAlgorithms
10 changes: 5 additions & 5 deletions docs/src/man/joins.md
@@ -51,7 +51,7 @@ Cross joins are the only kind of join that does not use a key:
 join(a, b, kind = :cross)
 ```

-In order to join data frames on keys which have different names, you must first rename them so that they match. This can be done using rename!:
+In order to join data tables on keys which have different names, you must first rename them so that they match. This can be done using rename!:

 ```julia
 a = DataTable(ID = [1, 2], Name = ["A", "B"])
@@ -63,11 +63,11 @@ join(a, b, on = :ID, kind = :inner)
 Or renaming multiple columns at a time:

 ```julia
-a = DataTable(City = ["Amsterdam", "London", "London", "New York", "New York"],
-Job = ["Lawyer", "Lawyer", "Lawyer", "Doctor", "Doctor"],
+a = DataTable(City = ["Amsterdam", "London", "London", "New York", "New York"],
+Job = ["Lawyer", "Lawyer", "Lawyer", "Doctor", "Doctor"],
 Category = [1, 2, 3, 4, 5])
-b = DataTable(Location = ["Amsterdam", "London", "London", "New York", "New York"],
-Work = ["Lawyer", "Lawyer", "Lawyer", "Doctor", "Doctor"],
+b = DataTable(Location = ["Amsterdam", "London", "London", "New York", "New York"],
+Work = ["Lawyer", "Lawyer", "Lawyer", "Doctor", "Doctor"],
 Name = ["a", "b", "c", "d", "e"])
 rename!(b, [:Location => :City, :Work => :Job])
 join(a, b, on = [:City, :Job])
1 change: 1 addition & 0 deletions src/DataTables.jl
@@ -104,6 +104,7 @@ for (dir, filename) in [
 ("subdatatable", "subdatatable.jl"),
 ("groupeddatatable", "grouping.jl"),
 ("datatablerow", "datatablerow.jl"),
+("datatablerow", "utils.jl"),

 ("abstractdatatable", "iteration.jl"),
 ("abstractdatatable", "join.jl"),
17 changes: 7 additions & 10 deletions src/abstractdatatable/abstractdatatable.jl
@@ -602,17 +602,14 @@ nonunique(dt, 1)

 """
 function nonunique(dt::AbstractDataTable)
-    res = fill(false, nrow(dt))
-    rows = Set{DataTableRow}()
-    for i in 1:nrow(dt)
-        arow = DataTableRow(dt, i)
-        if in(arow, rows)
-            res[i] = true
-        else
-            push!(rows, arow)
-        end
+    gslots = row_group_slots(dt)[3]
+    # unique rows are the first encountered group representatives,
+    # nonunique are everything else
+    res = fill(true, nrow(dt))
+    @inbounds for g_row in gslots
+        (g_row > 0) && (res[g_row] = false)
     end
-    res
+    return res
 end

 nonunique(dt::AbstractDataTable, cols::Union{Real, Symbol}) = nonunique(dt[[cols]])
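
The rewritten `nonunique` inverts the old logic: instead of flagging a row when it is found in a `Set`, it starts with every row flagged and then clears the flag for the first row of each group. A minimal, self-contained sketch of that idea follows. It does not use DataTables at all: rows are plain tuples, and a `Dict` stands in for the group slots. `nonunique_sketch` and `first_seen` are hypothetical names, and the reading that `gslots` holds the first row index per slot (with non-positive entries for empty slots) is an assumption based only on the `g_row > 0` check in the diff.

```julia
# Hypothetical illustration only: `row_group_slots` is internal to the package
# and is not called here; a Dict plays the role of the hash-based group slots.
function nonunique_sketch(rows::AbstractVector)
    # row value => index of the first row holding that value
    first_seen = Dict{eltype(rows), Int}()
    for (i, row) in enumerate(rows)
        get!(first_seen, row, i)    # stores i only if `row` has not been seen yet
    end

    # Assume every row is a duplicate, then clear the first representatives.
    res = fill(true, length(rows))
    for i in values(first_seen)
        res[i] = false
    end
    return res
end

rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
flags = nonunique_sketch(rows)
# flags == [false, false, true, false, true]
```

Rows 3 and 5 repeat rows 1 and 2, so only they remain flagged. The second loop visits one entry per distinct row rather than one per row.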