-
Notifications
You must be signed in to change notification settings - Fork 42
Grouping and aggregation
PR #1135 extends the Links language-integrated query mechanism, making it possible for the user to write database queries that return, or employ as intermediate data structures, finite maps rather than finite relations. Finite maps are used to represent the result of grouping a relation, and aggregation can then be applied groupwise to a finite map to obtain again a relation. Such Links queries are translated to SQL queries using group by
and aggregation.
In the Links type system, the type of finite maps is of the form [(k, [v])]
, where k
and v
are tuple types, respectively for keys and values. For every key x : k
, the finite map contains at most one pair (x, [y1, ..., yn])
associating to the key x
the collection of values [y1, ..., yn]
.
Finite maps, and thus queries with grouping and aggregation, are only available in the mixing normaliser. The standard normaliser is still limited to working with relations.
The main way to construct a finite map is by means of the grouping operator groupBy
groupBy;
fun : ((a) -b-> c, [a]) -b-> [(c, [a])
groupBy(f, l)
takes a list l
and a grouping criterion f
: it applies f
to each item in the list to obtain its grouping key, and finally returns a finite map, i.e. a list of pairs associating to each grouping key the list of items of the original list l
that share the same grouping key.
e.g.
groupBy(fun (x) { (is_even = even(x)) }, [1, 2, 3]) = [((is_even = true), [2]), ((is_even = false), [1,3])]
The input of groupBy
should be a relation rather than a finite map; to allow for iterated grouping, we provide a groupByMap
operation that uses a finite map input:
groupByMap;
fun : ((a) -b-> c, [(d, [a])]) -b-> [((c, d), [a])]
Notice that iterated use of groupByMap
produces nested tuple types as keys: however, Links flattens these tuples transparently when converting them to SQL.
Finite maps extend comprehension syntax in two ways:
- the output of a (standard) comprehension can now be a finite map (however, the input must always be a relation)
for (x <-- table) groupBy(f, l)
- we add a new key comprehension to iterate on the (duplicate-free) set of keys of a finite map: this is presently supported by the function
concatMapKey
:
concatMapKey;
fun : ((a) -b-> [c], [(a, [_])]) -b-> [c]
concatMapKey
works like concatMap
, but it takes as input a finite map rather than any list; it performs comprehension over the (deduplicated) key set of the input map (for this reason, the output type of the finite map argument is irrelevant). The output can be a relation or a finite map.
lookupG;
fun : (a, [(a, [b])]) -> [b]
Given a key k
and a finite map m
, lookupG(k,m)
returns the list of values associated by the map m
to the key k
.
Queries support the following aggregation operators:
sum, sumF, avg, avgF, min_list, minF_list, max_list, maxF_list, length
(where the F
variants accept Float
inputs rather than Int
)
If l : [Int]
, one can for instance use sum(l)
within a databaseable query. However, in the most common case of queries using grouping and aggregation together, the following groupwise aggregation operator is easier to use and typically produces more efficient SQL:
aggBy;
fun : ([(a, [b])], ([b]) -c-> d) -c-> [(a, d)]
aggBy(m, f)
takes a finite map m
and an aggregation criterion f
: its purpose is to apply the aggregation to the input map on a key by key basis.
The implementation here is admittedly hacky: we can only support f
when it is textually in the form:
fun (t) { (outlabel1 = agg1(for (x <- t) [x.inlabel1]), ..., outlabeln = aggn(for (x <- t) [x.inlabeln])) }
Where agg1
, ... aggn
are aggregation functions.
Any other use of aggBy
will NOT be databaseable.
Example queries can be found in the database/grouping.links
test file (which requires tables created by database/grouping-create.links
)