Commit cd7d1c9

Add geo_line aggregation (#41612) (#65442)
A metric aggregation that aggregates a set of points as a GeoJSON LineString ordered by some sort parameter.

A `geo_line` aggregation request would specify a `geo_point` field, as well as a `sort` field. `geo_point` represents the values used in the LineString, while the `sort` values will be used as the total ordering of the points. The `sort` field would support any numeric field, including date.

```
{
  "query": {
    "bool": {
      "must": [
        { "term": { "person": "004" } },
        { "term": { "trajectory": "20090131002206.plt" } }
      ]
    }
  },
  "aggs": {
    "make_line": {
      "geo_line": {
        "point": {"field": "location"},
        "sort": { "field": "timestamp" },
        "include_sort": true,
        "sort_order": "desc",
        "size": 15
      }
    }
  }
}
```

```
{
  "took": 21,
  "timed_out": false,
  "_shards": {...},
  "hits": {...},
  "aggregations": {
    "make_line": {
      "type": "LineString",
      "coordinates": [
        [ 121.52926194481552, 38.92878997139633 ],
        [ 121.52922699227929, 38.92876998055726 ]
      ]
    }
  }
}
```

Due to the cardinality of points, an initial max of 10k points will be used. This should support many use-cases.

One solution to overcome this limitation is to keep a PriorityQueue of points and simplify the line once it hits this max. If simplifying makes sense, it may be a nice option in general, with a parameter to specify how aggressively one wants to simplify. This parameter could be the number of points.

Example algorithm one could use with a PriorityQueue: https://bost.ocks.org/mike/simplify/. This would still require O(m) space, where m is the number of points returned. It would also require heapifying triangles sorted by their areas, which would be O(log(m)) operations. Since sorting is done anyway, simplifying would still be an O(n log(m)) operation, where n is the total number of points to filter. This is something to explore.

Closes #41649
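The simplification idea floated in the commit message could look roughly like the sketch below. It is not code from this commit: the class `LineSimplifier`, its `simplify` method, and the use of plain `[x, y]` arrays with planar triangle areas are all assumptions made for illustration. It repeatedly drops the interior point whose triangle with its two neighbours has the smallest area, using a `PriorityQueue` with lazily discarded stale entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of area-based line simplification; not part of this commit. */
final class LineSimplifier {

    /** Heap entry for one interior point; stale once the point's version moves on. */
    private static final class Entry {
        final double area;
        final int index;
        final int version;

        Entry(double area, int index, int version) {
            this.area = area;
            this.index = index;
            this.version = version;
        }
    }

    /** Planar area of the triangle spanned by three [x, y] points. */
    static double area(double[] a, double[] b, double[] c) {
        return Math.abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0;
    }

    /** Keeps at most maxPoints points (never fewer than 2); both endpoints always survive. */
    static List<double[]> simplify(List<double[]> points, int maxPoints) {
        int n = points.size();
        int target = Math.max(2, maxPoints);
        if (n <= target) {
            return new ArrayList<>(points);
        }
        int[] prev = new int[n];
        int[] next = new int[n];
        int[] version = new int[n];
        boolean[] removed = new boolean[n];
        PriorityQueue<Entry> heap = new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.area));
        for (int i = 0; i < n; i++) {
            prev[i] = i - 1;
            next[i] = i + 1;
            if (i > 0 && i < n - 1) {
                heap.add(new Entry(area(points.get(i - 1), points.get(i), points.get(i + 1)), i, 0));
            }
        }
        int remaining = n;
        while (remaining > target && !heap.isEmpty()) {
            Entry e = heap.poll();
            if (removed[e.index] || e.version != version[e.index]) {
                continue; // stale entry: the point was removed or re-scored since it was queued
            }
            removed[e.index] = true;
            remaining--;
            int p = prev[e.index];
            int q = next[e.index];
            next[p] = q;
            prev[q] = p;
            // Re-score the surviving interior neighbours against their new neighbours.
            for (int k : new int[] { p, q }) {
                if (k > 0 && k < n - 1) {
                    version[k]++;
                    heap.add(new Entry(area(points.get(prev[k]), points.get(k), points.get(next[k])), k, version[k]));
                }
            }
        }
        List<double[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!removed[i]) {
                out.add(points.get(i));
            }
        }
        return out;
    }
}
```

Each removal costs a logarithmic number of heap operations, which is consistent with the rough complexity estimate in the message. A real implementation would need to decide how to measure area on geographic coordinates.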
1 parent dbb22cb commit cd7d1c9

22 files changed: +1922 -73 lines changed

docs/reference/aggregations/metrics.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ include::metrics/geobounds-aggregation.asciidoc[]
 
 include::metrics/geocentroid-aggregation.asciidoc[]
 
+include::metrics/geoline-aggregation.asciidoc[]
+
 include::metrics/matrix-stats-aggregation.asciidoc[]
 
 include::metrics/max-aggregation.asciidoc[]
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+[role="xpack"]
+[testenv="gold"]
+[[search-aggregations-metrics-geo-line]]
+=== Geo-Line Aggregation
+++++
+<titleabbrev>Geo-Line</titleabbrev>
+++++
+
+The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a LineString ordered
+by the chosen `sort` field. This `sort` can be a date field, for example. The bucket returned is a valid
+https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] representing the line geometry.
+
+[source,console,id=search-aggregations-metrics-geo-line-simple]
+----
+PUT test
+{
+  "mappings": {
+    "dynamic": "strict",
+    "_source": {
+      "enabled": false
+    },
+    "properties": {
+      "my_location": {
+        "type": "geo_point"
+      },
+      "group": {
+        "type": "keyword"
+      },
+      "@timestamp": {
+        "type": "date"
+      }
+    }
+  }
+}
+
+POST /test/_bulk?refresh
+{"index": {}}
+{"my_location": {"lat":37.3450570, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:36"}
+{"index": {}}
+{"my_location": {"lat": 37.3451320, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:37Z"}
+{"index": {}}
+{"my_location": {"lat": 37.349283, "lon": -122.0505010}, "@timestamp": "2013-09-06T16:00:37Z"}
+
+POST /test/_search?filter_path=aggregations
+{
+  "aggs": {
+    "line": {
+      "geo_line": {
+        "point": {"field": "my_location"},
+        "sort": {"field": "@timestamp"}
+      }
+    }
+  }
+}
+----
+
+Which returns:
+
+[source,js]
+----
+{
+  "aggregations": {
+    "line": {
+      "type" : "Feature",
+      "geometry" : {
+        "type" : "LineString",
+        "coordinates" : [
+          [
+            -122.049982,
+            37.345057
+          ],
+          [
+            -122.050501,
+            37.349283
+          ],
+          [
+            -122.049982,
+            37.345132
+          ]
+        ]
+      },
+      "properties" : {
+        "complete" : true
+      }
+    }
+  }
+}
+----
+// TESTRESPONSE
+
+[[search-aggregations-metrics-geo-line-options]]
+==== Options
+
+`point`::
+(Required)
+
+This option specifies the name of the `geo_point` field.
+
+Example usage configuring `my_location` as the point field:
+
+[source,js]
+----
+"point": {
+  "field": "my_location"
+}
+----
+// NOTCONSOLE
+
+`sort`::
+(Required)
+
+This option specifies the name of the numeric field to use as the sort key
+for ordering the points.
+
+Example usage configuring `@timestamp` as the sort key:
+
+[source,js]
+----
+"sort": {
+  "field": "@timestamp"
+}
+----
+// NOTCONSOLE
+
+`include_sort`::
+(Optional, boolean, default: `false`)
+
+When `true`, this option includes an additional array of the sort values in the
+feature properties.
+
+`sort_order`::
+(Optional, string, default: `"ASC"`)
+
+This option accepts one of two values: "ASC" or "DESC".
+
+The line is sorted in ascending order by the sort key when set to "ASC", and in descending
+order when set to "DESC".
+
+`size`::
+(Optional, integer, default: `10000`)
+
+The maximum number of points in the line returned by the aggregation. Valid sizes are
+between 1 and 10000.
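How `sort`, `sort_order` and `size` interact can be modelled outside Elasticsearch with a few lines of Java. This is only a sketch under stated assumptions, not the aggregator from this commit: the names `GeoLineSketch`, `Point`, `Line` and `build` are hypothetical, and it assumes that truncation keeps the points that sort first and that `complete` reports whether any point was dropped, neither of which is spelled out in the options text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the documented geo_line options; not the real aggregator. */
final class GeoLineSketch {

    static final class Point {
        final double lon;
        final double lat;
        final long sortValue;

        Point(double lon, double lat, long sortValue) {
            this.lon = lon;
            this.lat = lat;
            this.sortValue = sortValue;
        }
    }

    static final class Line {
        final List<Point> points;
        final boolean complete; // assumption: true when no point was dropped

        Line(List<Point> points, boolean complete) {
            this.points = points;
            this.complete = complete;
        }
    }

    /** Orders the points by sort value and keeps at most size of them. */
    static Line build(List<Point> input, boolean ascending, int size) {
        if (size < 1 || size > 10000) {
            throw new IllegalArgumentException("size must be between 1 and 10000");
        }
        Comparator<Point> bySort = Comparator.comparingLong((Point p) -> p.sortValue);
        List<Point> sorted = new ArrayList<>(input);
        sorted.sort(ascending ? bySort : bySort.reversed());
        int kept = Math.min(size, sorted.size());
        return new Line(new ArrayList<>(sorted.subList(0, kept)), sorted.size() <= size);
    }
}
```

For example, `GeoLineSketch.build(points, false, 15)` corresponds to `"sort_order": "DESC"` with `"size": 15`.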

server/src/main/java/org/elasticsearch/search/sort/BucketedSort.java

Lines changed: 12 additions & 12 deletions
@@ -66,7 +66,7 @@
  * worst case. Critically, it is a very fast {@code O(1)} to check if a value
  * is competitive at all which, so long as buckets aren't hit in reverse
  * order, they mostly won't be. Extracting results in sorted order is still
- * {@code O(n * log n)}.
+ * {@code O(n * log n)}.
  * </p>
  * <p>
  * When we first collect a bucket we make sure that we've allocated enough
@@ -90,7 +90,7 @@ public interface ExtraData {
          * <p>
          * Both parameters will have previously been loaded by
          * {@link Loader#loadFromDoc(long, int)} so the implementer shouldn't
-         * need to grow the underlying storage to implement this.
+         * need to grow the underlying storage to implement this.
          * </p>
          */
         void swap(long lhs, long rhs);
@@ -128,7 +128,7 @@ public Loader loader(LeafReaderContext ctx) throws IOException {
     private final SortOrder order;
     private final DocValueFormat format;
     private final int bucketSize;
-    private final ExtraData extra;
+    protected final ExtraData extra;
     /**
      * {@code true} if the bucket is in heap mode, {@code false} if
      * it is still gathering.
@@ -206,9 +206,9 @@ public final List<SortValue> getValues(long bucket) throws IOException {
     }
 
     /**
-     * Is this bucket a min heap {@code true} or in gathering mode {@code false}?
+     * Is this bucket a min heap {@code true} or in gathering mode {@code false}?
      */
-    private boolean inHeapMode(long bucket) {
+    public boolean inHeapMode(long bucket) {
         return heapMode.get(bucket);
     }
 
@@ -254,7 +254,7 @@ private boolean inHeapMode(long bucket) {
     /**
      * {@code true} if the entry at index {@code lhs} is "better" than
      * the entry at {@code rhs}. "Better" in this means "lower" for
-     * {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
+     * {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
      */
     protected abstract boolean betterThan(long lhs, long rhs);
 
@@ -283,7 +283,7 @@ protected final String debugFormat() {
 
     /**
      * Initialize the gather offsets after setting up values. Subclasses
-     * should call this once, after setting up their {@link #values()}.
+     * should call this once, after setting up their {@link #values()}.
      */
     protected final void initGatherOffsets() {
         setNextGatherOffsets(0);
@@ -325,12 +325,12 @@ private void setNextGatherOffsets(long startingAt) {
      * case.
      * </p>
      * <ul>
-     * <li>Hayward, Ryan; McDiarmid, Colin (1991).
+     * <li>Hayward, Ryan; McDiarmid, Colin (1991).
      * <a href="https://web.archive.org/web/20160205023201/http://www.stats.ox.ac.uk/__data/assets/pdf_file/0015/4173/heapbuildjalg.pdf">
      * Average Case Analysis of Heap Building byRepeated Insertion</a> J. Algorithms.
      * <li>D.E. Knuth, ”The Art of Computer Programming, Vol. 3, Sorting and Searching”</li>
      * </ul>
-     * @param rootIndex the index the start of the bucket
+     * @param rootIndex the index the start of the bucket
      */
     private void heapify(long rootIndex) {
         int maxParent = bucketSize / 2 - 1;
@@ -344,7 +344,7 @@ private void heapify(long rootIndex) {
      * runs in {@code O(log n)} time.
      * @param rootIndex index of the start of the bucket
      * @param parent Index within the bucket of the parent to check.
-     * For example, 0 is the "root".
+     * For example, 0 is the "root".
      */
     private void downHeap(long rootIndex, int parent) {
         while (true) {
@@ -443,7 +443,7 @@ public final void collect(int doc, long bucket) throws IOException {
         /**
          * {@code true} if the sort value for the doc is "better" than the
          * entry at {@code index}. "Better" in means is "lower" for
-         * {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
+         * {@link SortOrder#ASC} and "higher" for {@link SortOrder#DESC}.
          */
         protected abstract boolean docBetterThan(long index);
 
@@ -545,7 +545,7 @@ public abstract static class ForFloats extends BucketedSort {
          * The maximum size of buckets this can store. This is because we
          * store the next offset to write to in a float and floats only have
          * {@code 23} bits of mantissa so they can't accurate store values
-         * higher than {@code 2 ^ 24}.
+         * higher than {@code 2 ^ 24}.
          */
         public static final int MAX_BUCKET_SIZE = (int) Math.pow(2, 24);
 
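The `2 ^ 24` limit mentioned in the `MAX_BUCKET_SIZE` comment can be checked directly. A `float` stores 23 mantissa bits plus one implicit leading bit, so every integer up to 2^24 round-trips exactly and 2^24 + 1 is the first one that does not. A minimal sketch, with a hypothetical class and method name that do not appear in `BucketedSort`:

```java
/** Hypothetical helper illustrating the float precision limit; not part of BucketedSort. */
final class FloatPrecisionDemo {

    /** True if the int survives a round trip through float unchanged. */
    static boolean exactlyRepresentable(int value) {
        return (int) (float) value == value;
    }
}
```

With this helper, `exactlyRepresentable(1 << 24)` holds while `exactlyRepresentable((1 << 24) + 1)` does not, which is why an offset stored in a float cannot safely exceed that bucket size.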

x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/topmetrics/TopMetricsAggregator.java

Lines changed: 1 addition & 58 deletions
@@ -32,6 +32,7 @@
 import org.elasticsearch.search.sort.BucketedSort;
 import org.elasticsearch.search.sort.SortBuilder;
 import org.elasticsearch.search.sort.SortValue;
+import org.elasticsearch.xpack.core.common.search.aggregations.MissingHelper;
 import org.elasticsearch.xpack.analytics.topmetrics.InternalTopMetrics.MetricValue;
 
 import java.io.IOException;
@@ -495,62 +496,4 @@ public Loader loader(LeafReaderContext ctx) throws IOException {
         public void close() {}
     }
 
-    /**
-     * Helps {@link LongMetricValues} track "empty" slots. It attempts to have
-     * very low CPU overhead and no memory overhead when there *aren't* empty
-     * values.
-     */
-    private static class MissingHelper implements Releasable {
-        private final BigArrays bigArrays;
-        private BitArray tracker;
-
-        MissingHelper(BigArrays bigArrays) {
-            this.bigArrays = bigArrays;
-        }
-
-        void markMissing(long index) {
-            if (tracker == null) {
-                tracker = new BitArray(index, bigArrays);
-            }
-            tracker.set(index);
-        }
-
-        void markNotMissing(long index) {
-            if (tracker == null) {
-                return;
-            }
-            tracker.clear(index);
-        }
-
-        void swap(long lhs, long rhs) {
-            if (tracker == null) {
-                return;
-            }
-            boolean backup = tracker.get(lhs);
-            if (tracker.get(rhs)) {
-                tracker.set(lhs);
-            } else {
-                tracker.clear(lhs);
-            }
-            if (backup) {
-                tracker.set(rhs);
-            } else {
-                tracker.clear(rhs);
-            }
-        }
-
-        boolean isEmpty(long index) {
-            if (tracker == null) {
-                return false;
-            }
-            return tracker.get(index);
-        }
-
-        @Override
-        public void close() {
-            if (tracker != null) {
-                tracker.close();
-            }
-        }
-    }
 }
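The pattern behind the relocated `MissingHelper` is lazy allocation: no bit storage exists until the first slot is marked missing, so the common case costs only a null check. The sketch below mirrors that shape with `java.util.BitSet` standing in for Elasticsearch's `BitArray` and `BigArrays`; the class name `LazyMissingTracker` and the `allocated()` accessor are hypothetical and exist only for illustration.

```java
import java.util.BitSet;

/** Hypothetical stand-in for MissingHelper built on java.util.BitSet. */
final class LazyMissingTracker {
    private BitSet tracker; // stays null until the first missing slot is recorded

    void markMissing(int index) {
        if (tracker == null) {
            tracker = new BitSet();
        }
        tracker.set(index);
    }

    void markNotMissing(int index) {
        if (tracker != null) {
            tracker.clear(index);
        }
    }

    /** Exchanges the missing flags of two slots, as a sort would when swapping values. */
    void swap(int lhs, int rhs) {
        if (tracker == null) {
            return; // nothing is missing, so there is nothing to swap
        }
        boolean backup = tracker.get(lhs);
        tracker.set(lhs, tracker.get(rhs));
        tracker.set(rhs, backup);
    }

    boolean isEmpty(int index) {
        return tracker != null && tracker.get(index);
    }

    /** Exposed only so the lazy allocation can be observed. */
    boolean allocated() {
        return tracker != null;
    }
}
```

Unlike the original, this version has no `close()` because `BitSet` is garbage collected rather than tracked by a circuit breaker.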

x-pack/plugin/core/src/main/java/org/elasticsearch/license/XPackLicenseState.java

Lines changed: 2 additions & 0 deletions
@@ -98,6 +98,8 @@ public enum Feature {
 
         SPATIAL_GEO_GRID(OperationMode.GOLD, true),
 
+        SPATIAL_GEO_LINE(OperationMode.GOLD, true),
+
         ANALYTICS(OperationMode.MISSING, true),
 
         AGGREGATE_METRIC(OperationMode.MISSING, true),
