This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Support NULLS FIRST/LAST in new engine (#843)
* Change grammar

* Add ast building and UT

* Pass jacoco

* Handle nullfirst option in analyzer

* Add comparison test

* Fix broken UT

* Add comparison test with null

* Add more comparison test

* Add doctest

* Update readme with features only available in new engine for clarity

* Prepare PR

* Update doc

* Update doc

* Update doc

* Update doc
dai-chen authored Nov 23, 2020
1 parent 4ebe901 commit 023f65a
Showing 21 changed files with 368 additions and 47 deletions.
14 changes: 14 additions & 0 deletions README.md
@@ -30,6 +30,20 @@ Please refer to the [SQL Language Reference Manual](./docs/user/index.rst), [Pip

Recently we have been actively improving our query engine, primarily for better correctness and extensibility. The new enhanced query engine already powers the newly released Piped Processing Language behind the scenes, and its integration with the SQL language is under way. To try out the power of the new query engine with SQL, simply enable it via the [plugin setting](https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/user/admin/settings.rst#opendistro-sql-engine-new-enabled). In a future release it will be enabled by default, with nothing required on your side. Please stay tuned for updates on our progress and its exciting new features.

Here is a list of documentation for features only available in this improved SQL query engine. Please follow the instructions above to enable it before trying the example queries in these docs:

* [Identifiers](./docs/user/general/identifiers.rst): support for identifier names with special characters
* [Data types](./docs/user/general/datatypes.rst): new data types such as date time and interval
* [Expressions](./docs/user/dql/expressions.rst): new expression system that can represent and evaluate complex expressions
* [SQL functions](./docs/user/dql/functions.rst): many more string and date functions added
* [Basic queries](./docs/user/dql/basics.rst)
* Ordering by Aggregate Functions section
* NULLS FIRST/LAST in the Specifying Order for Null section
* [Aggregations](./docs/user/dql/aggregations.rst): aggregation over expression and more other features
* [Complex queries](./docs/user/dql/complex.rst)
* Improvement on Subqueries in FROM clause
* [Window functions](./docs/user/dql/window.rst): ranking window function support


## Setup

@@ -15,6 +15,10 @@

package com.amazon.opendistroforelasticsearch.sql.analysis;

import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.NullOrder.NULL_FIRST;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.NullOrder.NULL_LAST;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.SortOrder.ASC;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.SortOrder.DESC;
import static com.amazon.opendistroforelasticsearch.sql.data.type.ExprCoreType.STRUCT;

import com.amazon.opendistroforelasticsearch.sql.analysis.symbol.Namespace;
@@ -71,6 +75,7 @@
import com.google.common.collect.ImmutableSet;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
@@ -316,12 +321,9 @@ public LogicalPlan visitSort(Sort node, AnalysisContext context) {
node.getSortList().stream()
.map(
sortField -> {
// the first options is {"asc": "true/false"}
Boolean asc = (Boolean) sortField.getFieldArgs().get(0).getValue().getValue();
Expression expression = optimizer.optimize(
expressionAnalyzer.analyze(sortField.getField(), context), context);
return ImmutablePair.of(
asc ? SortOption.DEFAULT_ASC : SortOption.DEFAULT_DESC, expression);
return ImmutablePair.of(analyzeSortOption(sortField.getFieldArgs()), expression);
})
.collect(Collectors.toList());

@@ -379,4 +381,20 @@ public LogicalPlan visitValues(Values node, AnalysisContext context) {
return new LogicalValues(valueExprs);
}

/**
* The first argument is always "asc"; the others are optional.
* If a nullFirst argument is given, use its value. Otherwise fall back to DEFAULT_ASC/DESC.
*/
private SortOption analyzeSortOption(List<Argument> fieldArgs) {
Boolean asc = (Boolean) fieldArgs.get(0).getValue().getValue();
Optional<Argument> nullFirst = fieldArgs.stream()
.filter(option -> "nullFirst".equals(option.getArgName())).findFirst();

if (nullFirst.isPresent()) {
Boolean isNullFirst = (Boolean) nullFirst.get().getValue().getValue();
return new SortOption((asc ? ASC : DESC), (isNullFirst ? NULL_FIRST : NULL_LAST));
}
return asc ? SortOption.DEFAULT_ASC : SortOption.DEFAULT_DESC;
}

}
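The argument-to-option mapping in `analyzeSortOption` can be illustrated with a minimal, standalone sketch. This is a hypothetical simplification: it stands in a plain `Map<String, Boolean>` for the plugin's `Argument` list and returns a string instead of a real `SortOption`, but it follows the same rules (a `nullFirst` argument wins; otherwise `ASC` defaults to nulls first and `DESC` to nulls last):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SortOptionSketch {
  enum SortOrder { ASC, DESC }
  enum NullOrder { NULL_FIRST, NULL_LAST }

  // Mirrors analyzeSortOption: "asc" is always present; "nullFirst" is optional.
  static String analyze(Map<String, Boolean> fieldArgs) {
    boolean asc = fieldArgs.get("asc");
    if (fieldArgs.containsKey("nullFirst")) {
      // Explicit NULLS FIRST/LAST overrides the defaults.
      boolean nullFirst = fieldArgs.get("nullFirst");
      return (asc ? SortOrder.ASC : SortOrder.DESC) + "/"
          + (nullFirst ? NullOrder.NULL_FIRST : NullOrder.NULL_LAST);
    }
    // Defaults: ASC puts nulls first, DESC puts nulls last.
    return asc ? "ASC/NULL_FIRST" : "DESC/NULL_LAST";
  }

  public static void main(String[] args) {
    Map<String, Boolean> opts = new LinkedHashMap<>();
    opts.put("asc", true);
    System.out.println(analyze(opts)); // ASC/NULL_FIRST
    opts.put("nullFirst", false);
    System.out.println(analyze(opts)); // ASC/NULL_LAST
  }
}
```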
@@ -28,7 +28,10 @@
import static com.amazon.opendistroforelasticsearch.sql.ast.dsl.AstDSL.relation;
import static com.amazon.opendistroforelasticsearch.sql.ast.dsl.AstDSL.unresolvedArg;
import static com.amazon.opendistroforelasticsearch.sql.ast.dsl.AstDSL.unresolvedArgList;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.NullOrder;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.SortOption;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.SortOption.DEFAULT_ASC;
import static com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort.SortOrder;
import static com.amazon.opendistroforelasticsearch.sql.data.model.ExprValueUtils.integerValue;
import static com.amazon.opendistroforelasticsearch.sql.data.type.ExprCoreType.DOUBLE;
import static com.amazon.opendistroforelasticsearch.sql.data.type.ExprCoreType.INTEGER;
@@ -39,13 +42,12 @@
import static org.junit.jupiter.api.Assertions.assertThrows;

import com.amazon.opendistroforelasticsearch.sql.ast.dsl.AstDSL;
import com.amazon.opendistroforelasticsearch.sql.ast.expression.Argument;
import com.amazon.opendistroforelasticsearch.sql.ast.tree.RareTopN.CommandType;
import com.amazon.opendistroforelasticsearch.sql.ast.tree.Sort;
import com.amazon.opendistroforelasticsearch.sql.exception.SemanticCheckException;
import com.amazon.opendistroforelasticsearch.sql.expression.DSL;
import com.amazon.opendistroforelasticsearch.sql.expression.config.ExpressionConfig;
import com.amazon.opendistroforelasticsearch.sql.expression.window.WindowDefinition;
import com.amazon.opendistroforelasticsearch.sql.planner.logical.LogicalPlan;
import com.amazon.opendistroforelasticsearch.sql.planner.logical.LogicalPlanDSL;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
@@ -298,7 +300,7 @@ public void sort_with_aggregator() {
ImmutableList.of(DSL.named("string_value", DSL.ref("string_value", STRING)))),
0,
// Aggregator in Sort AST node is replaced with reference by expression optimizer
Pair.of(Sort.SortOption.DEFAULT_ASC, DSL.ref("avg(integer_value)", DOUBLE))),
Pair.of(SortOption.DEFAULT_ASC, DSL.ref("avg(integer_value)", DOUBLE))),
DSL.named("string_value", DSL.ref("string_value", STRING))),
AstDSL.project(
AstDSL.sort(
@@ -319,6 +321,49 @@ public void sort_with_aggregator() {
AstDSL.alias("string_value", qualifiedName("string_value"))));
}

@SuppressWarnings("unchecked")
@Test
public void sort_with_options() {
ImmutableMap<Argument[], SortOption> argOptions =
ImmutableMap.<Argument[], SortOption>builder()
.put(new Argument[]{argument("asc", booleanLiteral(true))},
new SortOption(SortOrder.ASC, NullOrder.NULL_FIRST))
.put(new Argument[]{argument("asc", booleanLiteral(false))},
new SortOption(SortOrder.DESC, NullOrder.NULL_LAST))
.put(new Argument[]{
argument("asc", booleanLiteral(true)),
argument("nullFirst", booleanLiteral(true))},
new SortOption(SortOrder.ASC, NullOrder.NULL_FIRST))
.put(new Argument[]{
argument("asc", booleanLiteral(true)),
argument("nullFirst", booleanLiteral(false))},
new SortOption(SortOrder.ASC, NullOrder.NULL_LAST))
.put(new Argument[]{
argument("asc", booleanLiteral(false)),
argument("nullFirst", booleanLiteral(true))},
new SortOption(SortOrder.DESC, NullOrder.NULL_FIRST))
.put(new Argument[]{
argument("asc", booleanLiteral(false)),
argument("nullFirst", booleanLiteral(false))},
new SortOption(SortOrder.DESC, NullOrder.NULL_LAST))
.build();

argOptions.forEach((args, expectOption) ->
assertAnalyzeEqual(
LogicalPlanDSL.project(
LogicalPlanDSL.sort(
LogicalPlanDSL.relation("test"),
0,
Pair.of(expectOption, DSL.ref("integer_value", INTEGER))),
DSL.named("string_value", DSL.ref("string_value", STRING))),
AstDSL.project(
AstDSL.sort(
AstDSL.relation("test"),
ImmutableList.of(argument("count", intLiteral(0))),
field(qualifiedName("integer_value"), args)),
AstDSL.alias("string_value", qualifiedName("string_value")))));
}

@SuppressWarnings("unchecked")
@Test
public void window_function() {
38 changes: 37 additions & 1 deletion docs/user/dql/basics.rst
@@ -25,7 +25,7 @@ The syntax of ``SELECT`` statement is as follows::
[WHERE predicates]
[GROUP BY expression [, ...]
[HAVING predicates]]
[ORDER BY expression [IS [NOT] NULL] [ASC | DESC] [, ...]]
[ORDER BY expression [ASC | DESC] [NULLS {FIRST | LAST}] [, ...]]
[LIMIT [offset, ] size]

Although executing multiple query statements in a batch is not supported, ending a statement with a semicolon ``;`` is still allowed. For example, you can run ``SELECT * FROM accounts;`` without issue. This is useful for supporting queries generated by other tools, such as Microsoft Excel or BI tools.
@@ -929,6 +929,42 @@ Result set:
| Quility|
+--------+

Note that the example above is essentially sorting on a predicate expression. In this case, nulls are put first because the predicate evaluates to false (0) for them, while all the rest evaluate to true and remain in arbitrary order relative to each other. If you want to specify the order for both nulls and non-nulls, the SQL standard's ``NULLS FIRST`` or ``NULLS LAST`` can help: it lets you specify an independent order for nulls alongside the ``ASC`` or ``DESC`` keyword::

od> SELECT employer FROM accounts ORDER BY employer ASC NULLS LAST;
fetched rows / total rows = 4/4
+------------+
| employer |
|------------|
| Netagy |
| Pyrami |
| Quility |
| null |
+------------+

The sorting rule can be summarized as follows:

- Without a ``NULLS`` clause

  - ``ASC``: sort non-nulls in ascending order and put nulls first
  - ``DESC``: sort non-nulls in descending order and put nulls last

- With a ``NULLS`` clause: use the given nulls order

Here is another example of sorting in descending order without a ``NULLS`` clause::

od> SELECT employer FROM accounts ORDER BY employer DESC;
fetched rows / total rows = 4/4
+------------+
| employer |
|------------|
| Quility |
| Pyrami |
| Netagy |
| null |
+------------+


Example 3: Ordering by Aggregate Functions
------------------------------------------

@@ -116,7 +116,9 @@ private String buildBulkBody(String[] columnNames, List<Object[]> batch) {
for (Object[] fieldValues : batch) {
JSONObject json = new JSONObject();
for (int i = 0; i < columnNames.length; i++) {
json.put(columnNames[i], fieldValues[i]);
if (fieldValues[i] != null) {
json.put(columnNames[i], fieldValues[i]);
}
}

body.append("{\"index\":{}}\n").
@@ -158,12 +158,21 @@ private String parseColumnNameAndTypesInSchemaJson(String schema) {

private String getValueList(Object[] fieldValues) {
return Arrays.stream(fieldValues).
map(String::valueOf).
map(val -> val.replace(SINGLE_QUOTE, DOUBLE_QUOTE)).
map(val -> SINGLE_QUOTE + val + SINGLE_QUOTE).
map(this::convertValueObjectToString).
collect(joining(","));
}

private String convertValueObjectToString(Object value) {
if (value == null) {
return "NULL";
}

String str = String.valueOf(value);
str = str.replace(SINGLE_QUOTE, DOUBLE_QUOTE);
str = SINGLE_QUOTE + str + SINGLE_QUOTE;
return str;
}

private void populateMetaData(ResultSet resultSet, DBResult result) throws SQLException {
ResultSetMetaData metaData = resultSet.getMetaData();
for (int i = 1; i <= metaData.getColumnCount(); i++) {
@@ -181,7 +190,8 @@ private void populateData(ResultSet resultSet, DBResult result) throws SQLExcept
while (resultSet.next()) {
Row row = new Row();
for (int i = 1; i <= result.columnSize(); i++) {
row.add(resultSet.getObject(i));
Object value = resultSet.getObject(i);
row.add(resultSet.wasNull() ? null : value);
}
result.addRow(row);
}
@@ -87,6 +87,23 @@ public void testInsertData() throws IOException {
);
}

@Test
public void testInsertNullData() throws IOException {
conn.insert("test", new String[] {"name", "age"},
Arrays.asList(new Object[] {null, 30}, new Object[] {"Hank", null}));

Request actual = captureActualArg();
assertEquals("POST", actual.getMethod());
assertEquals("/test/_bulk?refresh=true", actual.getEndpoint());
assertEquals(
"{\"index\":{}}\n"
+ "{\"age\":30}\n"
+ "{\"index\":{}}\n"
+ "{\"name\":\"Hank\"}\n",
getBody(actual)
);
}

@Test
public void testDropTable() throws IOException {
conn.drop("test");
@@ -108,6 +108,27 @@ public void testInsertData() throws SQLException {
);
}

@Test
public void testInsertNullData() throws SQLException {
conn.insert("test", new String[] {"name", "age"},
Arrays.asList(
new Object[] {"John", null},
new Object[] {null, 25},
new Object[] {"Hank", 30}));

ArgumentCaptor<String> argCap = ArgumentCaptor.forClass(String.class);
verify(statement, times(3)).addBatch(argCap.capture());
List<String> actual = argCap.getAllValues();

assertEquals(
Arrays.asList(
"INSERT INTO test(name,age) VALUES ('John',NULL)",
"INSERT INTO test(name,age) VALUES (NULL,'25')",
"INSERT INTO test(name,age) VALUES ('Hank','30')"
), actual
);
}

@Test
public void testSelectQuery() throws SQLException {
ResultSetMetaData metaData = mockMetaData(ImmutableMap.of("name", "VARCHAR", "age", "INT"));
@@ -108,4 +108,33 @@ public void testDataSetWithEscapedComma() {
);
}

@Test
public void testDataSetWithNullData() {
String mappings =
"{\n"
+ " \"mappings\": {\n"
+ " \"properties\": {\n"
+ " \"field1\": {\n"
+ " \"type\": \"text\"\n"
+ " },\n"
+ " \"field2\": {\n"
+ " \"type\": \"integer\"\n"
+ " }\n"
+ " }\n"
+ " }\n"
+ "}";

TestDataSet dataSet = new TestDataSet("test", mappings,
"field1,field2\n,123\nworld,\n,");
assertThat(
dataSet.getDataRows(),
contains(
new Object[] {"field1", "field2"},
new Object[] {null, 123},
new Object[] {"world", null},
new Object[] {null, null}
)
);
}

}
@@ -114,6 +114,10 @@ private Object[] convertStringArrayToObjectArray(JSONObject types, String[] colu
}

private Object convertStringToObject(String type, String str) {
if (str.isEmpty()) {
return null;
}

switch (type.toLowerCase()) {
case "text":
case "keyword":
1 change: 0 additions & 1 deletion integ-test/src/test/resources/correctness/bugfixes/277.txt
@@ -2,7 +2,6 @@ SELECT COUNT(FlightNum) FROM kibana_sample_data_flights GROUP BY FlightDelay ORD
SELECT COUNT(FlightNum) AS cnt FROM kibana_sample_data_flights GROUP BY FlightDelay ORDER BY cnt
SELECT COUNT(FlightNum) FROM kibana_sample_data_flights GROUP BY FlightDelay ORDER BY 1
SELECT OriginWeather, AVG(FlightTimeMin) FROM kibana_sample_data_flights GROUP BY OriginWeather ORDER BY AVG(FlightTimeMin)
SELECT OriginWeather, AVG(FlightTimeMin) FROM kibana_sample_data_flights GROUP BY OriginWeather ORDER BY SUM(FlightDelayMin)
SELECT OriginWeather, AVG(FlightTimeMin), SUM(FlightDelayMin) FROM kibana_sample_data_flights GROUP BY OriginWeather ORDER BY AVG(FlightTimeMin), SUM(FlightDelayMin)
SELECT OriginWeather, AVG(FlightTimeMin), SUM(FlightDelayMin) AS s FROM kibana_sample_data_flights GROUP BY OriginWeather ORDER BY AVG(FlightTimeMin), s
SELECT OriginWeather, AVG(FlightTimeMin) AS a, SUM(FlightDelayMin) FROM kibana_sample_data_flights GROUP BY OriginWeather ORDER BY a, SUM(FlightDelayMin)
@@ -19,3 +19,4 @@ HVWAL6J,Comodoro Arturo Merino Benitez International Airport,false,7292.72928960
7ORM12S,Leonardo da Vinci___Fiumicino Airport,false,160.39074208529965,23.46580713004768,Sunny,0,118.37483602607261,Kibana Airlines,0,IT-62,No Delay,PI05,Pisa International Airport,0.39109678550079463,false,258.1238784305245,Rome,Sunny,IT,IT,IT-52,RM11,Pisa,2019-12-23 03:54:12
2P36OEP,New Chitose Airport,true,5340.290617241973,941.1970552595557,Cloudy,0,705.7149863531135,Kibana Airlines,225,SE-BD,Late Aircraft Delay,VIE,Vienna International Airport,15.686617587659262,false,8594.364663114668,Chitose / Tomakomai,Rain,JP,AT,AT-9,CTS,Vienna,2019-12-23 09:41:52
HLNZHCX,Verona Villafranca Airport,false,0,0,Sunny,0,172.3790782673846,ES-Air,0,IT-34,No Delay,VR10,Verona Villafranca Airport,0,false,0,Verona,Sunny,IT,IT,IT-34,VR10,Verona,2019-12-23 19:34:51
HLNNULL,Verona Villafranca Airport,,,0,,0,172.3790782673846,ES-Air,0,IT-34,No Delay,VR10,Verona Villafranca Airport,0,false,0,Verona,Sunny,IT,IT,IT-34,VR10,Verona,
12 changes: 12 additions & 0 deletions integ-test/src/test/resources/correctness/queries/orderby.txt
@@ -11,3 +11,15 @@ SELECT FlightDelay, OriginWeather FROM kibana_sample_data_flights GROUP BY Fligh
SELECT FlightDelay, OriginWeather FROM kibana_sample_data_flights GROUP BY FlightDelay, OriginWeather ORDER BY FlightDelay DESC, OriginWeather
SELECT FlightDelay, OriginWeather FROM kibana_sample_data_flights GROUP BY FlightDelay, OriginWeather ORDER BY FlightDelay, OriginWeather DESC
SELECT FlightDelay, OriginWeather FROM kibana_sample_data_flights GROUP BY FlightDelay, OriginWeather ORDER BY FlightDelay DESC, OriginWeather DESC
SELECT FlightNum, DistanceMiles FROM kibana_sample_data_flights ORDER BY DistanceMiles NULLS FIRST
SELECT FlightNum, DistanceMiles FROM kibana_sample_data_flights ORDER BY DistanceMiles NULLS LAST
SELECT FlightNum, DistanceMiles FROM kibana_sample_data_flights ORDER BY DistanceMiles ASC NULLS LAST
SELECT FlightNum, DistanceMiles FROM kibana_sample_data_flights ORDER BY DistanceMiles DESC NULLS FIRST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles, FlightDelay NULLS FIRST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles, FlightDelay NULLS LAST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles, FlightDelay ASC NULLS LAST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles, FlightDelay DESC NULLS FIRST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles NULLS LAST, FlightDelay NULLS FIRST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles NULLS FIRST, FlightDelay NULLS LAST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles DESC NULLS FIRST, FlightDelay ASC NULLS LAST
SELECT FlightNum, DistanceMiles, FlightDelay FROM kibana_sample_data_flights ORDER BY DistanceMiles ASC NULLS LAST, FlightDelay DESC NULLS FIRST
