Description
Is your feature request related to a problem?
The SQL plugin doesn't distinguish between the text and keyword data types. OpenSearch supports aggregation on keyword fields and on text fields that have fielddata and/or fields defined.
It is possible to aggregate on a keyword field, or on a text field that meets those conditions:
opensearchsql> select sum(int0) from calcs GROUP BY str0;
fetched rows / total rows = 3/3
+-------------+
| sum(int0) |
|-------------|
| 1 |
| 18 |
| 49 |
+-------------+
But it is impossible to aggregate on a plain text field:
opensearchsql> select gender, count(firstname) from bank-with-null-values group by gender;
TransportError(500, 'SearchPhaseExecutionException', {'error': {'type': 'SearchPhaseExecutionException', 'reason': 'Error occurred in OpenSearch engine: all shards failed', 'details': 'Shard[0]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [gender] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.'}, 'status': 503})
Existing mapping
| JDBC type | ExprCoreType | OpenSearchDataType | OpenSearch type |
|---|---|---|---|
| VARCHAR | STRING | OPENSEARCH_TEXT_KEYWORD | keyword |
| VARCHAR | STRING | OPENSEARCH_TEXT | text |
See OpenSearch mapping samples available for aggregation:
sql/integ-test/src/test/resources/correctness/opensearch_dashboards_sample_data_flights.json
Lines 25 to 27 in b56edc7

    "DestCityName": {
      "type": "keyword"
    },

sql/integ-test/src/test/resources/correctness/opensearch_dashboards_sample_data_flights.json
Lines 61 to 69 in b56edc7

    "Origin": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },

sql/integ-test/src/test/resources/indexDefinitions/account_index_mapping.json
Lines 12 to 21 in b56edc7

    "firstname": {
      "type": "text",
      "fielddata": true,
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },

Not available for aggregation:
sql/integ-test/src/test/resources/indexDefinitions/bank_with_null_values_index_mapping.json
Lines 16 to 18 in b56edc7

    "gender": {
      "type": "text"
    },

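The rule illustrated by the mapping samples above can be sketched as a small check. This is only an illustration, not the plugin's code: the class `MappingCheck` and the method `isAggregatable` are hypothetical names, and the field mapping is modeled as a plain `Map` mirroring the JSON. It also assumes that a text field with `fields` only counts when one of the sub-fields is a keyword, which is a slightly stricter reading than "text with fields".

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of the SQL plugin. */
final class MappingCheck {

    private MappingCheck() {}

    /**
     * Returns true if the field mapping permits aggregation under the rules
     * described in this issue: keyword fields, text fields with fielddata
     * enabled, and text fields that carry a keyword sub-field.
     */
    static boolean isAggregatable(Map<String, ?> mapping) {
        Object type = mapping.get("type");
        if ("keyword".equals(type)) {
            return true;
        }
        if (!"text".equals(type)) {
            return false; // other types are out of scope for this sketch
        }
        if (Boolean.TRUE.equals(mapping.get("fielddata"))) {
            return true;
        }
        // Assumption: a multi-field helps only if a sub-field is a keyword.
        Object fields = mapping.get("fields");
        if (fields instanceof Map) {
            for (Object sub : ((Map<?, ?>) fields).values()) {
                if (sub instanceof Map && "keyword".equals(((Map<?, ?>) sub).get("type"))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Under these assumptions, `DestCityName`, `Origin` and `firstname` pass the check, while `gender` from the bank-with-null-values mapping does not.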
What solution would you like?
Have two different data types that are mapped to different JDBC/ODBC types.
| JDBC type | ExprCoreType | OpenSearchDataType | OpenSearch type |
|---|---|---|---|
| VARCHAR/CHAR | STRING | OPENSEARCH_KEYWORD | keyword, text with fielddata, text with fields |
| LONGVARCHAR/TEXT | TEXT | OPENSEARCH_TEXT | text without fielddata and fields |
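As a rough sketch of how the proposed split could be resolved from a field mapping, consider the following. Everything here is hypothetical: `ProposedStringType` and `fromMapping` are illustrative names rather than existing plugin APIs, the mapping is modeled as a plain `Map`, and `VARCHAR` and `LONGVARCHAR` are picked from the alternatives listed in the table because `java.sql.JDBCType` has no `TEXT` constant.

```java
import java.sql.JDBCType;
import java.util.Map;

/** Hypothetical sketch of the proposed split; not the plugin's actual enum. */
enum ProposedStringType {
    OPENSEARCH_KEYWORD(JDBCType.VARCHAR),
    OPENSEARCH_TEXT(JDBCType.LONGVARCHAR);

    private final JDBCType jdbcType;

    ProposedStringType(JDBCType jdbcType) {
        this.jdbcType = jdbcType;
    }

    JDBCType jdbcType() {
        return jdbcType;
    }

    /** Picks a proposed type from a field mapping, following the table literally. */
    static ProposedStringType fromMapping(Map<String, ?> mapping) {
        Object type = mapping.get("type");
        if ("keyword".equals(type)) {
            return OPENSEARCH_KEYWORD;
        }
        if (!"text".equals(type)) {
            throw new IllegalArgumentException("Not a string mapping: " + type);
        }
        boolean fielddata = Boolean.TRUE.equals(mapping.get("fielddata"));
        Object fields = mapping.get("fields");
        boolean hasFields = fields instanceof Map && !((Map<?, ?>) fields).isEmpty();
        // text with fielddata or fields is treated like keyword; plain text is not
        return (fielddata || hasFields) ? OPENSEARCH_KEYWORD : OPENSEARCH_TEXT;
    }
}
```

With this sketch, the `gender` mapping shown earlier would resolve to `OPENSEARCH_TEXT`, so a client could see from the JDBC type that the column is not suitable for aggregation.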
What alternatives have you considered?
N/A
Do you have any additional context?
Opened on behalf of @kylepbit
Ref:

    KEYWORD(JDBCType.VARCHAR, String.class, 256, 0, false),
    TEXT(JDBCType.VARCHAR, String.class, Integer.MAX_VALUE, 0, false),
    STRING(JDBCType.VARCHAR, String.class, Integer.MAX_VALUE, 0, false),

sql/opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Lines 25 to 46 in b56edc7

    /**
     * OpenSearch Text. Rather than cast text to other types (STRING), leave it alone to prevent
     * cast_to_string(OPENSEARCH_TEXT).
     * Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
     */
    OPENSEARCH_TEXT(Collections.singletonList(STRING), "string") {
      @Override
      public boolean shouldCast(ExprType other) {
        return false;
      }
    },

    /**
     * OpenSearch multi-fields which has text and keyword.
     * Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
     */
    OPENSEARCH_TEXT_KEYWORD(Arrays.asList(STRING, OPENSEARCH_TEXT), "string") {
      @Override
      public boolean shouldCast(ExprType other) {
        return false;
      }
    },

sql/core/src/main/java/org/opensearch/sql/data/type/ExprCoreType.java
Lines 44 to 47 in b56edc7

    /**
     * String.
     */
    STRING(UNDEFINED),