
[FEATURE] Add new data type for text #1038

@Yury-Fridlyand

Is your feature request related to a problem?

The SQL plugin doesn't distinguish between the text and keyword data types. OpenSearch supports aggregation on keyword fields, and on text fields that have fielddata enabled and/or a keyword sub-field defined under fields.

It is possible to aggregate on a keyword field, or on a text field that meets those conditions:

opensearchsql> select sum(int0) from calcs GROUP BY str0;
fetched rows / total rows = 3/3
+-------------+
| sum(int0)   |
|-------------|
| 1           |
| 18          |
| 49          |
+-------------+

But it is impossible to aggregate on a plain text field (one without fielddata or fields):

opensearchsql> select gender, count(firstname) from bank-with-null-values group by gender;
TransportError(500, 'SearchPhaseExecutionException', {'error': {'type': 'SearchPhaseExecutionException', 'reason': 'Error occurred in OpenSearch engine: all shards failed', 'details': 'Shard[0]: java.lang.IllegalArgumentException: Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [gender] in order to load field data by uninverting the inverted index. Note that this can use significant memory.\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.'}, 'status': 503})

Existing mapping

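| JDBC type | ExprCoreType | OpenSearchDataType      | OpenSearch type |
|-----------|--------------|-------------------------|-----------------|
| VARCHAR   | STRING       | OPENSEARCH_TEXT_KEYWORD | keyword         |
| VARCHAR   | STRING       | OPENSEARCH_TEXT         | text            |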

Sample OpenSearch mappings that are available for aggregation:


"Origin": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},

"firstname": {
  "type": "text",
  "fielddata": true,
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
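
The conditions these two mappings illustrate can be sketched in code. The following is only an illustration, not plugin code: `AggregationTarget` and `resolve` are hypothetical names, the mapping is assumed to be already parsed into a `Map`, and preferring the keyword sub-field over fielddata is an assumption made here because fielddata can use significant memory.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the SQL plugin. */
final class AggregationTarget {

  /**
   * Returns the field path an aggregation could use, or empty if the
   * mapping offers none. `mapping` is the parsed mapping of one field.
   */
  static Optional<String> resolve(String name, Map<String, Object> mapping) {
    Object type = mapping.get("type");
    if ("keyword".equals(type)) {
      return Optional.of(name);
    }
    if (!"text".equals(type)) {
      return Optional.empty(); // non-string types are out of scope for this sketch
    }
    // Assumption: prefer a keyword sub-field when one exists.
    Object fields = mapping.get("fields");
    if (fields instanceof Map) {
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) fields).entrySet()) {
        Object sub = entry.getValue();
        if (sub instanceof Map && "keyword".equals(((Map<?, ?>) sub).get("type"))) {
          return Optional.of(name + "." + entry.getKey());
        }
      }
    }
    // Otherwise fall back to the text field itself if fielddata is enabled.
    if (Boolean.TRUE.equals(mapping.get("fielddata"))) {
      return Optional.of(name);
    }
    return Optional.empty(); // plain text: not available for aggregation
  }

  public static void main(String[] args) {
    Map<String, Object> origin = Map.of(
        "type", "text",
        "fields", Map.of("keyword", Map.of("type", "keyword", "ignore_above", 256)));
    System.out.println(resolve("Origin", origin));
    System.out.println(resolve("gender", Map.of("type", "text")));
  }
}
```

With the "Origin" mapping above, the sketch resolves to the `Origin.keyword` sub-field; a text field with neither fielddata nor fields resolves to nothing, which corresponds to the failing query.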

Not available for aggregation:

What solution would you like?

Introduce two distinct data types that map to different JDBC/ODBC types.

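| JDBC type        | ExprCoreType | OpenSearchDataType | OpenSearch type                                |
|------------------|--------------|--------------------|------------------------------------------------|
| VARCHAR/CHAR     | STRING       | OPENSEARCH_KEYWORD | keyword; text with fielddata; text with fields |
| LONGVARCHAR/TEXT | TEXT         | OPENSEARCH_TEXT    | text without fielddata and fields              |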
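
As a rough illustration of the proposed mapping, the classification could look like the sketch below. The enum `ProposedType` and its `of` method are hypothetical and do not exist in the plugin. Picking `JDBCType.VARCHAR` and `JDBCType.LONGVARCHAR` as the representative JDBC types for the two rows is an assumption, as is passing the fielddata and fields flags as booleans.

```java
import java.sql.JDBCType;

/** Hypothetical sketch of the proposed mapping; not plugin code. */
enum ProposedType {
  OPENSEARCH_KEYWORD(JDBCType.VARCHAR),   // aggregatable string
  OPENSEARCH_TEXT(JDBCType.LONGVARCHAR);  // full-text only

  final JDBCType jdbcType;

  ProposedType(JDBCType jdbcType) {
    this.jdbcType = jdbcType;
  }

  /** Classifies an OpenSearch string field according to the proposed table. */
  static ProposedType of(String openSearchType, boolean hasFielddata, boolean hasFields) {
    if ("keyword".equals(openSearchType)) {
      return OPENSEARCH_KEYWORD;
    }
    if ("text".equals(openSearchType)) {
      // text with fielddata or with fields behaves like a keyword for aggregation
      return (hasFielddata || hasFields) ? OPENSEARCH_KEYWORD : OPENSEARCH_TEXT;
    }
    throw new IllegalArgumentException("Not a string type: " + openSearchType);
  }

  public static void main(String[] args) {
    System.out.println(of("text", false, true).jdbcType);
    System.out.println(of("text", false, false).jdbcType);
  }
}
```

Under these assumptions, a client could tell from the reported JDBC type alone whether a column is safe to use in GROUP BY.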

What alternatives have you considered?

N/A

Do you have any additional context?

Opened on behalf of @kylepbit
Ref:

  1. KEYWORD(JDBCType.VARCHAR, String.class, 256, 0, false),
     TEXT(JDBCType.VARCHAR, String.class, Integer.MAX_VALUE, 0, false),
     STRING(JDBCType.VARCHAR, String.class, Integer.MAX_VALUE, 0, false),
  2. /**
      * OpenSearch Text. Rather than cast text to other types (STRING), leave it alone to prevent
      * cast_to_string(OPENSEARCH_TEXT).
      * Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
      */
     OPENSEARCH_TEXT(Collections.singletonList(STRING), "string") {
       @Override
       public boolean shouldCast(ExprType other) {
         return false;
       }
     },
     /**
      * OpenSearch multi-fields which has text and keyword.
      * Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
      */
     OPENSEARCH_TEXT_KEYWORD(Arrays.asList(STRING, OPENSEARCH_TEXT), "string") {
       @Override
       public boolean shouldCast(ExprType other) {
         return false;
       }
     },
  3. /**
      * String.
      */
     STRING(UNDEFINED),
