Skip to content

SQL array handling is inconsistent #33204

@paulcarey

Description

@paulcarey

The following applies to Elasticsearch 6.3.

Selecting array fields is not handled consistently and should be revised.

If a document contains an array of primitive (e.g. string, number), an attempt to project that field will fail with an error like Arrays (returned by [values]) are not supported. This behaviour is unnecessarily restrictive. There may be a misconception over SQL or JDBC array support being non-standard.

Wikipedia states that SQL support for arrays was standardized in 1999. JDBC support for Array has existed since JDK 1.2 (released 1998).

For reference, here's an example of using Arrays of Structs with JDBC and PostgreSQL. The Java tutorial has an example showing JDBC usage of Array of String.

A workaround for the array limitation in ES-SQL exists by wrapping the primitive values in an object containing just that value. When this is done then a select can be used to 'explode' each element in the array into its own row. While useful, this should arguably not happen by default.

This exploding behaviour is also supported in Hive, Spark SQL and some other data stores where it must be explicitly triggered by invoking lateral view explode.

If ES-SQL supported arrays, then:

  • the workaround to wrap primitives in objects would not be needed
  • clients could choose via lateral view explode or similar syntax whether to receive a single row, or a row per array element

The current situation feels a bit ill-defined.

Showing the current limitation

DELETE array_test_2

POST array_test_2/_doc
{
  "name": "foo",
  "values": [1, 2, 3]
}


POST _xpack/sql?format=txt
{
  "query": "select name, values from array_test_2"
}


{
  "error": {
    "root_cause": [
      {
        "type": "sql_illegal_argument_exception",
        "reason": "Arrays (returned by [values]) are not supported"
      }
    ],
    "type": "sql_illegal_argument_exception",
    "reason": "Arrays (returned by [values]) are not supported"
  },
  "status": 500
}

Workaround for the array limitation by wrapping primitives in an object

DELETE array_test_1

PUT array_test_1
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "values": {
          "type": "nested",
          "properties": {
            "v": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

POST array_test_1/_doc
{
  "name": "foo",
  "values": [{"v": 1}, {"v": 2}, {"v": 3}]
}

POST _xpack/sql?format=txt
{
  "query": "select name, values.v as v from array_test_1"
}


     name      |       v       
---------------+---------------
foo            |3              
foo            |2              
foo            |1              

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions