Skip to content

Support query pass-through for Elasticsearch connector#12324

Merged
kasiafi merged 1 commit intotrinodb:masterfrom
kasiafi:343ElasticsearchRemoteQuery
Jul 1, 2022
Merged

Support query pass-through for Elasticsearch connector#12324
kasiafi merged 1 commit intotrinodb:masterfrom
kasiafi:343ElasticsearchRemoteQuery

Conversation

@kasiafi
Copy link
Copy Markdown
Member

@kasiafi kasiafi commented May 10, 2022

This PR introduces raw_query Polymorphic Table Function for full query pass-through to Elasticsearch connector.

# Elasticsearch
* Add `raw_query` table function for full query pass-through to the connector.

Documentation

Documentation issue is filed: #13007.
When documenting, also deprecate the legacy query pass-through mechanism: https://trino.io/docs/current/connector/elasticsearch.html?highlight=elasticsearch#pass-through-queries

Copy link
Copy Markdown
Member

@ebyhr ebyhr May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to verify the result in my opinion and how about formatting like this?

        assertThat(
                "SELECT * FROM " +
                        "TABLE(\"" + getSession().getCatalog().orElseThrow() + "\".\"system\".\"remote_query\"(" +
                        " \"schema\" => 'tpch'," +
                        " \"index\" => 'nation'," +
                        " \"query\" => '{\"query\": {\"match\": {\"name\": \"ALGERIA\"}}}'" +
                        "))")
                .matches(...);

By the way, I expected the above SELECT query returns the result with multiple columns, but didn't. Is it intentional design? Also, I would recommend adding empt result & failure case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional. Pass-through queries in Elasticsearch return a single VARCHAR column as a result.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr I added more tests including no match and failure.

Copy link
Copy Markdown
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should deprecate the old-style passthrough queries in favor of this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional. Pass-through queries in Elasticsearch return a single VARCHAR column as a result.

@kasiafi kasiafi force-pushed the 343ElasticsearchRemoteQuery branch 3 times, most recently from 1eb57db to 4272b0a Compare June 23, 2022 14:04
Copy link
Copy Markdown
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, don't forget to update the docs or file an issue to do it later so we don't lose track of it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryFunctionHandle maybe?

Comment on lines 1913 to 1925
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is such a good idea. It masks the underlying assertion, so it's hard to tell what was different and it what way. It also doesn't scale when there are more than a couple of alternatives.

Instead, modify the query to normalize the results in some way. For instance, parse the resulting json, extract the values and sort them to get a consistent result.

@kasiafi
Copy link
Copy Markdown
Member Author

kasiafi commented Jun 28, 2022

@martint I applied comments. Please take a look.

Comment on lines 1869 to 1888
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to the fact that the function seems to dump ES result, i would consider calling a raw_query
(especially so that we keep query name free for future)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. In ES parlance, queries are "search queries" (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html), and there's a Query DSL (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html) to formulate them (what we support here). They also have two other languages:

@kasiafi kasiafi force-pushed the 343ElasticsearchRemoteQuery branch from bb850c7 to 85721e4 Compare June 29, 2022 10:57
@kasiafi
Copy link
Copy Markdown
Member Author

kasiafi commented Jun 29, 2022

@martint @findepi I renamed the function to raw_query.

I also added a test to compare the results between the legacy pass-through mechanism, and the new table function.

The old query pass-through will be deprecated in the documentation when we document the table function. @martint is there any other way we want to deprecate it than a note in the docs?

@findepi
Copy link
Copy Markdown
Member

findepi commented Jun 29, 2022

is there any other way we want to deprecate it than a note in the docs?

you can also mark related pieces of code as @Deprecated, with a TODO, and a link to an issue covering removal of that functionality (assuming we don't have plans to carry it forward for ever)

@kasiafi kasiafi force-pushed the 343ElasticsearchRemoteQuery branch from 85721e4 to 1cd499d Compare June 30, 2022 12:57
@kasiafi kasiafi merged commit f427375 into trinodb:master Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants