Support a "Show plan" endpoint with cost estimation #864

shanson7 · 2018-03-07T19:39:36Z

We are looking to build a user proxy to authenticate, audit, authorize and rate-limit access to MT. One thing that would be quite helpful is an endpoint that gives details about how a query would execute (without executing it). Some points of interest:

How many time-series are involved
Whether it would proxy to Graphite
Number of functions (maybe weighted by complexity of the function)
A "cost" that is some computed scoring of the other 3 pieces of info.

Some of this info needs to reach out to the rest of the cluster to get accurate numbers, but maybe that could be optional and a "best guess" could be used based on the local node.
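To make the "cost" idea concrete, here is a rough sketch of how the three inputs above might be folded into one score. Everything in it is an assumption for illustration: `PlanEstimate`, its fields, and the weights are hypothetical names and values, not existing MT types or tuned numbers.

```go
package main

import "fmt"

// PlanEstimate is a hypothetical summary of a query plan.
// It is not an existing MT type; the fields mirror the points listed above.
type PlanEstimate struct {
	Series          int  // number of time series the query would touch
	ProxyToGraphite bool // true if the request would be proxied to Graphite
	FuncWeight      int  // sum of per-function complexity weights
}

// Assumed weights, picked only to make the example readable.
const (
	costPerSeries     = 1
	costPerFuncWeight = 10
	proxyPenalty      = 1000
)

// Cost combines the three inputs into a single score.
// A linear combination is the simplest possible model; a real one
// would likely need calibration against observed query latency.
func (p PlanEstimate) Cost() int {
	cost := p.Series*costPerSeries + p.FuncWeight*costPerFuncWeight
	if p.ProxyToGraphite {
		cost += proxyPenalty
	}
	return cost
}

func main() {
	p := PlanEstimate{Series: 250, ProxyToGraphite: false, FuncWeight: 3}
	fmt.Println(p.Cost())
}
```

A proxy could compare such a score against a per-user budget before forwarding the request. The series count could come either from a cluster-wide index lookup or from a local "best guess", as described above.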

Dieterbe · 2018-03-08T08:58:07Z

I think the main challenge here is representing the true "cost", but I guess it's a given that the model will be rough and still better than nothing.

Also useful, I think, would be:

An estimate of the number of points/chunks loaded from Cassandra (even though the cache may help a lot)
An estimate of any consolidation that needs to be done
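As a back-of-the-envelope illustration of those two estimates, the sketch below assumes the series count, the time range, the storage interval, and a requested maximum number of points per series are already known. The function names are hypothetical, and the sketch ignores the cache and rollups, so it is an upper bound on raw points rather than what would actually be read from Cassandra.

```go
package main

import "fmt"

// EstimatePoints returns an upper bound on the number of raw points a
// query would load: series * ceil((to - from) / interval).
// from, to and interval are in seconds. Cache hits are not accounted for.
func EstimatePoints(series int, from, to, interval uint32) uint64 {
	if series <= 0 || to <= from || interval == 0 {
		return 0
	}
	perSeries := (uint64(to-from) + uint64(interval) - 1) / uint64(interval)
	return uint64(series) * perSeries
}

// NeedsConsolidation reports whether a single series would yield more
// points than maxDataPoints, in which case the points would have to be
// consolidated before being returned.
func NeedsConsolidation(from, to, interval, maxDataPoints uint32) bool {
	if maxDataPoints == 0 {
		return false // treated here as "no limit requested"
	}
	return EstimatePoints(1, from, to, interval) > uint64(maxDataPoints)
}

func main() {
	// 100 series over 24h at a 10s interval.
	fmt.Println(EstimatePoints(100, 0, 86400, 10))
	fmt.Println(NeedsConsolidation(0, 86400, 10, 800))
}
```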

I believe we've discussed previously that this isn't a priority/roadmap item for us right now, so while we're happy to review and assist, primary development would come from you guys.

shalstea · 2018-05-30T12:57:37Z

Also in the result set one should show what data set is used (e.g. raw data or rollup).

Dieterbe · 2019-04-30T15:58:38Z

> Some of this info needs to reach out to the rest of the cluster to get accurate numbers, but maybe that could be optional and a "best guess" could be used based on the local node.

Right, it's hard to know the number of time series involved without issuing the actual index query.
It seems to me that cardinality is quite essential to the cost.
To be clear, do you definitely want a "dry-run" endpoint that just tells you the cost but doesn't execute the query? Would it not be enough to give additional stats along with the actual render responses? It seems it would make all of our lives easier to just track the cost of executed queries.

shalstea · 2019-04-30T16:07:42Z

I am OK with just tracking executed queries. I don't want to make it overly complicated. Stats on interactions with Cassandra would also be useful, because we find that a lot of queries / dashboards have time ranges greater than the cache size.

Dieterbe · 2019-05-01T17:49:22Z

So can we just close this in favor of #1130 then? That one seems to include the "cost metrics" asked for here, which solves the problem.

shalstea · 2019-05-01T18:07:19Z

Yes. Let's close this in favor of #1130

saheemg mentioned this issue Jul 12, 2018

Add show plan endpoint #961

Merged

Dieterbe closed this as completed May 1, 2019
