This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Support a "Show plan" endpoint with cost estimation #864

Closed
shanson7 opened this issue Mar 7, 2018 · 6 comments

Comments

@shanson7
Collaborator

shanson7 commented Mar 7, 2018

We are looking to build a user proxy to authenticate, audit, authorize and rate limit access to MT. One thing that would be quite helpful is to have an endpoint that would give details about execution (without executing it). Some points of interest:

  1. How many time-series are involved
  2. If it would proxy to graphite
  3. Number of functions (maybe weighted by complexity of the function)
  4. A "cost" that is some computed scoring of the other 3 pieces of info.

Some of this info needs to reach out to the rest of the cluster to get accurate numbers, but maybe that could be optional and a "best guess" could be used based on the local node.
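As a rough illustration of how the four points above could combine into a single score, here is a minimal sketch. The `QueryPlan` type, its field names, and the weighting formula are all assumptions made up for this example; nothing here reflects an existing Metrictank API or an agreed-upon cost model.

```go
package main

import "fmt"

// QueryPlan is a hypothetical summary of what a render request would do
// if it were executed. None of these names exist in Metrictank.
type QueryPlan struct {
	SeriesCount     int   // how many time series are involved
	ProxyToGraphite bool  // whether the request would be proxied to graphite
	FunctionWeights []int // one complexity weight per function in the query
}

// Cost folds the other three pieces of information into one number.
// The formula is an arbitrary placeholder: series count scaled by the
// summed function weights, doubled when the request would be proxied.
func (p QueryPlan) Cost() int {
	fnWeight := 0
	for _, w := range p.FunctionWeights {
		fnWeight += w
	}
	cost := p.SeriesCount * (1 + fnWeight)
	if p.ProxyToGraphite {
		cost *= 2
	}
	return cost
}

func main() {
	plan := QueryPlan{SeriesCount: 100, FunctionWeights: []int{1, 3}}
	fmt.Println(plan.Cost()) // 100 * (1 + 4) = 500
}
```

A proxy could compare such a score against a per-user budget before forwarding the request; the "best guess" variant would fill `SeriesCount` from the local node's index only.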

@Dieterbe
Contributor

Dieterbe commented Mar 8, 2018

I think the main challenge here is representing true "cost", but I guess it's a given that the model will be rough and still better than nothing.

Also useful, I think, would be:

  • an estimate of the number of points/chunks loaded from Cassandra (even though the cache may help a lot)
  • an estimate of any consolidation that needs to be done.
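A back-of-the-envelope version of these two estimates might look like the sketch below. The function names and parameters are hypothetical, the point count ignores the cache and chunk boundaries, and the consolidation estimate simply assumes a `maxDataPoints`-style limit on the number of points returned per series.

```go
package main

import "fmt"

// EstimatePoints gives a rough count of points that would be fetched:
// (to - from) / interval per series, times the number of series.
// It ignores the cache, so it is an upper bound on store reads.
func EstimatePoints(series int, from, to, interval uint32) uint64 {
	if to <= from || interval == 0 {
		return 0
	}
	perSeries := uint64(to-from) / uint64(interval)
	return uint64(series) * perSeries
}

// AggNum estimates how many fetched points would be merged into one
// output point to stay within maxDataPoints. 1 means no consolidation.
func AggNum(pointsPerSeries, maxDataPoints uint64) uint64 {
	if maxDataPoints == 0 || pointsPerSeries <= maxDataPoints {
		return 1
	}
	// Ceiling division: round up so the output never exceeds the limit.
	return (pointsPerSeries + maxDataPoints - 1) / maxDataPoints
}

func main() {
	// 50 series, one day of data at a 10s interval, at most 800 points out.
	fmt.Println(EstimatePoints(50, 0, 86400, 10)) // 50 * 8640 = 432000
	fmt.Println(AggNum(8640, 800))                // ceil(8640 / 800) = 11
}
```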

I believe we've discussed previously that this isn't a priority/roadmap item for us right now, so while we're happy to review and assist, primary development would come from you guys.

@shalstea

Also in the result set one should show what data set is used (e.g. raw data or rollup).

@Dieterbe
Contributor

Dieterbe commented Apr 30, 2019

> Some of this info needs to reach out to the rest of the cluster to get accurate numbers, but maybe that could be optional and a "best guess" could be used based on the local node.

Right, it's hard to know the number of time series involved without issuing the actual index query, and it seems to me that cardinality is quite essential to the cost.
To be clear, do you definitely want a "dry-run" endpoint that just tells you the cost but doesn't execute the query? Is it not enough to just give additional stats along with actual render responses? It seems it would make all of our lives easier to just track the cost of executed queries.

@shalstea

I am OK with just tracking executed queries. I don't want to make it overly complicated. Stats on interactions with Cassandra would also be useful, because we find that a lot of queries / dashboards have time ranges greater than the cache size.

@Dieterbe
Contributor

Dieterbe commented May 1, 2019

So can we just close this in favor of #1130 then? It seems that one includes the "cost metrics" asked for here, solving the problem.

@shalstea

shalstea commented May 1, 2019

Yes. Let's close this in favor of #1130

@Dieterbe Dieterbe closed this as completed May 1, 2019