Support external query layer #989

Closed
shanson7 opened this issue Aug 10, 2018 · 9 comments · Fixed by #1243

@shanson7
Collaborator

With Metrictank's distributed design, the benefit of handling requests locally diminishes as the number of shard groups grows. With 100 shard groups, a single instance only handles ~1% of the data.

However, the node handling the request still has the overhead of accumulating all of the data onto a single node and executing the functions. Stack this on top of the load of ingesting data and bookkeeping, and a bad query can contribute to ingestion issues.

Pros:

  1. Reduced load on ingest nodes
  2. Updates can be segregated (a new native function means only the query nodes need a restart)
  3. Query nodes can be restarted very quickly (in the case of egregious user queries)

Cons:

  1. Loss of data locality
  2. Slightly more complicated clustering

Currently, this can be achieved by adding a set of nodes with a bogus partition (99999 or something), but that means that peer queries will still be issued to them resulting in a bit of noise.

Given that there is an effective workaround for this, I am entering this issue mostly for discussion.

@Dieterbe
Contributor

FWIW, I've been having some talks with the folks behind https://eng.uber.com/m3/ to see how we can collaborate and save effort, and the main thing I've noticed so far is the M3 coordinator. A few interesting points about it:

  1. they're designing it to do JIT/on-demand chunk decoding (not predecoding into huge point slices in advance), to reduce memory footprint significantly
  2. it comes with a bunch of processing built in, including some graphite functionality, but it'll require more work to get it to the same level as MT.
  3. it supports M3DB (obviously), but it seems not too hard to make it able to query MT clusters as well (basically it would participate in the gossip clustering layer as a client / non-data node)

They're heavily working on refactoring code and cleaning stuff up, but they told me they would keep me in the loop and let me know when it's ready for us to dig into it and attempt a proof of concept.

I think 1 is the most interesting for us. We would need to confirm that it indeed leads to a significantly reduced memory footprint. If it does, that could be the "query engine 2.0", so to speak, for MT.

So it seems there will be a need for client participants (non-data-hosting nodes) anyway, regardless of which query engine we put in front of MT.
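To make point 1 concrete, here is a toy Go sketch contrasting predecoding with on-demand decoding. All types and names are illustrative assumptions, not m3's or Metrictank's actual APIs:

```go
package query

// Point is a single timestamp/value pair.
type Point struct {
	Ts  uint32
	Val float64
}

// ChunkIter is a hypothetical iterator over one compressed chunk: Next
// decodes a single point at a time instead of materializing the chunk.
type ChunkIter interface {
	Next() bool
	Values() Point
}

// sumEager is the approach point 1 wants to move away from: every chunk
// is decoded into a large []Point slice before any processing happens.
func sumEager(points []Point) float64 {
	var s float64
	for _, p := range points {
		s += p.Val
	}
	return s
}

// sumLazy consumes points as they are decoded, so only the compressed
// chunks plus a single decoded point need to be resident at any time.
func sumLazy(chunks []ChunkIter) float64 {
	var s float64
	for _, it := range chunks {
		for it.Next() {
			s += it.Values().Val
		}
	}
	return s
}
```

The lazy variant never holds a fully decoded series in memory, which is where the footprint reduction would have to come from.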

@woodsaj
Member

woodsaj commented Aug 10, 2018

I really like the idea of having dedicated query nodes.

I don't think this would be hard to achieve. I think all we need to do is:

  1. add another cluster mode, so we have "single", "multi" and a new "distributed" mode.
  2. extend the Node interface to have a "HasData()" method and use that to filter nodes in MembersForSpeculativeQuery() and MembersForQuery() when mode==distributed (see the sketch after this list)
  3. build a new binary, metrictank-query, that just loads the client-facing API routes and doesn't initialize the idx, input or store plugins.
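A rough Go sketch of points 1 and 2, assuming a simplified Node interface and illustrative mode constants (the real Metrictank types and names differ):

```go
package cluster

// Mode controls how an instance participates in the cluster.
// These constants are illustrative, not Metrictank's actual names.
type Mode int

const (
	ModeSingle Mode = iota
	ModeMulti
	ModeDistributed // the proposed query-only mode
)

// Node is a simplified stand-in for Metrictank's cluster Node interface,
// extended with the proposed HasData method.
type Node interface {
	IsReady() bool
	HasData() bool // true if this node owns one or more partitions
}

// membersForQuery sketches the proposed filtering: when running in the
// new mode, peers that hold no data (i.e. other query nodes) are skipped
// when fanning a request out across the cluster.
func membersForQuery(mode Mode, peers []Node) []Node {
	out := make([]Node, 0, len(peers))
	for _, p := range peers {
		if !p.IsReady() {
			continue
		}
		if mode == ModeDistributed && !p.HasData() {
			continue
		}
		out = append(out, p)
	}
	return out
}
```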

@Dieterbe
Contributor

Dieterbe commented Oct 5, 2018

I agree we can move forward with a minimal change like woodsaj describes above, though:

  1. I think "distributed" doesn't reflect well what this mode is about. I like "frontend" or "query".
  2. I'm not convinced we need to build a new binary to support this mode. We can simply bypass idx, input and store if mode == frontend (a sketch follows below).

This will be beneficial to our deployments as well.
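A minimal sketch of that idea, with hypothetical function names standing in for Metrictank's actual plugin wiring:

```go
package main

func main() {
	// a frontend/query node: no idx, input or store plugins are started.
	startPlugins("frontend")
}

// startPlugins is a hypothetical stand-in for Metrictank's startup logic;
// the real initialization is more involved.
func startPlugins(mode string) {
	if mode == "frontend" {
		// frontend/query nodes only serve the client-facing API and fan
		// requests out to peers, so idx, input and store stay disabled.
		return
	}
	initIndex()  // e.g. the memory/cassandra index
	initInputs() // e.g. kafka-mdm or carbon
	initStore()  // the chunk store backend
}

func initIndex()  {}
func initInputs() {}
func initStore()  {}
```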

@Dieterbe added this to the 1.0 milestone Oct 29, 2018
@Dieterbe
Contributor

This might even lead to significant resource usage optimisations. If the nodes consuming data are less likely to crash, we may be able to transition from 2x readers and 1x writer to 1x reader and 1x writer (and both would be queried by the query nodes).

@shanson7
Collaborator Author

Status update:

We deployed the query layer using the 9999 partition hack described above.

We did not see reduced average memory usage or allocations on the read nodes, but heap spikes and OOMkills dropped dramatically. At ~50 render reqs/s we used to see 30-50 OOMkills per day; now we get zero on the read nodes and just 1 or 2 on the query nodes (which are almost instantly back in action).

@deniszh

deniszh commented Jan 21, 2019

Tried to do the same trick with client nodes, i.e. adding a node serving partition 99999, but got this error instead:

kafka-cluster: configured partitions not in list of available partitions. missing 99999

@shanson7: did you patch your MT to avoid this?

@shanson7
Collaborator Author

@deniszh - You need to use the carbon input plugin, not kafka-mdm.

@Dieterbe
Contributor

Dieterbe commented Feb 8, 2019

To accommodate this, and also fix #1013 in the process, I see 2 options:

A: come up with 3 names for the different cluster modes

  1. when data is not sharded, so the node doesn't need to fan out and therefore doesn't need to gossip
  2. when data is sharded, so we need fanout, and we're a node that holds some of the data
  3. when data is sharded, but the node is a query/frontend node

My name suggestions for each, listed in order of my preference:

  1. "full", "quiet" or "independent"
  2. "shard" or "data"
  3. "query" or "frontend"

In this case, we should validate that an input is enabled for modes 1 and 2, and disabled for mode 3 (see the validation sketch at the end of this comment).

B: split it up

  • have a setting like gossip on/off or sharding on/off in the cluster section
  • have the difference between the above cases 2 and 3 be known implicitly, based on whether an input is enabled; in that case we should still add validation that if the input is disabled, gossip must be enabled

I think I have a small preference for A because it is more explicit and has room to add more modes later if needed.
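For illustration, a hedged Go sketch of the validation step from option A, using the suggested (not final) mode names:

```go
package cluster

import "fmt"

// validateMode sketches the check proposed under option A: modes that hold
// data (cases 1 and 2) must have an input enabled, while query/frontend
// nodes (case 3) must not. The mode names are suggestions, not final.
func validateMode(mode string, inputEnabled bool) error {
	switch mode {
	case "full", "shard":
		if !inputEnabled {
			return fmt.Errorf("cluster mode %q requires an enabled input plugin", mode)
		}
	case "query":
		if inputEnabled {
			return fmt.Errorf("cluster mode %q must not have an input plugin enabled", mode)
		}
	default:
		return fmt.Errorf("unknown cluster mode %q", mode)
	}
	return nil
}
```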

@Dieterbe
Contributor

Dieterbe commented Mar 19, 2019

> extend the Node interface to have a "HasData()" method and use that to filter nodes in MembersForSpeculativeQuery() and MembersForQuery() when mode==distributed

I think this can be inferred from members.GetPartitions(), which we already use.
I now have a working implementation of this feature in #1243.
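A minimal sketch of that inference, assuming the peer object already exposes its partition list; #1243 is the actual implementation and may differ:

```go
package cluster

// hasData infers "owns data" from the existing partition information:
// a node has data iff it advertises at least one partition. The parameter
// type is an assumption standing in for Metrictank's peer/member type.
func hasData(n interface{ GetPartitions() []int32 }) bool {
	return len(n.GetPartitions()) > 0
}
```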
