Support external query layer #989

Closed
shanson7 opened this issue Aug 10, 2018 · 9 comments · Fixed by #1243

@shanson7
Collaborator

With Metrictank's distributed design, the benefit of handling requests locally diminishes as the number of shard groups grows. With 100 shard groups, a single instance only handles ~1% of the data.

However, the node handling the request still has the overhead of accumulating all of the data onto a single node and executing the functions. Stack this on top of the load of ingesting data and bookkeeping, and a bad query can contribute to ingestion issues.

Pros:

  1. Reduced load on ingest nodes
  2. Updates can be segregated (a new native function means only the query nodes need a restart)
  3. Query nodes can be restarted very quickly (in the case of egregious user queries)

Cons:

  1. Loss of data locality
  2. Slightly more complicated clustering

Currently, this can be achieved by adding a set of nodes with a bogus partition (99999 or something), but that means that peer queries will still be issued to them resulting in a bit of noise.

Given that there is an effective workaround for this, I am entering this issue mostly for discussion.

@Dieterbe
Contributor

FWIW, I've been having some talks with the folks behind https://eng.uber.com/m3/ to see how we can collaborate and save effort, and the main thing I've noticed so far is the M3 coordinator. A few interesting points about it:

  1. they're designing it to do JIT/on-demand chunk decoding (not predecoding into huge point slices in advance), to reduce memory footprint significantly
  2. it comes with a bunch of processing built in, including some graphite functionality, but it'll require more work to get it to the same level as MT.
  3. it supports M3DB (obviously), but it seems not too hard to make it able to query MT clusters as well (basically it would participate in the gossip clustering layer as a client / non-data node)

They're heavily working on refactoring code and cleaning stuff up, but they told me they would keep me in the loop and let me know when it's ready for us to dig into it and attempt a proof of concept.

I think 1 is the most interesting for us. We would need to confirm that it indeed leads to a significantly reduced memory footprint. If it does, that could be the "query engine 2.0", so to speak, for MT.

So it seems there will be a need for client participants (non-data-hosting nodes) anyway, regardless of which query engine we put in front of MT.
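To make point 1 concrete, here is a toy Go sketch contrasting predecoding with on-demand decoding. All types and names are illustrative assumptions, not m3's or Metrictank's actual APIs:

```go
package query

// Point is a single timestamp/value pair.
type Point struct {
	Ts  uint32
	Val float64
}

// ChunkIter is a hypothetical iterator over one compressed chunk: Next
// decodes a single point at a time instead of materializing the chunk.
type ChunkIter interface {
	Next() bool
	Values() Point
}

// sumEager is the approach point 1 wants to move away from: every chunk
// is decoded into a large []Point slice before any processing happens.
func sumEager(points []Point) float64 {
	var s float64
	for _, p := range points {
		s += p.Val
	}
	return s
}

// sumLazy consumes points as they are decoded, so only the compressed
// chunks plus a single decoded point need to be resident at any time.
func sumLazy(chunks []ChunkIter) float64 {
	var s float64
	for _, it := range chunks {
		for it.Next() {
			s += it.Values().Val
		}
	}
	return s
}
```

The lazy variant never holds a fully decoded series in memory, which is where the footprint reduction would have to come from.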

@woodsaj
Member

woodsaj commented Aug 10, 2018

I really like the idea of having dedicated query nodes.

I don't think this would be hard to achieve. I think all we need to do is:

  1. add another cluster mode, so we have "single", "multi" and a new "distributed" mode.
  2. extend the Node interface to have a "HasData()" method and use that to filter nodes in MembersForSpeculativeQuery() and MembersForQuery() when mode==distributed (see the sketch after this list)
  3. build a new binary, metrictank-query, that just loads the client-facing API routes and doesn't initialize the idx, input or store plugins.
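A rough Go sketch of points 1 and 2, assuming a simplified Node interface and illustrative mode constants (the real Metrictank types and names differ):

```go
package cluster

// Mode controls how an instance participates in the cluster.
// These constants are illustrative, not Metrictank's actual names.
type Mode int

const (
	ModeSingle Mode = iota
	ModeMulti
	ModeDistributed // the proposed query-only mode
)

// Node is a simplified stand-in for Metrictank's cluster Node interface,
// extended with the proposed HasData method.
type Node interface {
	IsReady() bool
	HasData() bool // true if this node owns one or more partitions
}

// membersForQuery sketches the proposed filtering: when running in the
// new mode, peers that hold no data (i.e. other query nodes) are skipped
// when fanning a request out across the cluster.
func membersForQuery(mode Mode, peers []Node) []Node {
	out := make([]Node, 0, len(peers))
	for _, p := range peers {
		if !p.IsReady() {
			continue
		}
		if mode == ModeDistributed && !p.HasData() {
			continue
		}
		out = append(out, p)
	}
	return out
}
```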

@Dieterbe
Contributor

Dieterbe commented Oct 5, 2018

I agree we can move forward with a minimal change like woodsaj describes above, though:

  1. I think "distributed" doesn't reflect well what this mode is about. I like "frontend" or "query".
  2. I'm not convinced we need to build a new binary to support this mode. We can simply bypass idx, input and store if mode == frontend (a sketch follows below).

This will be beneficial to our deployments as well.
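A minimal sketch of that idea, with hypothetical function names standing in for Metrictank's actual plugin wiring:

```go
package main

func main() {
	// a frontend/query node: no idx, input or store plugins are started.
	startPlugins("frontend")
}

// startPlugins is a hypothetical stand-in for Metrictank's startup logic;
// the real initialization is more involved.
func startPlugins(mode string) {
	if mode == "frontend" {
		// frontend/query nodes only serve the client-facing API and fan
		// requests out to peers, so idx, input and store stay disabled.
		return
	}
	initIndex()  // e.g. the memory/cassandra index
	initInputs() // e.g. kafka-mdm or carbon
	initStore()  // the chunk store backend
}

func initIndex()  {}
func initInputs() {}
func initStore()  {}
```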

@Dieterbe added this to the 1.0 milestone Oct 29, 2018
@Dieterbe
Contributor

This might even lead to significant resource usage optimisations. If the nodes consuming data are less likely to crash, we may be able to transition from 2x readers and 1x writer to 1x reader and 1x writer (and both would be queried by the query nodes).

@shanson7
Collaborator Author

Status update:

We deployed the query layer using the 9999 partition hack described above.

We did not see reduced average memory usage or allocations on the read nodes, but heap spikes and OOMkills dropped dramatically. At ~50 render reqs/s we used to see 30-50 OOMkills per day; now we get zero on the read nodes and just 1 or 2 on the query nodes (which are almost instantly back in action).

@deniszh

deniszh commented Jan 21, 2019

Tried to do the same trick with client nodes, i.e. adding a node serving partition 99999, but got this error instead:

kafka-cluster: configured partitions not in list of available partitions. missing 99999

@shanson7: did you patch your MT to avoid this?

@shanson7
Collaborator Author

@deniszh - You need to use the carbon input plugin, not kafka-mdm.

@Dieterbe
Contributor

Dieterbe commented Feb 8, 2019

To accommodate this, and also fix #1013 in the process, I see 2 options:

A: come up with 3 names for the different cluster modes

  1. when data is not sharded, so the node doesn't need to fan out and therefore doesn't need to gossip
  2. when data is sharded, so we need fanout, and we're a node that holds some of the data
  3. when data is sharded, but the node is a query/frontend node

My name suggestions for each, listed in order of my preference:

  1. "full", "quiet" or "independent"
  2. "shard" or "data"
  3. "query" or "frontend"

In this case, we should validate that an input is enabled for modes 1 and 2, and disabled for mode 3 (see the validation sketch at the end of this comment).

B: split it up

  • have a setting like gossip on/off or sharding on/off in the cluster section
  • have the difference between the above cases 2 and 3 be known implicitly, based on whether an input is enabled; in that case we should still add validation that if the input is disabled, gossip must be enabled

I think I have a small preference for A because it is more explicit and has room to add more modes later if needed.
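For illustration, a hedged Go sketch of the validation step from option A, using the suggested (not final) mode names:

```go
package cluster

import "fmt"

// validateMode sketches the check proposed under option A: modes that hold
// data (cases 1 and 2) must have an input enabled, while query/frontend
// nodes (case 3) must not. The mode names are suggestions, not final.
func validateMode(mode string, inputEnabled bool) error {
	switch mode {
	case "full", "shard":
		if !inputEnabled {
			return fmt.Errorf("cluster mode %q requires an enabled input plugin", mode)
		}
	case "query":
		if inputEnabled {
			return fmt.Errorf("cluster mode %q must not have an input plugin enabled", mode)
		}
	default:
		return fmt.Errorf("unknown cluster mode %q", mode)
	}
	return nil
}
```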

@Dieterbe
Contributor

Dieterbe commented Mar 19, 2019

> extend the Node interface to have a "HasData()" method and use that to filter nodes in MembersForSpeculativeQuery() and MembersForQuery() when mode==distributed

I think this can be inferred from members.GetPartitions(), which we already use.
I now have a working implementation of this feature in #1243.
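A minimal sketch of that inference, assuming the peer object already exposes its partition list; #1243 is the actual implementation and may differ:

```go
package cluster

// hasData infers "owns data" from the existing partition information:
// a node has data iff it advertises at least one partition. The parameter
// type is an assumption standing in for Metrictank's peer/member type.
func hasData(n interface{ GetPartitions() []int32 }) bool {
	return len(n.GetPartitions()) > 0
}
```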
