Add guided walkthrough on configuring fault-tolerant execution#14861
Add guided walkthrough on configuring fault-tolerant execution#14861arhimondr merged 1 commit intotrinodb:masterfrom
Conversation
mosabua
left a comment
There was a problem hiding this comment.
Looks good so far. I think we need to emphasize a bit more that this allows processing queries that otherwise would fail with out of memory issue, and it also allows using spot instances since a worker dying is no longer an issues and a query can still finish..
Both of these are crucial for batch workloads .. which are often long running queries that take up lots of memory
20ef4d5 to
0c75911
Compare
There was a problem hiding this comment.
should you merge 3. with 2. to have single section about k8s deployemnt and single section about non-k8s deployment?
There was a problem hiding this comment.
I think it makes sense to keep in separate steps, because they're broken up by which config file needs to be edited in the non-k8s deployment - exchange-manager.properties and config.properties, respectively. That may be confusing if condensed into a single step.
There was a problem hiding this comment.
It is also strongly encouraged to change the default task concurrency, e.g.: "task.concurrency=8".
Ideally it would be great if the default recommended settings were being applied automatically when retry-policy=TASK is enabled. Let me think about it how it can be done (#14955). For now let's keep the settings here
There was a problem hiding this comment.
Sounds good to me. We'll also need to modify the FTE reference doc (https://trino.io/docs/current/admin/fault-tolerant-execution.html) to describe what settings change when enabling task retries, either in your PR or I can update it afterwards once yours is merged and we know exactly what the setting changes will be.
There was a problem hiding this comment.
Added task.concurrency=8 to the YAML anyway, even if it's set as the default in the future it's helpful to show what properties are relevant.
There was a problem hiding this comment.
Added task.concurrency=8 to the YAML anyway, even if it's set as the default in the future it's helpful to show what properties are relevant.
I would prefer to avoid it. It may tend to evolve over time and ideally we would like to avoid users setting anything explicitly if not necessary, as then it might not be possible to easily change the recommended settings.
There was a problem hiding this comment.
Got it, I'll remove it then. This page is intended as a "quick setup" guide anyway, so if we want to discuss more nuances for this property then the reference docs are more appropriate for that.
There was a problem hiding this comment.
Let's remove exchange.compression-enabled=true and query.low-memory-killer.delay=0s from here as well, as now those are set automatically
There was a problem hiding this comment.
@arhimondr I thought they aren't set automatically if the user has them explicitly configured prior to setting retry-policy=TASK? My thought is that it can't hurt to leave them in, in case the reader has the property configured and would then need to change its value in order to meet best practices.
There was a problem hiding this comment.
Hmm, maybe it's worth mentioning it explicitly, something like
"If you have exchange.compression-enabled, query.low-memory-killer.delay configured explicitly - remove them", or something like that?
Basically what I'm afraid of is that somebody may configure query.low-memory-killer.delay=0s explicitly following our recommendation, but the recommended value might need to be changed in the future.
There was a problem hiding this comment.
That makes sense to me. I removed those properties from this section, then to address the new auto-default change behavior I rewrote the paragraph on line 125 of the /admin/ guide in this PR.
1181030 to
7ff63ad
Compare
|
@arhimondr or @losipiuk .. can we get this merged? |
There was a problem hiding this comment.
Let's remove exchange.compression-enabled=true and query.low-memory-killer.delay=0s from here as well, as now those are set automatically
7ff63ad to
cbbb8fc
Compare
cbbb8fc to
d33c9c5
Compare
|
Everything is good to go. Can we please just get this merged @arhimondr @losipiuk ? |
Description
Create a simplified guide on how to configure Trino with fault-tolerant execution, emphasizing the most common use case with an AWS S3 bucket for exchange storage. This is a companion to the existing
/admin/fault-tolerant-executionreference material.Non-technical explanation
Add a guide explaining how to configure fault-tolerant execution on a Trino cluster, simplifying the feature for the average reader.
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: