Presto: More robust handling of existing config#109
Conversation
|
An alternative behavior (suggested by @patdevinwilson unless I'm misunderstanding him!) may be to allow the presence of independent generated (and editable) configs for all three variants, so that local edited dev versions for all three can exist in parallel, perhaps for comparison purposes. This may be overkill. Discuss. |
There was a problem hiding this comment.
lsmem may not be available on all environments. Consider adding a fallback (e.g., /proc/meminfo or free -g) so the script works across more hosts.
Example fallback: if command -v lsmem is missing, parse /proc/meminfo or use free -g
There was a problem hiding this comment.
@patdevinwilson do you have an example of a system where lsmem -b does NOT work?
patdevinwilson
left a comment
There was a problem hiding this comment.
LGTM with non-blocking suggestions. The change improves safety and UX. Addressing the fallback for memory detection and adding a simple config version would make this more robust, but they are not blockers for merging
Per-variant directories (recommended medium complexity) |
| rm -rf generated | ||
| mkdir -p generated | ||
| cat > generated/config.json << EOF | ||
| # generate only if no existing config or overwrite flag is set |
There was a problem hiding this comment.
What configurations are being modified locally? Can you please expand on where the request for the updates in the PR is coming from?
There was a problem hiding this comment.
@paul-aiyedun Todd and Patrick were both iterating configs by editing their local files. Once the original auto-config PR (#48) landed, this was dangerous, because the config would be automatically re-generated every time you ran the start script, so local speculative edits would be overwritten if you hadn't copied them aside. Hence, they asked for this change.
|
Apologies I'm chiming in late, but whether in this PR or a follow-up, I'd strongly advocate for separate configs for GPU, CPU Native, and Java, as I've hit some nasty surprised switching between them (after I removed the generate command from the start presto scripts, which will become the default with this PR), particularly as the number of drivers doesn't change when you move from GPU to CPU and vice-versa, with all the ensuing impacts to runtime and memory usage. |
@tmostak The |
|
@tmostak et al, per our discussion, I have now made it generate and maintain separate configs per variant, so you are free to edit one variant's generated config without risk of losing that work if you have to switch to another variant. |
| services: | ||
| presto-base-volumes: | ||
| volumes: | ||
| - ./config/generated/etc_common:/opt/presto-server/etc |
There was a problem hiding this comment.
Each of the three variants now has a separate generated config tree, so ALL of the mappings have to move to the leaf container recipes.
| service: presto-base-coordinator | ||
| volumes: | ||
| - ./config/generated/etc_coordinator/config_java.properties:/opt/presto-server/etc/config.properties | ||
| - ./config/generated/java/etc_common:/opt/presto-server/etc |
There was a problem hiding this comment.
I am assuming that order is important here, to ensure that the behavior is still to map the directory and THEN map some of the individual files (to somewhere else), same as when the directory mapping was in the parent recipe.
|
Something's still not right with Java mode memory config. Test job with this branch failed, although last night's CI was fine. I don't think this branch could have broken it, but I'll check it again in the morning... |
mattgara
left a comment
There was a problem hiding this comment.
LGTM, just had some questions.
| echo -e "${GREEN}$1${NC}" | ||
| } | ||
|
|
||
| if [ ! -x ../pbench/pbench ]; then |
There was a problem hiding this comment.
Great little check, I'm wondering if we need to broadly apply such checks to all our scripts! (I realize this was not added in this PR but it just gets highlighted, and I think I've previously missed this.)
There was a problem hiding this comment.
@misiugodfrey perhaps you can implement something equivalent in your script utilities stuff, that can be called from any script to verify that it's being run from the right location.
| fi | ||
| # now perform other variant-specific modifications to the generated configs | ||
| if [[ "${VARIANT_TYPE}" == "gpu" ]]; then | ||
| # for GPU variant, uncomment these optimizer settings |
There was a problem hiding this comment.
(Also not added in this PR but just being brought to my attention in this one)
"uncomment these ... when" seems like a perfect fit for an CLI option that enables this (optionally of course!)
There was a problem hiding this comment.
Yeah, this was #111. I don't think this should be a command-line option, as it MUST be enabled for GPU, and MUST NOT be enabled for CPU. This was my latest way of avoiding forking the template files, which @paul-aiyedun and others are opposed to.
| echo "Generating Presto Config files for '${VARIANT_TYPE}' for host with ${NPROC} CPU cores and ${RAM_GB}GB RAM" | ||
|
|
||
| # (re-)generate the config.json file | ||
| rm -rf ${CONFIG_DIR} |
There was a problem hiding this comment.
Does this need special handling in case the script fails to complete (i.e. do we need a trap to clean this up?)
There was a problem hiding this comment.
Do you mean it should remove any config files it manages to generate if it doesn't generate it PROPERLY? At this point, the only thing that can fail is pbench itself, and if that fails, you get nothing anyway. The server won't start in that situation, so even if something WAS generated, the server won't try to use it, and it'll get deleted again when it next succeeds.
Previous attempts at allowing automatic memory config for Java mode failed, with the Coordinator failing to start, with errors of the form: Invalid memory configuration. The sum of max total query memory per node (96636764160) and heap headroom (36399847833) cannot be larger than the available heap memory (121332826112) This turns out to be because there are two additional memory-related configs that were in the Native Coordinator config but not in the Java one which turn out to be essential in order for the memory assignment math to be correct. Thanks to @paul-aiyedun for finding the fix. Also fixes the Java config.properties file container mapping broken accidentally in #109.
This PR adds logic to the Presto start scripts to re-use any existing config files for the given variant in case they have been modified by the user.
Config files for the given variant will be automatically generated if absent.
A config tree is generated for each variant independently, so that the user is free to make manual edits in the generated version for one variant, but switch to another variant temporarily (e.g. to do an
ANALYZEin CPU mode) without losing the edits.Also added an
--overwrite-configoption to the start scripts to force regeneration of the config when deliberately switching variants or to revert an edited config back to default.Configs generated or edited with previous versions of
velox-testingwill be ignored, and should ideally be removed.