
ACTUALLY corrects the behavior in #297 #303

Merged: 7 commits, Mar 9, 2021

Conversation

@ferricoxide (Member):

The previous fix for #297 turned out not to actually fix the "restarted too many times too soon" behavior. This fix appears to address it properly.

@ferricoxide requested a review from a team on March 4, 2021 15:09
@lorengordon (Member) left a comment:

I'd like to first just try changing:

- file: file_{{ stig_id }}-{{ cfgFile }}

to:

- file: {{ cfgFile }}

in each file, to see if that gives Salt enough of a clue to de-dupe the service.running states.

If that doesn't work, then we can proceed with moving the service.running to a separate sls, but we'll still need to modify the STIG sls files to include the restart_sshd sls, remove the onchanges from the service.running state, and use onchanges_in in the file.replace states to point at the state service: service_sshd_restart.

The include directive should be within the jinja else block, so the service state does not run if the stig_id is skipped.
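
For context, here is a minimal sketch of the kind of per-STIG service.running state this suggestion applies to. The state ID follows the listener_service_ pattern that appears in the log output later in this thread, but the exact contents are an assumption, not the repository's actual state definition:

# Illustrative only: a per-STIG restart listener, with the requisite
# rewritten as suggested above to reference the managed config file
# rather than the per-STIG state ID.
listener_service_{{ stig_id }}-{{ cfgFile }}:
  service.running:
    - name: sshd
    - onchanges:
      - file: {{ cfgFile }}

Because requisites can match a state's name as well as its ID, - file: {{ cfgFile }} still resolves to the same file.replace state; whether that alone lets Salt collapse the duplicate service.running states is exactly what the test below checks.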

@ferricoxide (Member Author):

> I'd like to first just try changing:
>
> - file: file_{{ stig_id }}-{{ cfgFile }}
>
> to:
>
> - file: {{ cfgFile }}
>
> in each file, to see if that gives Salt enough of a clue to de-dupe the service.running states.
>
> If that doesn't work, then we can proceed with moving the service.running to a separate sls, but we'll still need to modify the STIG sls files to include the restart_sshd sls, remove the onchanges from the service.running state, and use onchanges_in in the file.replace states to point at the state service: service_sshd_restart.
>
> The include directive should be within the jinja else block, so the service state does not run if the stig_id is skipped.

Lemme make a quick branch to try it out.

@ferricoxide (Member Author):

> I'd like to first just try changing:
>
> - file: file_{{ stig_id }}-{{ cfgFile }}
>
> to:
>
> - file: {{ cfgFile }}
>
> in each file, to see if that gives Salt enough of a clue to de-dupe the service.running states.
>
> If that doesn't work, then we can proceed with moving the service.running to a separate sls, but we'll still need to modify the STIG sls files to include the restart_sshd sls, remove the onchanges from the service.running state, and use onchanges_in in the file.replace states to point at the state service: service_sshd_restart.
>
> The include directive should be within the jinja else block, so the service state does not run if the stig_id is skipped.

Ok, so, they've updated SaltStack so that using file: {{ cfgFile }} no longer produces an error. However, changing the states (back) to referencing the config file – directly or via the previous state-ID method – still results in errors:

2021-03-09 16:42:50,815 P2466 [INFO]    2021-03-09 16:42:50,318 [watchmaker.workers.base.SaltLinux][ERROR][3082]: Command stderr: b'To force a start use "systemctl reset-failed sshd.service" followed by "systemctl start sshd.service" again.'
2021-03-09 16:42:50,815 P2466 [INFO]    2021-03-09 16:42:50,562 [watchmaker.workers.base.SaltLinux][DEBUG][3082]: Command retcode: 2
2021-03-09 16:42:50,815 P2466 [INFO]    2021-03-09 16:42:50,622 [watchmaker.workers.base.SaltLinux][INFO ][3082]: Setting selinux back to enforcing mode
2021-03-09 16:42:50,815 P2466 [INFO]    2021-03-09 16:42:50,623 [watchmaker.workers.base.SaltLinux][DEBUG][3082]: Command: setenforce enforcing
2021-03-09 16:42:50,815 P2466 [INFO]    2021-03-09 16:42:50,634 [watchmaker.workers.base.SaltLinux][DEBUG][3082]: Command retcode: 0
2021-03-09 16:42:50,816 P2466 [INFO]    2021-03-09 16:42:50,635 [watchmaker.Client][CRITICAL][3082]: Execution of the workers cadence has failed.
2021-03-09 16:42:50,816 P2466 [INFO]    2021-03-09 16:42:50,635 [watchmaker][CRITICAL][3082]:
2021-03-09 16:42:50,816 P2466 [INFO]    Traceback (most recent call last):
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/bin/watchmaker", line 8, in <module>
2021-03-09 16:42:50,816 P2466 [INFO]        sys.exit(main())
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
2021-03-09 16:42:50,816 P2466 [INFO]        return self.main(*args, **kwargs)
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 782, in main
2021-03-09 16:42:50,816 P2466 [INFO]        rv = self.invoke(ctx)
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
2021-03-09 16:42:50,816 P2466 [INFO]        return ctx.invoke(self.callback, **ctx.params)
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
2021-03-09 16:42:50,816 P2466 [INFO]        return callback(*args, **kwargs)
2021-03-09 16:42:50,816 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/watchmaker/cli.py", line 110, in main
2021-03-09 16:42:50,817 P2466 [INFO]        sys.exit(watchmaker_client.install())
2021-03-09 16:42:50,817 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/watchmaker/__init__.py", line 452, in install
2021-03-09 16:42:50,817 P2466 [INFO]        workers_manager.worker_cadence()
2021-03-09 16:42:50,817 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/watchmaker/managers/worker_manager.py", line 59, in worker_cadence
2021-03-09 16:42:50,817 P2466 [INFO]        worker.install()
2021-03-09 16:42:50,817 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/watchmaker/workers/salt.py", line 828, in install
2021-03-09 16:42:50,817 P2466 [INFO]        self.process_states(self.salt_states, self.exclude_states)
2021-03-09 16:42:50,817 P2466 [INFO]      File "/usr/local/lib/python3.6/site-packages/watchmaker/workers/salt.py", line 659, in process_states
2021-03-09 16:42:50,817 P2466 [INFO]        indent=4
2021-03-09 16:42:50,817 P2466 [INFO]    watchmaker.exceptions.WatchmakerException: Salt state execution failed:
2021-03-09 16:42:50,817 P2466 [INFO]        listener_service_RHEL-07-040670-/etc/ssh/sshd_config:
2021-03-09 16:42:50,817 P2466 [INFO]            __id__: listener_service_RHEL-07-040670-/etc/ssh/sshd_config
2021-03-09 16:42:50,817 P2466 [INFO]            __run_num__: 653
2021-03-09 16:42:50,817 P2466 [INFO]            __sls__: ash-linux.el7.STIGbyID.cat2.RHEL-07-040670
2021-03-09 16:42:50,818 P2466 [INFO]            changes: {}
2021-03-09 16:42:50,818 P2466 [INFO]            comment: 'Running scope as unit run-29592.scope.
2021-03-09 16:42:50,818 P2466 [INFO]
2021-03-09 16:42:50,818 P2466 [INFO]                Job for sshd.service failed because start of the service was attempted
2021-03-09 16:42:50,818 P2466 [INFO]                too often. See "systemctl status sshd.service" and "journalctl -xe" for
2021-03-09 16:42:50,818 P2466 [INFO]                details.
2021-03-09 16:42:50,818 P2466 [INFO]
2021-03-09 16:42:50,818 P2466 [INFO]                To force a start use "systemctl reset-failed sshd.service" followed by
2021-03-09 16:42:50,818 P2466 [INFO]                "systemctl start sshd.service" again.'
2021-03-09 16:42:50,818 P2466 [INFO]            duration: 93.73
2021-03-09 16:42:50,818 P2466 [INFO]            name: sshd
2021-03-09 16:42:50,818 P2466 [INFO]            result: false
2021-03-09 16:42:50,818 P2466 [INFO]            start_time: '16:42:49.822611'
2021-03-09 16:42:50,819 P2466 [INFO]        listener_service_RHEL-07-040680-/etc/ssh/sshd_config:
2021-03-09 16:42:50,819 P2466 [INFO]            __id__: listener_service_RHEL-07-040680-/etc/ssh/sshd_config
2021-03-09 16:42:50,819 P2466 [INFO]            __run_num__: 654
2021-03-09 16:42:50,819 P2466 [INFO]            __sls__: ash-linux.el7.STIGbyID.cat2.RHEL-07-040680
2021-03-09 16:42:50,819 P2466 [INFO]            changes: {}
2021-03-09 16:42:50,819 P2466 [INFO]            comment: 'Running scope as unit run-29597.scope.
2021-03-09 16:42:50,819 P2466 [INFO]
2021-03-09 16:42:50,819 P2466 [INFO]                Job for sshd.service failed because start of the service was attempted
2021-03-09 16:42:50,819 P2466 [INFO]                too often. See "systemctl status sshd.service" and "journalctl -xe" for
2021-03-09 16:42:50,819 P2466 [INFO]                details.
2021-03-09 16:42:50,819 P2466 [INFO]
2021-03-09 16:42:50,819 P2466 [INFO]                To force a start use "systemctl reset-failed sshd.service" followed by
2021-03-09 16:42:50,819 P2466 [INFO]                "systemctl start sshd.service" again.'
2021-03-09 16:42:50,819 P2466 [INFO]            duration: 136.291
2021-03-09 16:42:50,820 P2466 [INFO]            name: sshd
2021-03-09 16:42:50,820 P2466 [INFO]            result: false
2021-03-09 16:42:50,820 P2466 [INFO]            start_time: '16:42:49.916974'
2021-03-09 16:42:50,820 P2466 [INFO]        listener_service_RHEL-07-040690-/etc/ssh/sshd_config:
2021-03-09 16:42:50,820 P2466 [INFO]            __id__: listener_service_RHEL-07-040690-/etc/ssh/sshd_config
2021-03-09 16:42:50,820 P2466 [INFO]            __run_num__: 655
2021-03-09 16:42:50,820 P2466 [INFO]            __sls__: ash-linux.el7.STIGbyID.cat2.RHEL-07-040690
2021-03-09 16:42:50,820 P2466 [INFO]            changes: {}
2021-03-09 16:42:50,820 P2466 [INFO]            comment: 'Running scope as unit run-29603.scope.
2021-03-09 16:42:50,820 P2466 [INFO]
2021-03-09 16:42:50,820 P2466 [INFO]                Job for sshd.service failed because start of the service was attempted
2021-03-09 16:42:50,820 P2466 [INFO]                too often. See "systemctl status sshd.service" and "journalctl -xe" for
2021-03-09 16:42:50,820 P2466 [INFO]                details.
2021-03-09 16:42:50,821 P2466 [INFO]
2021-03-09 16:42:50,821 P2466 [INFO]                To force a start use "systemctl reset-failed sshd.service" followed by
2021-03-09 16:42:50,821 P2466 [INFO]                "systemctl start sshd.service" again.'
2021-03-09 16:42:50,821 P2466 [INFO]            duration: 131.175
2021-03-09 16:42:50,821 P2466 [INFO]            name: sshd
2021-03-09 16:42:50,821 P2466 [INFO]            result: false
2021-03-09 16:42:50,821 P2466 [INFO]            start_time: '16:42:50.053755'
2021-03-09 16:42:50,821 P2466 [INFO]        listener_service_RHEL-07-040700-/etc/ssh/sshd_config:
2021-03-09 16:42:50,821 P2466 [INFO]            __id__: listener_service_RHEL-07-040700-/etc/ssh/sshd_config
2021-03-09 16:42:50,821 P2466 [INFO]            __run_num__: 656
2021-03-09 16:42:50,821 P2466 [INFO]            __sls__: ash-linux.el7.STIGbyID.cat2.RHEL-07-040700
2021-03-09 16:42:50,821 P2466 [INFO]            changes: {}
2021-03-09 16:42:50,821 P2466 [INFO]            comment: 'Running scope as unit run-29608.scope.
2021-03-09 16:42:50,821 P2466 [INFO]
2021-03-09 16:42:50,822 P2466 [INFO]                Job for sshd.service failed because start of the service was attempted
2021-03-09 16:42:50,822 P2466 [INFO]                too often. See "systemctl status sshd.service" and "journalctl -xe" for
2021-03-09 16:42:50,822 P2466 [INFO]                details.
2021-03-09 16:42:50,822 P2466 [INFO]
2021-03-09 16:42:50,822 P2466 [INFO]                To force a start use "systemctl reset-failed sshd.service" followed by
2021-03-09 16:42:50,822 P2466 [INFO]                "systemctl start sshd.service" again.'
2021-03-09 16:42:50,822 P2466 [INFO]            duration: 132.083
2021-03-09 16:42:50,822 P2466 [INFO]            name: sshd
2021-03-09 16:42:50,822 P2466 [INFO]            result: false
2021-03-09 16:42:50,822 P2466 [INFO]            start_time: '16:42:50.185436'
2021-03-09 16:42:50,822 P2466 [INFO]
2021-03-09 16:42:50,822 P2466 [INFO] ------------------------------------------------------------
2021-03-09 16:42:50,823 P2466 [ERROR] Exited with error code 1

@ferricoxide (Member Author):

@lorengordon @eemperor

Any other methods you want to test before accepting the PR?

@lorengordon (Member) left a comment:

The current implementation is still incomplete. It relies on someone invoking the .cat2 sls to include the service restart. If someone calls RHEL-07-040660 directly, there is no restart.

To address that, each of the RHEL-07-xxxxxx.sls files changed in this PR can implement the following pattern with include and onchanges_in:

{%- else %}
include:
  - ash-linux.el7.STIGbyID.cat2.restart_sshd

file_{{ stig_id }}-{{ cfgFile }}:
  file.replace:
    - name: '{{ cfgFile }}'
    - pattern: '^\s*{{ parmName }} .*$'
    - repl: '{{ parmName }} {{ parmValu }}'
    - append_if_not_found: True
    - not_found_content: |-
        # Inserted per STIG {{ stig_id }}
        {{ parmName }} {{ parmValu }}
    - onchanges_in:
      - service: service_sshd_restart
{%- endif %}

Then:

  • Remove ash-linux.el7.STIGbyID.cat2.restart_sshd from cat2/init.sls, as it is no longer necessary
  • Delete the onchanges directive from cat2/restart_sshd.sls, since each state sets the requisite with onchanges_in (see the sketch after this list)
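
For reference, a hedged sketch of what cat2/restart_sshd.sls might reduce to once those two bullets are applied; only the service_sshd_restart ID and the sshd service name come from the pattern above, the rest is an assumption:

# Assumed shape of ash-linux/el7/STIGbyID/cat2/restart_sshd.sls after the
# change: no onchanges of its own, because each STIG state that edits the
# sshd config attaches itself to this state via onchanges_in.
service_sshd_restart:
  service.running:
    - name: sshd

Since every STIG sls includes this one file instead of declaring its own service.running state, the restart should fire at most once per highstate run.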

@ferricoxide (Member Author) commented Mar 9, 2021:

Seems like that would still cause each state that has the include/onchanges_in content to attempt to individually restart the sshd service (resulting in the "too many times" error)?

@lorengordon (Member):

I don't believe so, but it's a bit difficult to say for sure without studying the rendered highstate data structure.
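
One way to reason about it: Salt only processes an included sls once, no matter how many files include it, and each onchanges_in is flipped into an onchanges entry on the single target state. The net effect should be conceptually equivalent to declaring something like the following (a hand-written illustration, not captured from an actual run; the state IDs are assembled from the file_{{ stig_id }}-{{ cfgFile }} pattern above and the STIG IDs in the earlier log):

# Hypothetical rendered result: one service state carrying all of the
# onchanges requisites contributed by the individual STIG states.
service_sshd_restart:
  service.running:
    - name: sshd
    - onchanges:
      - file: file_RHEL-07-040670-/etc/ssh/sshd_config
      - file: file_RHEL-07-040680-/etc/ssh/sshd_config
      - file: file_RHEL-07-040690-/etc/ssh/sshd_config
      - file: file_RHEL-07-040700-/etc/ssh/sshd_config

If it renders that way, sshd is restarted once when any of those file states reports a change, which matches the successful test run described below.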

@ferricoxide (Member Author):

Running a test now to validate: just because things feel like you're circling over the same terrain doesn't actually mean you are (and, even if you are, it doesn't mean that doing it on an ATV rather than a dirtbike won't produce different results).

@ferricoxide (Member Author) commented Mar 9, 2021:

Ok. The first run-through seems not to have triggered the "too many restarts" error ...but, because I'd branched off of master, my skip-logic for the readonly TMOUT= login-profile state was no longer present in the testing branch's code. Going to re-run with skipping actually activated for that one. If it runs a second time through the sshd stuff without incident, I'll merge the code changes back into this PR's contents.

(git cherry-pick will be useful, here 😛)

@ferricoxide (Member Author):

Looks good:

[…elided…]
2021-03-09 19:26:18,432 P2405 [INFO] ------------------------------------------------------------
2021-03-09 19:26:18,432 P2405 [INFO] Completed successfully.
2021-03-09 19:26:18,436 P2405 [INFO] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2021-03-09 19:26:18,436 P2405 [INFO] Config finalize
2021-03-09 19:26:18,437 P2405 [INFO] ============================================================
2021-03-09 19:26:18,437 P2405 [INFO] Command 10-signal-success
2021-03-09 19:26:19,329 P2405 [INFO] Completed successfully.
2021-03-09 19:26:19,333 P2405 [INFO] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2021-03-09 19:26:19,333 P2405 [INFO] Config reboot
2021-03-09 19:26:19,334 P2405 [INFO] ============================================================
2021-03-09 19:26:19,334 P2405 [INFO] Command 10-reboot
2021-03-09 19:26:19,405 P2405 [INFO] -----------------------Command Output-----------------------
2021-03-09 19:26:19,406 P2405 [INFO]    Shutdown scheduled for Tue 2021-03-09 19:27:19 UTC, use 'shutdown -c' to cancel.
2021-03-09 19:26:19,406 P2405 [INFO] ------------------------------------------------------------
2021-03-09 19:26:19,406 P2405 [INFO] Completed successfully.

Will push the mods shortly.
