Invoke reinstall helper verbose by default #375

Merged
merged 4 commits into from
Feb 12, 2025

Conversation

pau-hedgehog
Contributor

The helper output is only shown to the user when reinstall is itself run in verbose mode.

The expect helper now supports verbose output.

@pau-hedgehog pau-hedgehog added the ci-hw Run hardware CI job label Feb 6, 2025
@pau-hedgehog pau-hedgehog self-assigned this Feb 6, 2025
Frostman
Frostman previously approved these changes Feb 6, 2025
@Frostman
Member

Frostman commented Feb 6, 2025

@pau-hedgehog have you tested it on one of the envs or should we wait for hlab CI to be fixed?

@pau-hedgehog
Contributor Author

@pau-hedgehog have you tested it on one of the envs or should we wait for hlab CI to be fixed?

We should wait for hlab. There was no env available to test

@edipascale
Contributor

@pau-hedgehog have you tested it on one of the envs or should we wait for hlab CI to be fixed?

We should wait for hlab. There was no env available to test

feel free to grab env-1 to speed up things

It will only show to the user in case reinstall is
verbose indeed

Expect helper has now the verbose support

Signed-off-by: Pau Capdevila <[email protected]>
pau-hedgehog and others added 2 commits February 11, 2025 18:16
Capture hhfab serial ssh error (255) and exit

Fixes #358

Signed-off-by: Pau Capdevila <[email protected]>
which is likely to be the one we hit in the ci

Signed-off-by: Emanuele Di Pascale <[email protected]>
@pau-hedgehog
Contributor Author

kudos @edipascale, with your last commit it breaks when any switch fails:

$ ./hhfab-reinstall vlab up -v --collect --ready switch-reinstall --ready inspect
...
11:47:53 DBG s5232-01: 11:47:53 ERR serial: failed to run command: exit status 255
11:47:53 DBG s5248-02: CPGC Memtest Channel 0 ......................11:47:53 ERR serial: failed to run command: exit status 255
11:47:53 DBG s5232-02: 11:47:53 ERR serial: failed to run command: exit status 255
11:47:53 DBG s5232-01: EXP-ERR: Connection lost during GRUB menu expect
11:47:53 ERR Failed to reinstall switch name=s5232-01 error="exit status 1"
11:47:53 DBG s5232-02: EXP-ERR: Connection lost during GRUB menu expect
11:47:53 DBG s5248-02: EXP-ERR: Connection lost during GRUB menu expect
11:47:53 ERR Failed to reinstall switch name=s5232-02 error="exit status 1"
11:47:53 ERR Failed to reinstall switch name=s5248-02 error="exit status 1"
...
11:48:54 WRN Failed to reinstall switches err="reinstalling switches: s5232-01: Connection to console failed (error code: 1)\ns5232-02: Connection to console failed (error code: 1)\ns5248-02: Connection to console failed (error code: 1)"
11:49:33 ERR running VLAB: running task: running on-ready commands: reinstalling switches: reinstalling switches: s5232-01: Connection to console failed (error code: 1)
s5232-02: Connection to console failed (error code: 1)
s5248-02: Connection to console failed (error code: 1)

@edipascale
Contributor

thanks @pau-hedgehog, it was easier for me to find it because you had already excluded all other paths with your previous commits :D
the question now is, do we want to attempt to handle the connection failure in the expect script? is it worth trying to maybe sleep for x seconds and then attempt to spawn a new serial connection? do we have any idea why we see the serial connection getting closed only some of the time?

@pau-hedgehog
Contributor Author

pau-hedgehog commented Feb 12, 2025

thanks @pau-hedgehog, it was easier for me to find it because you had already excluded all other paths with your previous commits :D the question now is, do we want to attempt to handle the connection failure in the expect script? is it worth trying to maybe sleep for x seconds and then attempt to spawn a new serial connection? do we have any idea why we see the serial connection getting closed only some of the time?

Based on previous conversations with @Frostman, the idea is to fail fast in the CI, so no recovery, I would say. But it's true that we may end up saving time overall if we add retries.

WRT the cause of the SSH errors, I took a look at our console servers but didn't find anything.

As this is part of lab infra maybe we should involve @sonoble so he can advise

@pau-hedgehog
Contributor Author

@Frostman, I think this could be merged, as the expect helper now detects the error and does not continue to inspect.

We may want to protect further execution in case of failure in other places, checking that the spawn id is open:

17:19:42 DBG s5248-03: send: spawn id exp4 not open
17:19:42 DBG s5248-03:     while executing
17:19:42 DBG s5248-03: "send -- "$KEY_HOME""
17:19:42 DBG s5248-03:     invoked from within
17:19:42 DBG s5248-03: "expect {
17:19:42 DBG s5248-03:          -ex $ONIE_HIGHLIGHT {
17:19:42 DBG s5248-03:                  log_message "EXP-DBG" "ONIE option found"
17:19:42 DBG s5248-03:                  set timeout -1
17:19:42 DBG s5248-03:                  send "\r"
17:19:42 DBG s5248-03:                  expect -ex "GNU GRUB"
17:19:42 DBG s5248-03:                  send -- "$KEY..."
17:19:42 DBG s5248-03:     invoked from within
17:19:42 DBG s5248-03: "expect -timeout 300 -ex "GNU GRUB" {
17:19:42 DBG s5248-03: log_message "EXP-DBG" "GRUB Menu detected"
17:19:42 DBG s5248-03: # Select the ONIE option, and finally the Install OS option
17:19:42 DBG s5248-03: sleep 1..."
17:19:42 DBG s5248-03:     (file "/home/ubuntu/.hhfab-cache/vlabhelpers_reinstall-2662596320/vlabhelpers_reinstall.exp" line 188)

@Frostman Frostman merged commit 79a92d3 into master Feb 12, 2025
28 checks passed
@Frostman Frostman deleted the reinstall_verbose branch February 12, 2025 19:27