Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement: check if cluster is up and running #1995

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

barmull
Copy link
Collaborator

@barmull barmull commented Apr 19, 2024

Description

Add condition in ensure_kubeconfig! method to check the cluster's health.

Issues:

Refs: #1968

How has this been tested:

  • Covered by existing integration testing
  • Added integration testing to cover
  • Verified all A/C passes
    • develop
    • master
    • tag/other branch
  • Test environment
    • Shared Packet K8s cluster
    • New Packet K8s cluster
    • Kind cluster
  • Have not tested

Types of changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

Documentation

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • No updates required.

Code Review

  • Does the test handle fatal exceptions, ie. rescue block

Issue

  • Tasks in issue are checked off

@barmull
Copy link
Collaborator Author

barmull commented Apr 19, 2024

For testing I set KUBECONFIG into existing file, but the file is empty. Resulting that KUBECONFIG is set, but cluster is not running. See in this screenshot:

image

Copy link
Collaborator

@agentpoyo agentpoyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely held up by #1964 (review) as the previous message is the output when kubeconfig or connection does not exist:

pair@cnfdev4:~/workspace/drew/testsuite-barmull$ LOG_LEVEL=debug ./cnf-testsuite setup 
CNF TestSuite version: HEAD-2024-04-21-010045-2b70ba7b
Successfully created directories for cnf-testsuite
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: VERSION: HEAD-2024-04-21-010045-2b70ba7b
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v3?: Regex::MatchData("BuildInfo{Version:\"v3.11.1\", GitCommit:\"293b50c65d4d56187cd4e2f390f0ada46b4c4737\", GitTreeState:\"clean\", GoVersion:\"go1.18.10\"" 1:"v3.11.1" 2:"11.")
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: Globally installed helm satisfies required version. Skipping local helm install.
Global helm found. Version: v3.11.1
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v2?: 
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v3?: Regex::MatchData("BuildInfo{Version:\"v3.11.1\", GitCommit:\"293b50c65d4d56187cd4e2f390f0ada46b4c4737\", GitTreeState:\"clean\", GoVersion:\"go1.18.10\"" 1:"v3.11.1" 2:"11.")
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: helm_local_version command: /home/pair/.cnf-testsuite/tools/helm/linux-amd64/helm version
No Local helm version found
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_local_version output: 
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: helm_local_version stderr: --: 1: /home/pair/.cnf-testsuite/tools/helm/linux-amd64/helm: not found

D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v2?: 
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v3?: 
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: helm_v3?: Regex::MatchData("BuildInfo{Version:\"v3.11.1\", GitCommit:\"293b50c65d4d56187cd4e2f390f0ada46b4c4737\", GitTreeState:\"clean\", GoVersion:\"go1.18.10\"" 1:"v3.11.1" 2:"11.")
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: error: Error: Kubernetes cluster unreachable: Get "http://localhost:8080/version": dial tcp [::1]:8080: connect: connection refused

I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: helm_resp (add): 
Global kubectl found. Version: 1.26
Global kubectl client version could not be checked for compatibility with server. (Server not configured?)
No Local kubectl version found
Global git found. Version: 2.34.1
No Local git version found
All prerequisites found.
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: KubectlClient::Create.namespace command: kubectl create namespace cnf-testsuite
Could not create cnf-testsuite namespace on the Kubernetes cluster
D, [2024-04-21 01:00:45 +00:00 #657279] DEBUG -- cnf-testsuite: KubectlClient::Create.namespace output: 
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: KubectlClient::Create.namespace stderr: The connection to the server localhost:8080 was refused - did you specify the right host or port?

I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite-verbose: install_apisnoop
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: apisnoop already installed. Skipping git clone for apisnoop.
Setup complete
I, [2024-04-21 01:00:45 +00:00 #657279]  INFO -- cnf-testsuite: install_kind

@agentpoyo
Copy link
Collaborator

Screenshot_2024-04-20_20-03-42

@barmull
Copy link
Collaborator Author

barmull commented Apr 23, 2024

In this case is not working as expected. Because the change is only in method ensure_kubeconfig!
In this PR #1964 the method ensure_kubeconfig! is added into task create_namespace.
So, if you want to properly tested, you need to have this two PRs together. And then you get the same output as mine above.

@agentpoyo
Copy link
Collaborator

In this case is not working as expected. Because the change is only in method ensure_kubeconfig! In this PR #1964 the method ensure_kubeconfig! is added into task create_namespace. So, if you want to properly tested, you need to have this two PRs together. And then you get the same output as mine above.

I checked out your fork and both changes were included when tested.

@barmull
Copy link
Collaborator Author

barmull commented Apr 24, 2024

Screenshot_2024-04-20_20-03-42

In your output is missing message 'KUBECONFIG is already set.' It seems that method ensure_kubeconfig is not executing. The method should be inserted in "create_namespace" task. So even if you set KUBECONFIG to existing config file, it should catch unhealthy cluster and raise an error. So instead of 'Could not create cnf-testsuite namespace on the Kubernetes cluster' you should see 'Exit code: 256. Executing this command 'kubectl get nodes --kubeconfig=/path/to/config/file' raised an error. For further information run command in CLI.'
image

exit_code = KubectlClient::ShellCmd.run(cmd, "", false)[:status].exit_status
error_message = "Exit code: #{exit_code}. Executing this command '#{cmd}' raised an error. For further information run command in CLI."
if exit_code != 0
puts error_message.colorize(:red)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • use stdout_failure instead of puts
  • don't raise
  • do exit(1)

src/tasks/utils/utils.cr Outdated Show resolved Hide resolved
- add condition in ensure_kubeconfig! method to
  check the cluster's health

Signed-off-by: barmull <[email protected]>
Copy link
Collaborator

@martin-mat martin-mat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@martin-mat martin-mat requested a review from kosstennbl June 24, 2024 12:49
Copy link
Contributor

@horecoli horecoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@kosstennbl kosstennbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link

@rich-l rich-l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@martin-mat martin-mat merged commit 34c1bf4 into cnti-testcatalog:main Jun 25, 2024
2 of 88 checks passed
@barmull barmull deleted the improvement branch July 29, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants