@rohansonecha
Collaborator

This PR adds more debug logging around the list_namespaced_pod call to the Kubernetes API to help diagnose cases where the API returns an empty list of pods. These logs should help us determine whether such a case is a bug on our end (for example, we are passing in incorrect filter values) or in the Kubernetes API.
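
To illustrate the "incorrect filter values" case, here is a minimal sketch of logging the exact filters before the query is sent. The helper names, the label key, and the namespace/context values are hypothetical and not taken from this PR; the only outside fact relied on is that Kubernetes equality-based label selectors are written as comma-separated `key=value` pairs.

```python
import logging

logger = logging.getLogger('pod_query_sketch')


def build_label_selector(labels):
    """Join labels into an equality-based selector: 'k1=v1,k2=v2'."""
    return ','.join(f'{key}={value}' for key, value in sorted(labels.items()))


def describe_query(namespace, context, labels):
    """Return the one-line description that would be logged before the call."""
    return (f'Querying pods: namespace={namespace!r}, context={context!r}, '
            f'label_selector={build_label_selector(labels)!r}')


# Hypothetical values, for illustration only.
message = describe_query('default', 'my-context',
                         {'example-cluster-label': 'my-cluster-abcd'})
logger.debug(message)
```

If an empty pod list shows up later, a line like this makes it possible to check whether the namespace, context, or selector was wrong before suspecting the API server.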

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@rohansonecha rohansonecha requested a review from cg505 October 31, 2025 23:07
@rohansonecha rohansonecha self-assigned this Oct 31, 2025
@rohansonecha
Collaborator Author

/quicktest-core
/smoke-test --kubernetes

Collaborator

@cg505 cg505 left a comment

Are there any additional things (e.g. response headers or request details) we can grab from the response object?

@rohansonecha
Collaborator Author

> Are there any additional things (e.g. response headers or request details) we can grab from the response object?

I tried setting _return_http_data_only to False (the default is True) to grab the status code and headers. Unfortunately, with this flag set the k8s API returns very verbose output with a lot of extra pod information that we don't want. I'll merge this for now, since it already provides value when debugging cases where pods are missing; we can look further into getting the status code and headers later if necessary.
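
For a possible follow-up, here is a minimal sketch of logging the status code and headers without dumping the pod data. It assumes the Kubernetes Python client's `list_namespaced_pod_with_http_info` variant returns a `(data, status_code, headers)` tuple; that assumption should be checked against the client version in use. The helper name `summarize_list_response` and the stand-in objects are hypothetical, and no cluster is contacted.

```python
from types import SimpleNamespace


def summarize_list_response(data, status_code, headers):
    """Build a compact one-line summary of a pod list response.

    Only metadata is formatted; the pod objects themselves are never
    printed, which keeps the log line short.
    """
    metadata = getattr(data, 'metadata', None)
    items = getattr(data, 'items', None) or []
    return (f'status={status_code}, '
            f'headers={dict(headers or {})}, '
            f'resource_version={getattr(metadata, "resource_version", None)}, '
            f'pod_count={len(items)}')


# Stand-in for the tuple that (by assumption) a call such as
# core_api.list_namespaced_pod_with_http_info(namespace, label_selector=...)
# would return.
fake_data = SimpleNamespace(
    metadata=SimpleNamespace(resource_version='12345'),
    items=[])
summary = summarize_list_response(fake_data, 200,
                                  {'Content-Type': 'application/json'})
print(summary)
```

Formatting only selected fields, rather than the whole tuple, is what avoids the verbose output described above.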

@rohansonecha rohansonecha merged commit f7524cc into master Nov 4, 2025
28 of 32 checks passed
@rohansonecha rohansonecha deleted the query-logs branch November 4, 2025 02:54
Comment on lines +1710 to +1714
logger.debug(
    f'Query response for skypilot cluster {cluster_name_on_cloud}: '
    f'resource_version={response.metadata.resource_version}, '
    f'pod_count={len(pods)}, '
    f'continue_token={response.metadata._continue}')
Collaborator

Can we just print the entire response, instead of only some of its members?

Collaborator

Also, do we want to place this before pods = response.items so that if there is any error getting response.items the logger can print out the message?
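
The ordering suggested here could look roughly like the sketch below: log the response metadata first, then read `response.items`, so that a failure on `items` still leaves a debug line behind. The function name `log_then_get_pods`, the `BrokenResponse` class, and the list-backed handler are hypothetical stand-ins for illustration; this is not the code from the follow-up PR.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger('pod_query_order_sketch')


def log_then_get_pods(response, cluster_name_on_cloud):
    """Log response metadata first, then read the pod list."""
    metadata = getattr(response, 'metadata', None)
    logger.debug(
        f'Query response for skypilot cluster {cluster_name_on_cloud}: '
        f'resource_version={getattr(metadata, "resource_version", None)}, '
        f'continue_token={getattr(metadata, "_continue", None)}')
    # If reading items fails, the debug line above has already been emitted.
    pods = response.items
    logger.debug(f'pod_count={len(pods)}')
    return pods


class BrokenResponse:
    """Hypothetical response whose `items` cannot be read."""
    metadata = SimpleNamespace(resource_version='42', _continue=None)

    @property
    def items(self):
        raise RuntimeError('could not read items')


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

try:
    log_then_get_pods(BrokenResponse(), 'my-cluster')
except RuntimeError:
    pass  # The metadata line was still logged before the failure.
```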

Collaborator Author

Addressed in this PR: #7852