Dump GKE windows test logs via diagnostics tool #83517

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on Nov 7, 2019

YangLu1031
Contributor

@YangLu1031 YangLu1031 commented Oct 4, 2019

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
SSH has not been enabled on GKE Windows nodes yet, so we cannot retrieve the archived logs after a Prow test run. We have to check the logs in Stackdriver instead, which is much clunkier.
Use the diagnostics tool to dump Windows logs and mitigate the pain.

Special notes for your reviewer:
The diagnostics tool repo: https://github.com/GoogleCloudPlatform/compute-image-tools/tree/master/cli_tools/diagnostics

Does this PR introduce a user-facing change?:

Use the diagnostics tool to dump GKE Windows test logs

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 4, 2019
@k8s-ci-robot
Contributor

Hi @YangLu1031. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 4, 2019
@YangLu1031
Contributor Author

/cc @pjh

@k8s-ci-robot k8s-ci-robot requested a review from pjh October 4, 2019 23:28
@@ -229,31 +229,23 @@ function save-logs-windows() {
    echo "Not saving logs for ${node}, Windows log dumping requires gcloud support"
    return
  fi

  export-windows-docker-event-log "${node}"
Contributor

I don't remember exactly what this function did but I guess it made the Docker event log available in a file somewhere. Do we still need to call this, or does the diagnostics command do the same thing?

Contributor Author

No, we cannot call it, because it runs a PowerShell command over SSH. I left the function definition in place in case we switch back to the SSH approach later. The diagnostics command collects all the event logs, including Docker events, as evtx files, but it does not produce an explicit docker.log file. We could add that to the diagnostics command if needed.

Contributor

How can one read the evtx files? Is it possible to open them on a Linux workstation?

If we need to transform them somehow perhaps we should do this after unzipping the archive file at L246.

Contributor Author

Right, there are some tools such as python-evtx and evtViewer, but they are still not user-friendly. It looks like we need to improve the diagnostics tool so that it dumps plain-text logs.

Contributor

Ok. If there's something easy to do then feel free to add it to this PR, otherwise we can follow-up with changes here or in the diagnostics tool itself.

Contributor Author

I tried the python-evtx tool on a sample evtx file. It converts evtx to XML, which is still not human-readable, and evtViewer does not seem to work. I'm thinking of changing this in the diagnostics tool instead.

Contributor

Yeah, that sounds like probably the best approach. Consider filing a feature request issue in https://github.com/GoogleCloudPlatform/compute-image-tools/ to track this.

cluster/log-dump/log-dump.sh (outdated, resolved)
cluster/log-dump/log-dump.sh (resolved)
@pjh
Contributor

pjh commented Oct 7, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 7, 2019
@YangLu1031
Contributor Author

@pjh Can you take another look? Thanks :)

@YangLu1031
Contributor Author

/retest

@pjh
Contributor

pjh commented Oct 9, 2019

Pretty much LGTM but we still have test failures due to the service account issue; discussed in person, waiting for an update on that.

@pjh
Contributor

pjh commented Oct 9, 2019

Pretty much LGTM but we still have test failures due to the service account issue; discussed in person, waiting for an update on that.

Waiting for some guidance on kubernetes/test-infra#14670 before deciding what to do here.

@YangLu1031
Contributor Author

@pjh I added a switch: for a GKE cluster the script uses the diagnostics tool, otherwise it uses SSH to dump the Windows logs. Please take a look.
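
For readers following along, the switch described in this comment could look roughly like the sketch below. It is only an illustration, not the merged code: the two log-saving functions are stubs, the names use underscores instead of the hyphenated names in log-dump.sh, and keying the switch on `KUBERNETES_PROVIDER` being `gke` is an assumption that this thread does not confirm.

```bash
# Stubs standing in for the two real log-saving functions (hypothetical bodies).
save_windows_logs_via_diagnostics_tool() {
  echo "diagnostics tool: node=$1 dest=$2"
}
save_windows_logs_via_ssh() {
  echo "ssh: node=$1 dest=$2"
}

# Picks the log-dumping path for one Windows node.
# ASSUMPTION: the provider check below is illustrative; the PR may key the
# switch on a different variable or condition.
save_logs_windows() {
  local node="$1"
  local dest_dir="$2"
  if [ "${KUBERNETES_PROVIDER:-}" = "gke" ]; then
    save_windows_logs_via_diagnostics_tool "${node}" "${dest_dir}"
  else
    save_windows_logs_via_ssh "${node}" "${dest_dir}"
  fi
}

# Example calls with a hypothetical node name and destination directory.
KUBERNETES_PROVIDER=gke save_logs_windows "win-node-1" "/tmp/logs"
KUBERNETES_PROVIDER=gce save_logs_windows "win-node-1" "/tmp/logs"
```

Keeping both paths behind one entry point means callers do not need to know which transport is available on a given cluster.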

cluster/log-dump/log-dump.sh (outdated, resolved)
cluster/log-dump/log-dump.sh (outdated, resolved)
@YangLu1031
Contributor Author

@pjh please take another look :)

@@ -238,6 +238,33 @@ function save-windows-logs-via-diagnostics-tool() {
  fi
}

# Saves log files from SSH
function save-windows-logs-via-ssh() {
  export-windows-docker-event-log "${node}"
Contributor

Does node need to be passed as an arg?

Please do a test run with the updated code if possible to make sure there are no errors :)

Contributor

Same thing for dest_dir

Contributor Author

Yeah, I tested it and it works fine. It looks like bash has dynamic scoping: the inner function can see the local variables of the outer function.
But we should pass them as args for clarity and reliability. Will fix it.
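
The scoping behavior mentioned here can be reproduced with a small sketch. The function names and the node name are hypothetical and are not taken from log-dump.sh. The sketch contrasts a helper that silently depends on its caller's `local` variable with one that receives the value as an argument.

```bash
# Reads "node" without declaring it: relies on the caller's local variable
# being visible through bash's dynamic scoping.
inner_implicit() {
  echo "implicit: ${node:-<unset>}"
}

# Receives the value as a positional argument and keeps its own local copy.
inner_explicit() {
  local node="$1"
  echo "explicit: ${node}"
}

outer() {
  local node="windows-node-1"  # hypothetical node name
  inner_implicit               # sees outer's local only because outer is the caller
  inner_explicit "${node}"     # works regardless of who calls it
}

outer
```

Calling `inner_implicit` directly, outside `outer`, prints `implicit: <unset>`, which is why passing the value explicitly is the more reliable choice.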

Contributor

Thanks. Bash is the worst.

@YangLu1031 YangLu1031 force-pushed the master branch 2 times, most recently from e1ceab6 to efede24 Compare October 30, 2019 23:28
@pjh
Contributor

pjh commented Oct 30, 2019

/lgtm

Please squash the commits into one.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2019
@pjh
Contributor

pjh commented Oct 30, 2019

/priority important-soon

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 30, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2019
@pjh
Contributor

pjh commented Oct 31, 2019

/lgtm

@zmerlynn @eparis could you please take a look? Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 31, 2019
@pjh
Contributor

pjh commented Nov 6, 2019

Hi @zmerlynn @eparis , could you please take a look or assign the appropriate person? Thanks.

@zmerlynn
Member

zmerlynn commented Nov 6, 2019

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: YangLu1031, zmerlynn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2019
@liggitt
Member

liggitt commented Nov 6, 2019

/test pull-kubernetes-dependencies

managing the queue around the go1.13 bump, see https://groups.google.com/forum/#!topic/kubernetes-dev/Yyka3G2ebXE

@liggitt
Member

liggitt commented Nov 7, 2019

/retest

@k8s-ci-robot k8s-ci-robot merged commit 569a45f into kubernetes:master Nov 7, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 7, 2019
5 participants