Skip to content

Comments

Add Performance Profile Controler must-gather#341

Closed
marioferh wants to merge 4 commits intoopenshift:masterfrom
marioferh:merge_ppc_mustgather_two_stages
Closed

Add Performance Profile Controler must-gather#341
marioferh wants to merge 4 commits intoopenshift:masterfrom
marioferh:merge_ppc_mustgather_two_stages

Conversation

@marioferh
Copy link

@marioferh marioferh commented Dec 9, 2022

Description

Launch Performance Profile Controller data collecting in OCP must-gather

Goal

Remove a standalone PAO must-gather image and make it part of the core OCP must-gather.

Before, the command for the client to run must-gather and PPC must-gather.

oc adm must-gather
--image-stream=openshift/must-gather \
--image=[...]/performance-addon-operator-must-gather-rhel8:$VERSION

Now:
oc adm must-gather --image=openshift/must-gather:$VERSION /usr/bin/gather_performance_profile_controller

Type of change

  • Add PPC scripts
  • Add two stage build to dockerfile in order to get sysinfo binary needed to collect data by scripts
  • Add PPC configuration yaml
  • Tests not included in this repo. Go to NTO-PPC repo.

Needed changes

Change temp ARG REPO=github.com/marioferh/gather-sysinfo to actual repo before merge.

PAO = Performance Addon Operator now is PPC = Performance Profile Controller after this change: openshift/cluster-node-tuning-operator#322

AOS Automation Release Team and others added 4 commits October 2, 2022 12:44
…penshift-4.12-ose-must-gather

Updating ose-must-gather images to be consistent with ART
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 9, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marioferh
Once this PR has been reviewed and has the lgtm label, please assign sferich888 for approval by writing /assign @sferich888 in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh marioferh force-pushed the merge_ppc_mustgather_two_stages branch from f99984e to cdb4c64 Compare December 9, 2022 11:55
Copy link
Contributor

@sferich888 sferich888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great; however there is quite a bit I would like to change or have you consider.

metadata:
name: perf-node-gather-pods-reader
subjects:
- kind: ServiceAccount
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this something we provide in its own file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the comment :S

namespace: perf-node-gather
apiGroup: ""
roleRef:
kind: ClusterRole
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this something we are providing int its own file?


sed -i -e "s#MUST_GATHER_IMAGE#$MUST_GATHER_IMAGE#" $DAEMONSET_MANIFEST

oc create -f $NAMESPACE_MANIFEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simply all of the oc create commands by using oc apply?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do it with oc create to apply policies after the SA creation:
oc adm policy add-scc-to-user privileged -n $NAMESPACE -z perf-node-gather


sed -i -e "s#MUST_GATHER_IMAGE#$MUST_GATHER_IMAGE#" $DAEMONSET_MANIFEST

oc create -f $NAMESPACE_MANIFEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we 'reuse' the exsisting must-gather namespace; instead of creating a new namespace?

By putting all the collection artifacts in the 'must-gather' namespace; when the oc command that invokes must-gather finishes; a clean up operation is done (removing all of the pods, and other artifacts that we create).

By spawning off new namespaces; it's possible that we could leave around lingering artifacts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am facing this issue if I try to reuse same namespace:
Warning FailedCreate 33s (x15 over 115s) daemonset-controller Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-f2s49" does not allow the workload type management

done
wait "${ADM_PIDS[@]}"

oc delete -f $DAEMONSET_MANIFEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do this; if we rely on the invoking oc command to clean things up for us.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for everyone?

@marioferh marioferh force-pushed the merge_ppc_mustgather_two_stages branch from cdb4c64 to e69b638 Compare December 14, 2022 13:12
Copy link
Author

@marioferh marioferh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new change create sysinfo binary inside the repo to not depend of external repo as before

@marioferh marioferh force-pushed the merge_ppc_mustgather_two_stages branch 4 times, most recently from 7f46e5b to 9533b74 Compare December 15, 2022 15:10
name: release
namespace: openshift
tag: rhel-8-release-golang-1.18-openshift-4.12
tag: rhel-8-release-golang-1.19-openshift-4.12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rhel-8-release-golang-1.19-openshift-4.13 would be the correct one here and everywhere else too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Dockerfile.rhel7 Outdated
FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.18-openshift-4.12 AS builder
FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.12 AS builder

RUN dnf install -y git make go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you're doing this, all of those tools are already part of the builder image.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Dockerfile.rhel7 Outdated
COPY ${NODE_GATHER_MANIFESTS_DIR} /etc/performance-profile-node-gather

# the binary follows the must-gather convention of helper tools called gather_$SOMETHING
COPY build/_output/bin/gather-sysinfo /usr/bin/gather_sysinfo No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the lines above, what are they doing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After generate a binary with make we need to copy it to the image

Makefile Outdated
dist-gather-sysinfo: build-output-dir
@if [ ! -x $(TOOLS_BIN_DIR)/gather-sysinfo ]; then\
echo "Building gather-sysinfo helper";\
env CGO_ENABLED=0 GOOS=$(TARGET_GOOS) GOARCH=$(TARGET_GOARCH) go build -ldflags="-s -w" -mod=vendor -o $(TOOLS_BIN_DIR)/gather-sysinfo ./pkg/tools/gather-sysinfo/;\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you're building it? From the email thread it looked like you already have a package ready that you just install?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we have ever said that. We are currently building it as part of the separate must gather image we ship, but there never was an RPM as far as I know.

read desired ready <<< $line
IFS=$'\n'

if [[ "$desired" != "0" ]] && [[ "$ready" == "$desired" ]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @sferich888 here, it's more reasonable to have some wait period before failing.

oc adm policy add-scc-to-user privileged -n $NAMESPACE -z perf-node-gather
oc create -f $CLUSTER_ROLE_MANIFEST
oc create -f $CLUSTER_ROLE_BINDING_MANIFEST
oc create -f $DAEMONSET_MANIFEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't create any of these, b/c must-gather CLI won't be able to clean it up. You should rather re-use the namespace and the privileged SA created for must-gather pod.

go 1.17

require github.com/openshift/build-machinery-go v0.0.0-20210423112049-9415d7ebd33e
require (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like mentioned before, I don't think we want to handle it here, it should be built outside of this repo and only installed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require an RPM in the package repos. The tool is specific to the must gather data collection though. Also we want the tool to be always in sync with the MG and OCP versions. Building it here is actually much much easier and more reliable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I was told it is already built as RPM so that shouldn't be the problem.

@marioferh marioferh force-pushed the merge_ppc_mustgather_two_stages branch from 9533b74 to 6cbcd89 Compare December 16, 2022 11:04
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 16, 2022

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/images 6cbcd89 link true /test images
ci/prow/e2e-aws 6cbcd89 link true /test e2e-aws

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@marioferh
Copy link
Author

openshift/oc#1316 oc PR to add required annotations to must-gather annotatoins

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 23, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2023
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sferich888
Copy link
Contributor

/close

@openshift-ci openshift-ci bot closed this Apr 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 19, 2023

@sferich888: Closed this PR.

Details

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants