-
Notifications
You must be signed in to change notification settings - Fork 462
Bug 1708602: pkg/daemon: workaround old util-linux logger in rhel7.6 #734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1708602: pkg/daemon: workaround old util-linux logger in rhel7.6 #734
Conversation
|
/hold Depends on #733 |
|
/test e2e-rhel-scaleup |
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eeek. Scraping all journal messages for that?
How about we detect on startup whether the host is rhel7, and add that as a flag in the daemon struct. That'd be useful for a lot of other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we scrape all journal only if --journald isn't supported so we need to do this anyway right? I didn't want to add a specific rhel7 flag as I was just testing for functionality which should be generally better. If rhel7.7 upgrades util linux to a logger that does have --journald, we'd be still using this awful ack if we just test RHEL versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If rhel7.7 upgrades util linux to a logger that does have --journald, we'd be still using this awful ack if we just test RHEL versions
Even if it did upgrade which...seems very unlikely (right?) we'd still have an upgrade hazard where for previous versions we'd write one way and then later the other way. Seems better to just always live with one format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eeek. Scraping all journal messages for that?
to reply specifically on that. With this patch, we're not always scraping all journal, if !loggerWithoutJournalOption() { takes care of telling us if we need to do that in case logger doesn't have --journald.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it did upgrade which...seems very unlikely (right?) we'd still have an upgrade hazard where for previous versions we'd write one way and then later the other way. Seems better to just always live with one format.
Yeah, agree it's very unlikely indeed, but since we don't need to play with a rhel7 flag today, isn't testing for the logger functionality in itself enough? (I'm super happy to change it anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OLD_LOGGER seems a bit too generic...while it's unlikely, that string could appear in log messages from any othe process.
(Ideally...we'd have the MCD logging under machine-config-daemon.service that we create on the host. That's another topic)
How about calling it OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK ? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about calling it
OPENSHIFT_MACHINE_CONFIG_DAEMON_LEGACY_LOG_HACK?
😂 I'll go for it, that's a good one indeed to avoid collisions
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then in this code we wouldn't try and fall back each time, we'd just do:
if (!dn.hostRhel7) {
logger --journald
} else {
logger with literal JSON as message
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #734 (comment)
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use logger --help...as running this command just hangs for me; maybe it works because we're not providing stdin?
Or better, do:
if dn.OperatingSystem == machineConfigDaemonOSRHCOS {
return false
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm, does it hang on RHCOS? works fine on rhel7.6 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah I get it now thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note double negative here, probably better as loggerSupportsJournal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
69f2122 to
c37d648
Compare
|
/test e2e-rhel-scaleup |
|
/retest |
|
/test e2e-rhel-scaleup |
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love this. 😆
ashcrow
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the logger shell out code should probably end up being encapsulated in it's own struct/func, but for now this seems sane to fix the bug.
|
@vrutkovs just for awareness, not sure who the owner is, but scaleup is broken somehow EDIT: the last failure there seem to be: |
|
/test e2e-rhel-scaleup |
|
/retest |
|
/refresh |
|
/test e2e-rhel-scaleup |
|
aws limits.... will retest later... |
|
/retest |
|
/hold cancel |
pkg/daemon/update.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it'd be cleaner to call loggerSupportsJournal once or so on MCD startup rather than trying and failing each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I can definitely do that if it's for loggerSupportsJournal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed
|
/approve |
Signed-off-by: Antonio Murdaca <runcom@linux.com>
Signed-off-by: Antonio Murdaca <runcom@linux.com>
Signed-off-by: Antonio Murdaca <runcom@linux.com>
|
/test e2e-rhel-scaleup |
|
ok, this is now stable in my testing against a real scaleup cluster with 3 rhcos masters + 2 centos 7 workers, the fallback to still write on disk made the trick and I'd leave it as a redundancy fallback. We should run our e2e-aws-op in scaleup job though or we risk these kinds of regressions everytime. What bothers me the most though is:
Also, see openshift/release#3748 (comment) Follow up from my points above: CentOS7 in scaleup job uses cri-o 1.12.x (wrong and old since kubelet is at 1.13) |
|
/retest The cluster and machineconfig operator go up in the scale up test but the job itself fails. This is not related to this change as it wasn't passing even before this (reason why it was never enabled by default) |
|
/test e2e-rhel-scaleup |
1 similar comment
|
/test e2e-rhel-scaleup |
|
scaleup passed this time.. |
|
Let's strike while e2e is succeeding. @runcom is this ready for final review/merge? |
it is, if @cgwalters @kikisdeliveryservice @LorbusChris can ack on this it would be ready to merge |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, LorbusChris, runcom The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-rhel-scaleup |
|
😢 /test e2e-rhel-scaleup |
|
/retest |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@runcom: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
- What I did
RHEL7.6 has an old util-linux's logger which doesn't support the
--journaldflag to log to journal in a structured way. Work that around by logging a json directly and fall back to that if we're on rhel7.6 then.- How to verify it
- Description for the changelog