LOG-4928: Cluster logging next APIs by jcantrill · Pull Request #1537 · openshift/enhancements

jcantrill · 2024-01-10T19:30:00Z

This PR proposes the next version of logging APIs

cc @alanconway @xperimental @periklis @cahartma

openshift-ci-robot · 2024-01-10T19:30:04Z

@jcantrill: This pull request references LOG-4928 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.16.0" version, but no target version was set.

Details

In response to this:

This PR proposes the next version of logging APIs

cc @alanconway @xperimental @periklis @cahartma

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

enhancements/cluster-logging/cluster-logging-v2-apis.md

alanconway

API is right, only 1 minor tweak and clarifying text requested.
Main issues to resolve are how best to manage the codebase for 2 APIs, what upgrade paths we want to support and what previews we do.

IMO the best outcome would be:

5.8 supports v1 and v2-beta (tech preview)
6.0 is v2 GA clean drops of v1 and deprecated dependencies.

However aligning with OCP extended support may make this more complicated. To be discussed.

enhancements/cluster-logging/cluster-logging-v2-apis.md

alanconway

Added new comments after todays meeting, if you agree to those then LGTM

enhancements/cluster-logging/cluster-logging-v2-apis.md

periklis

A couple of general notes mostly verifying my understanding of the goals and future proof integration with LokiStack:

One of the goal says: "Support an API to spec a Red Hat managed LokiStack with the logging tenancy model" Is this something still work in progress or missing? If by this goal we express the fact continuing to use the reserved keyword default, I suggest to spare a small section mentioning it's existence and an example CLF dedicated for this purpose. If on the other hand it means we want to expand to user-defined tenancies or augmented tenancy models (e.g. collected logs from OpenStack on OpenShift), we should dedicate some explanation about it.
The section "Implementation Details/Notes/Constraints" mentions that "V2 of the ClusterLogForwarder is a cluster-wide resource". Is this a typo or do we intend to shift back from namespaced ClusterLogForwarder? If latter is true, how does the transition path look like for the use cases covered by namespaced ClusterLogForwarders today?
IIRC we are requested to support in addition to the daemonset setup a deployment based setup for collector. How is this reflected in the serviceAccount and collector specs in the proposed CRD?

enhancements/cluster-logging/cluster-logging-v2-apis.md

jcantrill · 2024-01-18T20:28:35Z

A couple of general notes mostly verifying my understanding of the goals and future proof integration with LokiStack:

One of the goal says: "Support an API to spec a Red Hat managed LokiStack with the logging tenancy model" Is this something still work in progress or missing? If by this goal we express the fact continuing to use the reserved keyword default, I suggest to spare a small section mentioning it's existence and an example CLF dedicated for this purpose. If on the other hand it means we want to expand to user-defined tenancies or augmented tenancy models (e.g. collected logs from OpenStack on OpenShift), we should dedicate some explanation about it.

Maybe still WIP. The intent is to drop the use of 'default' but we still require a mechanism to allow admin to say "write logs to lokiStack that uses the logging tenant model". My expectation is that you can write to a loki instance or a RH lokiStack where the former you define the tenancy and the latter only allows logging tenancy. I welcome suggestions on making this a reality given it appears not fully expressed here.

The section "Implementation Details/Notes/Constraints" mentions that "V2 of the ClusterLogForwarder is a cluster-wide resource". Is this a typo or do we intend to shift back from namespaced ClusterLogForwarder? If latter is true, how does the transition path look like for the use cases covered by namespaced ClusterLogForwarders today?

This is a summation of our previous discussions to make it a true clusterwide resource as the name implies. We are not providing any auto migration but I believe v1 maps fairly easy to v2. I think we can still apply the same permissions requirements that we have now so I don't see it as different. Also planning to deploy to the NS associated with the SA but maybe it would be clearer to move NS out of the SA block and up a level to make it more explicitly where we intend to land the deployment. Lastly this allows us to embrace "namespace logforwarder" if we ever can figure out what that looks like.

IIRC we are requested to support in addition to the daemonset setup a deployment based setup for collector. How is this reflected in the serviceAccount and collector specs in the proposed CRD?

I believe I answer this in the response to the previous question.

dhellmann · 2024-02-13T16:29:25Z

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

alanconway · 2024-02-19T20:40:18Z

@jcantrill re loki vsl. LokiStack outputs - I put this on the backlog centuries ago, and eventually closed it as too far in the future. I have re-opened and updated it to be a bit clearer on what is required: https://issues.redhat.com/browse/LOG-2811

openshift-bot · 2024-03-19T01:16:00Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2024-03-26T08:45:34Z

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

jcantrill · 2024-03-26T15:49:50Z

/remove-lifecycle rotten

jcantrill · 2024-03-26T19:56:22Z

The section "Implementation Details/Notes/Constraints" mentions that "V2 of the ClusterLogForwarder is a cluster-wide resource". Is this a typo or do we intend to shift back from namespaced ClusterLogForwarder? If latter is true, how does the transition path look like for the use cases covered by namespaced ClusterLogForwarders today?

This is a summation of our previous discussions to make it a true clusterwide resource as the name implies. We are not providing any auto migration but I believe v1 maps fairly easy to v2. I think we can still apply the same permissions requirements that we have now so I don't see it as different. Also planning to deploy to the NS associated with the SA but maybe it would be clearer to move NS out of the SA block and up a level to make it more explicitly where we intend to land the deployment. Lastly this allows us to embrace "namespace logforwarder" if we ever can figure out what that looks like.

Following up to this statement, it is not possible to change the resource deployment scope between versions. This can only be accomplished by renaming the CRD and group. Continuing as a namespaced resource given we need a namespace. Additionally, the only way to offer two versions of the same API is to provide a migration path. Proposal is being update accordingly

jewzaam · 2024-04-02T19:19:12Z

enhancements/cluster-logging/cluster-logging-v2-apis.md

+
+* "One click" deployment of a full logging stack as provided by **ClusterLogging** v1
+* Complete backwards compatibility to **ClusterLogForwarder** v1
+* Automated migration path from v1 to v2


Are v1 and v2 CRDs supported at the same time or is there a hard cutover required?

IMO, ideally a hard cut over but there are open questions about what it means for users not heeding the deprecation warnings to come to the next version of logging with Elasticsearch and fluentd. Mostly, the v2 api for vector deployments maps in a straight forward fashion from v1

@jewzaam further discussion. Updated the proposal to land on a path that:

initially migrates the resource

drops support for previous version in subsequent release stream

Updated again:

drops CL objects

allows v1 and v2 for vector CLF deployments

migrates v1 to v2

terminates new feature development for v1

jcantrill · 2024-04-18T19:41:13Z

API is right, only 1 minor tweak and clarifying text requested. Main issues to resolve are how best to manage the codebase for 2 APIs, what upgrade paths we want to support and what previews we do.

IMO the best outcome would be:

5.8 supports v1 and v2-beta (tech preview)

6.0 is v2 GA clean drops of v1 and deprecated dependencies.

However aligning with OCP extended support may make this more complicated. To be discussed.

Updated proposal to identify a migration path that supports new and old and introduces deprecation/drops over multiple z-streams. @alanconway please review latest

jcantrill · 2024-04-23T20:22:13Z

/hold cancel

openshift-ci · 2024-05-02T09:18:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: periklis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~enhancements/cluster-logging/OWNERS~~ [periklis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-05-02T09:19:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: periklis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~enhancements/cluster-logging/OWNERS~~ [periklis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jcantrill · 2024-05-07T21:28:05Z

/hold cancel

alanconway

LGTM - this is great. I marked a couple of very minor typos and clarifications, but otherwise this is ready IMO.

alanconway · 2024-05-08T19:35:24Z