feat: add proposal watching multiple namespace for KafkaTopic/KafkaUser/KafkaConnector resources #137

Open: wants to merge 9 commits into base branch `main`
lenglet-k

Hello,

Here's a proposal addressing the problem of creating resources in multiple namespaces. It concerns connectors (KafkaConnector), topics (KafkaTopic), and users (KafkaUser).

The aim is to find a solution that enables Strimzi to manage these resources when they are created in different namespaces, and thus extend the functionality of the Strimzi Kafka operator.

Linked Issue:

@lenglet-k force-pushed the feat/watching-multiple-namespaces branch from fd75f18 to 9e1bb3f on November 22, 2024 15:45

@scholzj (Member) left a comment:

Thanks for the proposal. Some general comments:

  • I think this should be split into several proposals, one for each resource. Among other things, there is very different demand for it across the resources (and I can imagine, for example, it not being approved for the connector resource).
  • Each custom resource is very expensive to maintain, so I think you should avoid adding new resources. If you insist on it, you should provide technical details on how they would be handled in the code. You also have to explain how to deal with conflicts between the old and new resources.
  • The design of the Connector operator makes the implementation IMHO especially tricky, so you should provide the details of how it would work.
  • At least for Connectors and Users, there is a huge security hole (which is also why it is important that you don't talk about multitenancy, because it is not multitenancy; it is just a naming mechanism).
    • Connect does not provide any real sandboxing, protection of credentials, etc. So by deploying connectors, you can easily steal credentials and data. The real "multitenant" way to deploy Connectors is to have your own Connect and Connectors in your own namespace.
    • Users control the ACLs. So any user can grant itself access to any data in the Kafka cluster, or can break any data in the Kafka cluster by producing poisoned or falsified messages.
    • The problem is not just that this is a security hole that you might have to accept, but also that it is non-obvious and might mislead users. I think this is a major blocker.
  • Most users do not want to grant Strimzi the rights to watch the whole cluster and to access all Secrets, etc. So you should probably spend more effort explaining how the configuration would work, what would need to be provided to the operator, what might not need to be provided (is optional), etc. In short, explain the configurations, deployment modes, RBAC rights, etc. in more detail.
  • The rejected alternative describes how to deal with naming conflicts. I do not understand why you rejected it. While it might add complexity, it is also something without which the feature does not work reliably and cannot be supported.


## Motivation

In a multi-tenant Kubernetes cluster, it is common practice to define authorization at the namespace level or to deploy many applications in different namespaces. When a `Kafka` cluster is shared by many applications and/or many customers (which is generally the case), it is more flexible and secure to deploy each application in a dedicated namespace.
Member

Kubernetes is not multi-tenant, at least when it comes to operators. Kafka is also not multi-tenant. So you should not suggest here that it is. You have to make it clear from the beginning that this is not a multi-tenancy feature, but only a naming construct.


Create a new `CustomResourceDefinition`, called `KafkaNamespaceTopic`, to handle topics that are created in namespaces. This CRD will allow the user to create a topic that the Topic Operator automatically namespaces within Kafka. The actual topic name will be encoded from the Kubernetes namespace in which the resource resides and either the `metadata.name` or the `spec.topicName` (if defined) of the `KafkaNamespaceTopic` instance.

`{Kubernetes namespace name}.{metadata.name|spec.topicName}`
Member

One more thought... While prefixing or suffixing makes sense for users, it is much more controversial for topics. Once you have created the topic in namespace A and it holds some data, you cannot move it to another namespace (if you just create the same topic in another namespace, it gets a new name and is empty). Moreover, the Topic Operator already uses the `.spec.topicName` field and is able to handle conflicts. So the question is whether prefixing or suffixing here really makes sense and is desired.


> NOTE: Kubernetes namespace names cannot contain the character `.` (they must be RFC 1123 DNS labels), so the namespace part of the encoded name is always unambiguous.
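The proposed encoding, and why the absence of `.` in namespace names matters for decoding it, can be illustrated with a minimal Python sketch. The helper names `kafka_topic_name` and `split_kafka_topic_name` are hypothetical and not part of Strimzi; the sketch assumes only the `{Kubernetes namespace name}.{metadata.name|spec.topicName}` format stated in the proposal, with `spec.topicName` taking precedence when set.

```python
from typing import Optional, Tuple


def kafka_topic_name(namespace: str, metadata_name: str,
                     spec_topic_name: Optional[str] = None) -> str:
    """Encode '{namespace}.{spec.topicName or metadata.name}' (hypothetical helper)."""
    if not namespace or "." in namespace:
        # Kubernetes namespace names are RFC 1123 labels and never contain '.'.
        raise ValueError(f"invalid namespace name: {namespace!r}")
    # spec.topicName takes precedence over metadata.name when it is set.
    local_name = spec_topic_name or metadata_name
    return f"{namespace}.{local_name}"


def split_kafka_topic_name(topic_name: str) -> Tuple[str, str]:
    """Decode a namespaced topic name back into (namespace, local name).

    Splitting at the FIRST '.' is unambiguous only because the namespace part
    cannot contain '.'; the local part (e.g. a spec.topicName) may contain dots.
    """
    namespace, sep, local_name = topic_name.partition(".")
    if not sep or not namespace or not local_name:
        raise ValueError(f"not a namespaced topic name: {topic_name!r}")
    return namespace, local_name


# Example: a resource named "orders" in namespace "team-a".
print(kafka_topic_name("team-a", "orders"))               # team-a.orders
print(kafka_topic_name("team-a", "orders", "orders.v2"))  # team-a.orders.v2
print(split_kafka_topic_name("team-a.orders.v2"))         # ('team-a', 'orders.v2')
```

The sketch also makes the reviewer's concern visible: the namespace is part of the Kafka topic name, so recreating the same resource in another namespace yields a different, empty topic.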

The `KafkaNamespaceTopic` will include all the configuration that is available in a `KafkaTopic`, as well as the `strimzi.io/cluster` label, which assigns the topic to an existing `Kafka` cluster.
Member

Do we really need a new CRD copying the content we already have in KafkaTopic? Isn't it really just a different way of specifying the same thing, maybe using an annotation? Or maybe I haven't understood why we need a CRD, in which case an actual example would benefit the proposal.

Author

After thinking about it and reading your feedback, I don't think that adding new CRDs is the right solution. It won't add much more than a specific annotation, as I said in the rejected alternative.

@lenglet-k (Author)

Hello @scholzj, thanks for your answer:

  • I have no problem creating a separate proposal for each resource, so I could do that if necessary.

  • I fully understand the huge security hole with connectors, but perhaps in some cases it's possible to share (mutualise) the KafkaConnect cluster and accept the risk. Of course, this requires a strong warning about this security aspect.

  • However, I don't understand the point about KafkaUser, because if the KafkaUser is configured correctly, it is not authorised to update or modify the ACLs. In my case, the KafkaUser resources are only allowed to work with their own KafkaTopics; they can't do anything else. Maybe you're talking about a human user in your explanation. If that's the case, isn't the security problem already there? The (human) user needs to be conscientious and aware of these security elements when creating or modifying a KafkaUser.

  • Exactly, "multi-tenant" is not accurate. I meant multi-client, in the sense that several clients are deployed in several namespaces of a single Kubernetes cluster, and these clients also use the same Kafka cluster.

  • OK for the RBAC configuration, I'm thinking about it.

  • As for my rejected alternative, I thought it went against the design of the CRDs and the operators, and that creating new CRDs was more appropriate. I also thought it would add complexity to the code, but probably less than creating new CRDs, which in the end wouldn't add much to what already exists in Strimzi. But from what you've said, I get the impression that this alternative is more relevant, isn't it? Can I revise my proposal and explore this alternative further?

@scholzj (Member) commented Nov 29, 2024

> I fully understand the huge security hole with connectors, but perhaps in some cases it's possible to share (mutualise) the KafkaConnect cluster and accept the risk. Of course, this requires a strong warning about this security aspect.

The problem is that there is no place to put the warning. And if you want to manage your own connectors, there is an easy solution: run your own Connect.

> However, I don't understand the point about KafkaUser, because if the KafkaUser is configured correctly, it is not authorised to update or modify the ACLs. In my case, the KafkaUser resources are only allowed to work with their own KafkaTopics; they can't do anything else. Maybe you're talking about a human user in your explanation. If that's the case, isn't the security problem already there? The (human) user needs to be conscientious and aware of these security elements when creating or modifying a KafkaUser.

The KafkaUser contains the ACLs. So once you can use KafkaUser, you can grant yourself ACLs for anything.
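To make this concern concrete: a KafkaUser's ACLs live in its own `spec.authorization` (the `simple` type with `acls[].resource` and `acls[].operations`), so whoever can create the resource decides what it is granted. The Python sketch below treats the manifest as a plain dict. The `acls_outside_namespace` helper, the `team-a` namespace, and the `{namespace}.` prefix rule are hypothetical; they only illustrate what an additional check would have to catch under the prefix scheme proposed for `KafkaNamespaceTopic`, and are not part of Strimzi.

```python
from typing import Any, Dict, List


def acls_outside_namespace(kafka_user: Dict[str, Any], namespace: str) -> List[Dict[str, Any]]:
    """Return the topic ACL rules of a KafkaUser that are NOT confined to
    topics named '{namespace}.*' (hypothetical prefix rule)."""
    prefix = namespace + "."
    acls = kafka_user.get("spec", {}).get("authorization", {}).get("acls", [])
    escaping = []
    for acl in acls:
        resource = acl.get("resource", {})
        if resource.get("type") != "topic":
            continue  # group, cluster and transactionalId rules are ignored in this sketch
        # Literal names and prefix patterns alike must start with '{namespace}.';
        # the literal wildcard '*' matches every topic, so it is always reported.
        if not resource.get("name", "").startswith(prefix):
            escaping.append(acl)
    return escaping


# A KafkaUser declared in namespace "team-a" that, next to its own prefix,
# grants itself read access to every topic in the cluster.
example_user = {
    "apiVersion": "kafka.strimzi.io/v1beta2",
    "kind": "KafkaUser",
    "metadata": {
        "name": "app",
        "namespace": "team-a",
        "labels": {"strimzi.io/cluster": "my-cluster"},
    },
    "spec": {
        "authentication": {"type": "tls"},
        "authorization": {
            "type": "simple",
            "acls": [
                {"resource": {"type": "topic", "name": "team-a.", "patternType": "prefix"},
                 "operations": ["Read", "Write", "Describe"]},
                {"resource": {"type": "topic", "name": "*", "patternType": "literal"},
                 "operations": ["Read"]},
            ],
        },
    },
}

for acl in acls_outside_namespace(example_user, "team-a"):
    print(acl["resource"]["name"], acl["operations"])  # * ['Read']
```

Nothing in the resource itself prevents the second rule; only an external policy or review process would.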

@lenglet-k (Author) commented Nov 29, 2024

> The problem is that there is no place to put the warning. And if you want to manage your own connectors, there is an easy solution: run your own Connect.

Today there's no such warning, but nothing stops us from using a single KafkaConnect cluster with connectors that use different KafkaUsers. This is a possible use case. The real solution is to deploy one KafkaConnect per client.

> The KafkaUser contains the ACLs. So once you can use KafkaUser, you can grant yourself ACLs for anything.

This is already the case today, and the only way to remedy it would be to have one Kafka cluster per KafkaUser, which is not a solution.
I don't think I've made the need clear enough. In my case, the KafkaUser is not deployed by our customers: the administrator (me) deploys it through a Helm chart, and this KafkaUser is explicitly configured to access only the resources made available to it. We'd do exactly the same thing in a Kafka cluster not managed by Strimzi. The aim of this proposal is simply to find the best way of creating and configuring KafkaTopic/KafkaUser/KafkaConnector resources in a namespace different from that of the Kafka cluster.

@scholzj (Member) commented Nov 29, 2024

> Today there's no such warning, but nothing stops us from using a single KafkaConnect cluster with connectors that use different KafkaUsers. This is a possible use case. The real solution is to deploy one KafkaConnect per client.

It is pretty complicated to have a Kafka user per connector. And your proposal does not make it easier in any way (in fact, having the Connector and Connect resources in different namespaces will only make it harder). Anyway, the main use case for having a Kafka user per connector would be monitoring or quotas, not security. But the main concern here would be, IMHO, leaking credentials to the other systems (e.g. databases, storage, etc.).

> This is already the case today, and the only way to remedy it would be to have one Kafka cluster per KafkaUser, which is not a solution.
> I don't think I've made the need clear enough. In my case, the KafkaUser is not deployed by our customers: the administrator (me) deploys it through a Helm chart, and this KafkaUser is explicitly configured to access only the resources made available to it. We'd do exactly the same thing in a Kafka cluster not managed by Strimzi. The aim of this proposal is simply to find the best way of creating and configuring KafkaTopic/KafkaUser/KafkaConnector resources in a namespace different from that of the Kafka cluster.

I do not really follow how you do things from your description. But please keep in mind that the proposal needs to ensure usability and added value for everyone, not just you.

I think the general use case for a multi-namespace TO and UO (and possibly the Connector operator) is a shared cluster with shared responsibilities. The teams using the different namespaces (e.g. to deploy different services using the same cluster) would get freedom in deploying their applications. But the way the KafkaUser resources are designed means that each of these teams could grant themselves access to anything in the Kafka cluster. That might be acceptable for some users but might also be a huge risk and a blocker for others.

If you have a centrally managed Kafka cluster, you typically don't mind having the users and topics in a single namespace. You can very easily build, for example, a GitOps setup where your users open PRs to create the topics/users, and your central team reviews and merges them. The review process gives you control over what is being done and allows you to enforce whatever policies you want to follow. GitOps then just applies the resources into the right namespace to create the topics and users. If you have a central team managing the Kafka cluster, I'm not sure I understand why multi-namespace support really matters.

@lenglet-k (Author)

> It is pretty complicated to have a Kafka user per connector. And your proposal does not make it easier in any way (in fact, having the Connector and Connect resources in different namespaces will only make it harder). Anyway, the main use case for having a Kafka user per connector would be monitoring or quotas, not security. But the main concern here would be, IMHO, leaking credentials to the other systems (e.g. databases, storage, etc.).

Today we have a single KafkaConnect cluster, and for each of our customers we have dedicated connectors with dedicated KafkaUsers that have limited ACLs on their resources. The cluster is closed and secured by filtering rules. I agree, but this doesn't eliminate the security concerns we've already mentioned.

> I do not really follow how you do things from your description. But please keep in mind that the proposal needs to ensure usability and added value for everyone, not just you.

I'm well aware of this, which is why I'm discussing it with you :). And I'm not the only one who needs it, since it has already been raised in various issues.

> I think the general use case for a multi-namespace TO and UO (and possibly the Connector operator) is a shared cluster with shared responsibilities. The teams using the different namespaces (e.g. to deploy different services using the same cluster) would get freedom in deploying their applications. But the way the KafkaUser resources are designed means that each of these teams could grant themselves access to anything in the Kafka cluster. That might be acceptable for some users but might also be a huge risk and a blocker for others.

These are not teams, but client applications, which a single team at our company deploys through ArgoCD, with one Helm chart per client. Today, since all resources (KafkaTopics / KafkaUsers / KafkaConnectors) must be deployed in a single namespace, we deploy all 200 of our customers in the same namespace. I'm sure you'll agree that from a management and security point of view this isn't the best solution either, but in the end that's our problem. So in our case, and for the security of the client applications we deploy, we obviously pay close attention to the ACLs of the KafkaUsers, which is why we've developed a Helm chart to define the ACLs for each of our client applications.

> If you have a centrally managed Kafka cluster, you typically don't mind having the users and topics in a single namespace. You can very easily build, for example, a GitOps setup where your users open PRs to create the topics/users, and your central team reviews and merges them. The review process gives you control over what is being done and allows you to enforce whatever policies you want to follow. GitOps then just applies the resources into the right namespace to create the topics and users. If you have a central team managing the Kafka cluster, I'm not sure I understand why multi-namespace support really matters.

Multi-namespace support is important because it allows us to secure deployed client applications more easily, via network policies per client namespace instead of network policies per client application in a single namespace. From a management point of view, it's also easier to apply quotas and limits to Kubernetes namespaces: 200 namespaces rather than 200 applications in a single namespace. Even if we adapted our GitOps flow, we'd have to create our KafkaUser in the namespace where the Kafka cluster is deployed, wait for the Secrets to be created, retrieve their values, copy them into the right client namespaces, and configure the associated applications. After that, we'd need to monitor the Secrets for modification or deletion and repeat the previous operation. In my opinion, it's the role of an operator to enable this, in our case by allowing the KafkaUser resource to be configured against the correct Kafka cluster without needing to copy Secrets, etc. It shouldn't matter in which namespace the KafkaUser resource was created; the same goes for KafkaTopic.
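The manual workaround described here (copy the Secret that the User Operator creates into each client namespace, then keep it in sync) can be sketched in Python. This is a minimal illustration that treats manifests as plain dicts; `secret_copy_for_namespace`, the namespaces, and all values are hypothetical placeholders, and actually creating the object in the cluster (for example with a Kubernetes client library) is left out.

```python
import copy
from typing import Any, Dict

# Metadata populated by the API server (or tied to the source namespace) that
# must not be carried over when creating the Secret in another namespace.
# ownerReferences in particular cannot point to an owner in a different namespace.
SERVER_MANAGED_METADATA = (
    "uid", "resourceVersion", "creationTimestamp", "generation",
    "managedFields", "selfLink", "ownerReferences",
)


def secret_copy_for_namespace(secret: Dict[str, Any], target_namespace: str) -> Dict[str, Any]:
    """Return a copy of a Secret manifest that can be created in target_namespace.

    The source manifest is left untouched. The copy is a one-off snapshot: it is
    neither updated when the original changes (e.g. credential rotation) nor
    deleted with it, which is why continuous synchronisation is still needed.
    """
    clone = copy.deepcopy(secret)
    metadata = clone.setdefault("metadata", {})
    for field in SERVER_MANAGED_METADATA:
        metadata.pop(field, None)
    metadata["namespace"] = target_namespace
    return clone


# Hypothetical Secret as it might look after the User Operator created it in
# the operator's namespace (values are placeholders, base64-encoded).
source = {
    "apiVersion": "v1",
    "kind": "Secret",
    "type": "Opaque",
    "metadata": {
        "name": "client-a-user",
        "namespace": "kafka",
        "uid": "0000-placeholder",
        "resourceVersion": "12345",
        "ownerReferences": [{"apiVersion": "kafka.strimzi.io/v1beta2",
                             "kind": "KafkaUser", "name": "client-a-user"}],
    },
    "data": {"password": "cGxhY2Vob2xkZXI="},
}

copied = secret_copy_for_namespace(source, "client-a")
print(copied["metadata"])  # {'name': 'client-a-user', 'namespace': 'client-a'}
```

The one-off copy is the easy part; reacting to rotation and deletion of the source Secret is what dedicated synchronisation tooling has to handle.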

@scholzj (Member) commented Nov 29, 2024

> These are not teams, but client applications, which a single team at our company deploys through ArgoCD, with one Helm chart per client. Today, since all resources (KafkaTopics / KafkaUsers / KafkaConnectors) must be deployed in a single namespace, we deploy all 200 of our customers in the same namespace. I'm sure you'll agree that from a management and security point of view this isn't the best solution either, but in the end that's our problem. So in our case, and for the security of the client applications we deploy, we obviously pay close attention to the ACLs of the KafkaUsers, which is why we've developed a Helm chart to define the ACLs for each of our client applications.

I think you use terms such as user, client, customer, etc. in your own way without really explaining them, so it is a bit hard to follow what they mean. If you have one team and already use GitOps, then I think the solution I suggested fits the general usage patterns much better. What you describe here is not what most users mean when they talk about a multi-namespace UO and TO. I agree that in your case, with a single team managing many applications in a single namespace, security is not the main concern. But for most other users it most certainly would be.

> Multi-namespace support is important because it allows us to secure deployed client applications more easily, via network policies per client namespace instead of network policies per client application in a single namespace. From a management point of view, it's also easier to apply quotas and limits to Kubernetes namespaces: 200 namespaces rather than 200 applications in a single namespace. Even if we adapted our GitOps flow, we'd have to create our KafkaUser in the namespace where the Kafka cluster is deployed, wait for the Secrets to be created, retrieve their values, copy them into the right client namespaces, and configure the associated applications. After that, we'd need to monitor the Secrets for modification or deletion and repeat the previous operation. In my opinion, it's the role of an operator to enable this, in our case by allowing the KafkaUser resource to be configured against the correct Kafka cluster without needing to copy Secrets, etc. It shouldn't matter in which namespace the KafkaUser resource was created; the same goes for KafkaTopic.

You can quite easily use various tools to synchronize the Secrets across namespaces. You can also try the Strimzi Access Operator, which was created for this purpose. So I think there are many different ways to have your applications deployed across namespaces with a lot less effort. I'm also not convinced it really solves the general issue, because I think most people want more or less faked multi-tenancy for independent teams.

The Connectors also don't really match this explanation, as the connectors ultimately run in the single namespace of the Connect cluster. So quotas and network policies don't really matter for them. They will always be 200 applications, not just in a single namespace but even in a single deployment. So even if I accept the premise of one team managing everything, the connectors do not seem to fit in.

@lenglet-k (Author)

Sorry if I'm not being clear. But my need is exactly the same as the one described here:

#6301: declare KafkaTopic in other namespaces than the one where the topic-operator is deployed
#5965: declare KafkaUser in other namespaces than the one where the user-operator is deployed

FYI: client and customer mean the same thing here: these are dedicated applications (one per client/customer) that use the Kafka cluster. They need KafkaTopics, KafkaUsers, and KafkaConnectors to function, which are themselves dedicated per client.

@scholzj (Member) commented Nov 29, 2024

Is your need really exactly the same? I think there are at least two use cases:

  1. The one you seem to have with a single team managing 200 namespaces with different applications, users, and topics.
  2. The other one is where 200 teams want to self-service their topics and users in one shared Kafka cluster across 200 namespaces.

In the first use case, the security concerns likely don't matter much. But for the second one where you want different teams to manage their own stuff, the security issues matter much more. And from my experience and discussions with our users, the second one is what most of them want.

@PaulRMellor (Contributor) left a comment

As can be seen from the issues raised against the current functionality, it's something customers are keen to see implemented, so it's great you're looking into a solution.
I reviewed down to the specification of the CRDs, as a new approach is being considered, and left a few minor suggestions around wording.

088-watching-multiple-namespaces.md: 8 outdated review threads, resolved
lenglet-k and others added 8 commits December 11, 2024 13:44
Each of the 8 commits carries the trailers Co-authored-by: PaulRMellor and Signed-off-by: lenglet-k.
@lenglet-k (Author)

OK, thanks for your feedback. What do you think should happen next? As I don't have all your knowledge on this subject, it's hard for me to know what direction to take.

@scholzj (Member) commented Dec 13, 2024

> As I don't have all your knowledge on this subject, it's hard for me to know what direction to take.

I'm afraid I do not know what the right direction is. I'm not even sure this is something Strimzi can address at its level. Sometimes, when issues stay open for years, the reason is that there is no easy way to solve them, not just that nobody had time to solve them.


4 participants