Logstash add ElasticsearchRefs#6662
Conversation
- fix e2e TestSamples - check ES status - check ES association secrets - check association volume - check association env
…h_esrefs # Conflicts: # pkg/controller/logstash/driver.go # pkg/controller/logstash/pod.go # test/e2e/test/logstash/builder.go
robbavey
left a comment
There was a problem hiding this comment.
This looks fantastic. I do have a couple of broad questions before I dive in fully
| }, | ||
| }, | ||
| }, | ||
| LogstashUserRole: esclient.Role{ |
There was a problem hiding this comment.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
I notice that agent and enterprise search use a superuser role here, presumably to get over the same limitation. I am curious to hear from other folks here as to the best approach.
There was a problem hiding this comment.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.
There was a problem hiding this comment.
I feel like the uncommented/explained use of superuser roles by Agent and Enterprise Search is actually somewhat of a bug. Even if there is a motivation for it in order to create and ingest data into custom indices, it feels like something that should be called out in a comment/doc at the very least and probably a custom role with appropriate privileges to create arbitrary indices would have still been better than superuser.
|
|
||
| // ElasticsearchRef is a reference to an Elasticsearch cluster running in the same Kubernetes cluster. | ||
| // +kubebuilder:validation:Optional | ||
| ElasticsearchRefs []commonv1.ObjectSelector `json:"elasticsearchRefs,omitempty"` |
There was a problem hiding this comment.
Thinking forward a little to when we will have named clusters in Logstash, we talked about introducing the notion of a clusterName into the ElasticsearchRef definition along the lines of
elasticsearchRefs:
- clusterName: k8sData
name: local
namespace: production
with the clusterName being used as the id in the cluster.yaml equivalent that we envisage for Logstash.
If we think about a cluster.yaml looking something like:
elasticsearch_clusters:
k8sData:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
I think we could do this by creating the LogstashSpec to look like:
type LogstashSpec struct {
:
// Named list of Elasticsearch Clusters
ElasticsearchRefs[] ElasticsearchCluster `json:"elasticsearchRefs,omitempty"`
:
}
// Definition of type to define an Elasticsearch Cluster to be referred to in
// Logstash pipeline
type ElasticsearchCluster struct {
commonv1.ObjectSelector `json:",omitempty,inline"`
ClusterName string `json:"clusterName,omitempty"`
}
We could potentially create the env variables with the clusterName? (And if we did would we need the namespace here?)
There is some prior art with the elastic agent using an outputName to distinguish between multiple elasticsearchRefs.
Alternatively, we could stick with what we have here, and write cluster.yaml as:
elasticsearch_clusters:
local_production:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
using $namespace_$name as the id.
WDYT?
There was a problem hiding this comment.
These are awesome suggestions. +1 to follow the prior art of agent to unify the experience. Also, +1 to bring cluster name in technical preview.
I think we can use outputName instead of clusterName to further share the experience. For the env variable, let's replace $namespace_$name with outputName. The name needs to be unique in elasticsearchRefs. I will add validation.
There was a problem hiding this comment.
My thought on using clusterName over outputName is that we also use want to use elasticsearchRef in input and filter plugins, not just for outputs
There was a problem hiding this comment.
@robbavey Revisiting the usage of clusterName. I have doubts about its value. One drawback is the stack monitoring should understand clusterName and send metrics to the corresponding cluster, which does not feasible now because the operator only considers namespace + name in the association.
From the design doc, things we want to achieve (now and future cluster.yml) with clusterName seem can be replaced by namespace + name.
elasticsearch {
cluster_id => namespace_name
}
elasticsearch_clusters:
namespace_name:
host: "https://metaldata-es-http:9200"
username: “logstash_system
password: REDACTED
ca_file: /mnt/elastic-internal/es-certs/ca.crt
namespace_name2:
cloud_id: XXXXXXX
api_key: yyyyyy
clusterName works like a nickname that is handy when user wants to give a second name.
I wanna keep the current env variable naming convention, namespace + name, at least in the technical preview.
There was a problem hiding this comment.
I tend to agree with @barkbay that the lack of clusterName in elasticsearchRef in stack monitoring is probably ok - clusterName is added as an aid for the user to refer to an elasticsearch cluster pointed to by elasticsearchRef outside of the kubernetes CRD in pipeline configurations, without needing to deal with kubernetes semantics, such as namespaces. The ability to define/move namespaces outside of the CRD is an an interesting twist on this.
There was a problem hiding this comment.
Could this be an optional feature? If users define it it will be used to name the environment variables? Also could it then just be named alias?
elasticsearchRefs:
- alias: o11y-cluster
name: monitor
namespace: production I think I also agree that stack monitoring is different enough to allow a special case here. Because users have to understand the relationship between the elasticsearchRef and what they define in configuration there is a motivation for such a construct. Why would one want to use an alias for stack monitoring?
There was a problem hiding this comment.
In Logstash context, clusterName is a mapping of name to endpoint (host, username, password, cacert). The endpoint can be an input (source) or an output (destination). It will be referenced in pipeline.
input {
kafka { }
}
output {
elasticsearch { cluster_name => "production" }
elasticsearch { cluster_name => "uat" }
}
In Logstash CRD, we allow users to define the name mapping
elasticsearchRefs:
- clusterName: production
name: es1
- clusterName: uat
name:es2
- clusterName: monitor
name:es3
However, Stack Monitoring does not understand the name production, uat and monitor. If Stack Monitoring wants to insert metrics to monitor, it needs the following
monitoring:
metrics:
elasticsearchRefs:
- name: es3
As a user, I am confused why I can't use monitor that I declared earlier. The name works in pipeline but not in Stack Monitoring.
IMO, ultimately we will need clusterName to achieve cluster.yml which contains the mapping of alias to endpoint. In technical preview, it seems not ready. I hope the example can explain the confusion.
There was a problem hiding this comment.
I guess my confusion here, is what would be your expectation of using clusterName in the monitoring section?
Are you expecting to be able to use clusterName instead of name + namespace? Or as well as? Is your expectation that you could use
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
or that you would expect
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
name:es3
or that clusterName is arbitrary without the specific cluster or clusterName setting in cluster.yml when that comes to be?
I agree that it feels a little half baked without the cluster.yml, and we only refer to clusterName in the env variable, but I do feel that it will have utility when we start allowing named clusters, and maybe introducing the notion earlier is of some value.
Thoughts @flexitrev, @jsvd, @roaksoax?
There was a problem hiding this comment.
my expectation is to be able to use clusterName to output metrics to monitor
monitoring:
metrics:
elasticsearchRefs:
- clusterName: monitor
and won't stop user to use namespace + name to output
monitoring:
metrics:
elasticsearchRefs:
- name: es3
The declaration in spec.elasticsearchRefs will all convert to cluster.yml. Only the clusterName in cluster.yml can be referenced in pipeline.
elasticsearchRefs:
- clusterName: production
name: es1
- clusterName: uat
name:es2
- clusterName: monitor
name:es3
cluster.yml
- production:
host:...
username:...
password:...
cacert:...
- uat:
host:...
username:...
password:...
cacert:...
- monitor:
host:...
username:...
password:...
cacert:...
…h_esrefs # Conflicts: # pkg/controller/logstash/pod.go
|
@barkbay I think this is ready for review There is a discussion about bringing |
| { | ||
| Names: []string{"logstash-*", "ecs-logstash-*", "logs-*", "metrics-*", "synthetics-*", "traces-*"}, | ||
| Privileges: []string{"manage", "read", "create_doc", "view_index_metadata", "create_index"}, | ||
| }, |
There was a problem hiding this comment.
I think we need more permissions here, in terms of both index names, and privileges.
I tried a few different simple scenarios, and there were a few issues -
Removing data_streams => false will write to logstash, and ecs_compatibility => disabled will write to ecs-logstash by default, so we will need to add those to the list of Names
Additionally, I think we need to add additional privileges - we get a
[indices:data/write/index:op_type/index] is unauthorized for user [default-logstash-sample-default-elasticsearch-logstash-user] with effective roles [eck_logstash_user_role] on indices [ecs-logstash], this action is granted by the index privileges [create,index,write,all]"
error when attempting to write to this index. I suspect we will need the write privilege here - the elasticsearch output is able to do create, index, update and delete actions
If we are going to limit the indices here, we will need to add instructions on how users can get around this
There was a problem hiding this comment.
Thanks for double check it. I have updated the setting.
ecs-logstash and logstash are required by ilm_rollover_alias
The logstash doc regarding secure connection needs update
There was a problem hiding this comment.
Thank you! This now works as expected.
robbavey
left a comment
There was a problem hiding this comment.
This looks good to me from a logstash perspective.
My only concern is around the experience when referring to an index that is not one of the "default" ones, and how we choose to document this.
pkg/controller/logstash/env_test.go
Outdated
| AuthSecretName: "logstash-sample-default-elasticsearch-sample-logstash-user", | ||
| AuthSecretKey: "default-logstash-sample-default-elasticsearch-sample-logstash-user", | ||
| CACertProvided: false, | ||
| CASecretName: "logstash-sample-logstash-es-default-elasticsearch-sample-ca", |
There was a problem hiding this comment.
Would we expect this to be created if we are not using tls?
Sure, I'll do a first review today. |
barkbay
left a comment
There was a problem hiding this comment.
I left a few comments, I still have to do some tests.
pkg/controller/logstash/env.go
Outdated
|
|
||
| func getEsRefNamespacedName(assoc commonv1.Association) string { | ||
| ref := assoc.AssociationRef() | ||
| return fmt.Sprintf("%s_%s", ref.Namespace, ref.Name) |
There was a problem hiding this comment.
ref.Name may be empty if user provided connection settings through a Secret:
apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
name: logstash-sample
spec:
count: 1
version: 8.7.0
elasticsearchRefs:
- secretName: external-cloud-es-ref
pipelines: {}
---
apiVersion: v1
kind: Secret
metadata:
name: external-cloud-es-ref
stringData:
url: https://my-deployment.es.europe-west1.gcp.cloud.es.io
username: elastic
password: REDACTEDThis feature, introduced in ECK 2.2.0 is not documented (yet: #6449) but can be used by a user to connect a stack component to an [external/non ECK managed] Elasticsearch cluster.
I'm not sure what is the best way to handle this case if we want to inject connection settings using env. variables.
There was a problem hiding this comment.
Great that you raise the issue. We can address the external ES connection in a follow-up. I am thinking of using ref.NameOrSecretName() for env. In the future, the operator will need to write managed and unmanaged cluster info to a cluster.yml. Logstash will need to map the cluster.yml to plugin config.
In the technical preview, we aim for managed clusters.
There was a problem hiding this comment.
add an issue for followup https://github.com/elastic/ingest-dev/issues/1764
There was a problem hiding this comment.
Should we make it a public issue so our users can find it?
| }, | ||
| }, | ||
| }, | ||
| LogstashUserRole: esclient.Role{ |
There was a problem hiding this comment.
The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.
That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
…to logstash_esrefs # Conflicts: # pkg/controller/logstash/pod.go
- change env variable prefix - add validation
robbavey
left a comment
There was a problem hiding this comment.
This is great - really close, I have a couple of nits on clarifying the messaging on missing clusterName
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
|
@kaisecheng One thing I noticed, that I should have noticed earlier 🤦 - we use |
|
@robbavey I have changed the suffix to |
|
👀 |
| // ElasticsearchCluster definition of type to define an Elasticsearch Cluster to be referred to in Logstash pipeline | ||
| type ElasticsearchCluster struct { | ||
| commonv1.ObjectSelector `json:",omitempty,inline"` | ||
| ClusterName string `json:"clusterName,omitempty"` |
There was a problem hiding this comment.
Do we want to prevent user from using a same clusterName for different clusters with a new validation?
There was a problem hiding this comment.
would like to improve it in future
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
This PR allows configuration of
elasticsearchRefsin CRD to associate Elasticsearch clusters to Logstash. The connectivity info/ credentials are passed via environment variable to Logstash container for referencing in Logstash plugins: logstash-output-elasticsearch, logstash-input-elasticsearch and logstash-filter-elasticsearch.Sample config
Environment variables taking
$clusterName_ES_as prefix are available in Logstash containerPRODUCTION_ES_HOSTS PRODUCTION_ES_USERNAME PRODUCTION_ES_PASSWORD PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY # path of ca.crt of production clusterSample Logstash pipeline with elasticsearch output using environment variables
The sample logstash yml file as located in config/samples/logstash/logstash_es.yaml will
create:
e2e test
Still todo:
Co-authored-by: Rob Bavey rob.bavey@elastic.co