
Elastic Exporter: Init publishing to Elasticsearch #2198

Closed
@urso urso commented Jan 27, 2021

Description:

This is the first step in modifying the Elastic exporter, such that it can publish to APM Server or Elasticsearch. For now only the Logs exporter interface will support publishing to Elasticsearch.

It is not possible yet to create an instance of the Elastic Logs exporter using the factory. We will do the final hook up when the exporter is completed and we have tests in place. This change does not yet implement the event transformation and publishing.

The elasticsearch exporter is based on the official go-elasticsearch client. We will use the BulkIndexer provided by the client for event publishing. The client and BulkIndexer already provide some support for retrying. The Elasticsearch Bulk API can report errors at the HTTP level, but uses selective ACKs for individual events. This allows us to retry only failed events and/or reject events that cannot be indexed (e.g. due to a mapping error). The 429 status code might even indicate that we should back off a little before retrying.

In order to prepare for the change I split the original exporter.go into exporter.go, apmserverexporter.go, and elasticsearchexporter.go. The exporter.go will decide which of the concrete exporters shall be used in the future.

Besides the "split" of exporter.go, most of the changes are related to configuration and elasticsearch.Client setup.

Link to tracking Issue: #1800

Testing: The config loading tests have been modified to check that the new settings can be read.

Documentation: None so far. Documentation for the new settings (see config.go) will be provided once
it is possible for users to configure the elasticsearch exporter.

@urso urso requested a review from a team January 27, 2021 01:11
@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 27, 2021

CLA Not Signed

@codecov

codecov bot commented Jan 27, 2021

Codecov Report

Merging #2198 (5caaef4) into main (e2af59e) will decrease coverage by 0.41%.
The diff coverage is 41.98%.


@@            Coverage Diff             @@
##             main    #2198      +/-   ##
==========================================
- Coverage   90.49%   90.07%   -0.42%     
==========================================
  Files         397      399       +2     
  Lines       19577    19671      +94     
==========================================
+ Hits        17716    17719       +3     
- Misses       1401     1490      +89     
- Partials      460      462       +2     
| Flag | Coverage Δ |
| --- | --- |
| integration | 69.31% <ø> (ø) |
| unit | 88.87% <41.98%> (-0.42%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| exporter/elasticexporter/elasticsearchexporter.go | 0.00% <0.00%> (ø) |
| exporter/elasticexporter/exporter.go | 68.42% <18.18%> (-7.09%) ⬇️ |
| exporter/elasticexporter/config.go | 60.71% <56.00%> (-39.29%) ⬇️ |
| exporter/elasticexporter/apmserverexporter.go | 68.57% <68.57%> (ø) |
| exporter/elasticexporter/factory.go | 92.59% <85.71%> (-7.41%) ⬇️ |
| receiver/prometheusexecreceiver/receiver.go | 85.83% <0.00%> (-2.50%) ⬇️ |
| exporter/stackdriverexporter/stackdriver.go | 75.75% <0.00%> (-2.28%) ⬇️ |
| processor/groupbytraceprocessor/event.go | 95.96% <0.00%> (-0.81%) ⬇️ |
| exporter/datadogexporter/translate_traces.go | 87.67% <0.00%> (-0.77%) ⬇️ |
| ... and 1 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e2af59e...163000d.

Contributor

@anuraaga anuraaga left a comment

Left a high level comment on the package structure. Generally looks good otherwise, though would be good if we can address some of the codecov comments.

var errs []error
var count int
instrumentationLibrarySpansSlice := rs.InstrumentationLibrarySpans()
for i := 0; i < instrumentationLibrarySpansSlice.Len(); i++ {
Contributor

Perhaps this loop could be moved into the translator package too.

e.logger.Debug("sending events", zap.ByteString("events", w.Bytes()))

var buf bytes.Buffer
zw, err := zlib.NewWriterLevel(&buf, zlib.DefaultCompression)
Contributor

Ok for compression to not be configurable?

type Config struct {
configmodels.ExporterSettings `mapstructure:",squash"`
configtls.TLSClientSetting `mapstructure:",squash"`

APMServerConfig `mapstructure:",squash"`
ElasticsearchConfig `mapstructure:",squash"`
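Because every embedded struct uses `mapstructure:",squash"`, all of their fields appear flattened at the top level of the exporter's YAML section. A hypothetical sketch of what that could look like; only `elasticsearch_urls` is confirmed by this PR's testdata, while `apm_server_url` and the TLS field name are illustrative assumptions:

```yaml
exporters:
  elastic:
    # From the squashed ElasticsearchConfig (field shown in this PR's testdata):
    elasticsearch_urls: [https://elastic.example.com:9200]
    # From the squashed APMServerConfig (field name illustrative):
    apm_server_url: https://apm.example.com:8200
    # From the squashed configtls.TLSClientSetting (field name illustrative):
    ca_file: /etc/otel/ca.pem
```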
Contributor

I'm a bit confused by having both apm server and elasticsearch here. It seems like they are quite different APIs - since we already have translation in an internal shared package anyways, does it make sense to split into two packages, elasticapmexporter and elasticsearchexporter?

Member

@anuraaga this decision to have one single exporter whose implementation pushes data to two different endpoints is the result of a conversation with @tigrannajaryan (see #1630 (comment)).
We changed our point of view and now subscribe to Tigran's; we feel this will be more straightforward for users.

Member

The "one exporter for multiple signals" approach is a recommendation; it's what the OTLP exporter does. In many cases it results in fewer config settings for the end user to specify (e.g. you likely need to specify access tokens and other credentials, so you do it once, in one exporter config section).
It is not a hard rule; feel free to deviate if you feel there is not much in common between the exporters of different signals.

Contributor

multiple signals

My understanding of this is that this isn't quite multiple signals but two different backends, elasticsearch and elastic apm. While logs are here only exported to elasticsearch I guess we can assume elastic apm would also ingest them at some point? And maybe traces would be exported to Elasticsearch similar to Zipkin's support.

So if it's two backends I guess it's natural to have two exporter packages.

Member

Thanks for the feedback @tigrannajaryan and @anuraaga.
We liked Tigran's suggestion, and we got positive feedback from users on the UX we would have; it was perceived as pretty intuitive.

So if it's two backends I guess it's natural to have two exporter packages.

We would like to get the feedback of a few users on the user experience with two exporters; we should be able to share our findings on this thread very soon.
In parallel, we are continuing the implementation.

Member

Hello, sorry for the delay. We have confirmed with a number of users that the user experience is intuitive enough with a split of exporters, introducing an elasticsearch exporter.

We will move forward with creating a PR for the new elasticsearchexporter and we will close this PR. Does that make sense, @anuraaga and @tigrannajaryan?

Member

If you feel that works best for you then it works for me.

Member

Thank you @tigrannajaryan

Base automatically changed from master to main January 28, 2021 00:57
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
@github-actions
Contributor

github-actions bot commented Feb 6, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 6, 2021
@jpkrohling
Member

I have a conceptual question about this: we have avoided connecting the collector to concrete storage mechanisms, as we deemed this to be the responsibility of the "final destination" (Jaeger, Zipkin, SaaS vendors, ...). Has this changed? Are we accepting storage mechanisms directly under contrib? This might change a few things for us downstream at Jaeger.

cc @albertteoh, @kevinearls, @yurishkuro

@cyrille-leclerc
Member

@jpkrohling I'm wondering whether the difference here is between

  • Elasticsearch being a logs & metrics solution
  • Elasticsearch being the storage used by some distributed tracing solutions

The scope of the Elasticsearch exporter we are discussing at the moment is "Elasticsearch as a logs & metrics solution". We didn't consider, and don't plan to consider, the other dimension, "Elasticsearch as the storage used by some distributed tracing solutions".

Does this separation of concerns make sense?

@jpkrohling
Member

I understand that Elasticsearch isn't a logs and metrics solution, but rather a data store:

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.

https://www.elastic.co/elasticsearch/

Independent from the signal, I believe that the question remains. Will this repository here accept, say, a postgresqlexporter?

@cyrille-leclerc
Member

cyrille-leclerc commented Feb 9, 2021

I understand that Elasticsearch isn't a logs and metrics solution, but rather, a data store:

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases.
https://www.elastic.co/elasticsearch/

I meant Elasticsearch combined with Kibana. This stack is broadly used as a logs and metrics solution.

Anyway, regarding your question on the addition of more exporters, I was thinking more of Timescale than of a PostgreSQL exporter; I'll defer to the maintainers of the OpenTelemetry Collector.

I personally see a lot of value in these contrib exporters that help users adopt OpenTelemetry without waiting for the different Observability solutions to provide native support for the OTLP protocol.

@yurishkuro
Member

I think this should be renamed to "Elastic APM Exporter". That removes the confusion that it exports to "raw storage": it does not, it uses trace format that is ONLY understood by Elastic APM. Jaeger also has "ES exporter" but with a different span format, and we don't want to have to answer questions in the future like why this "elasticsearch exporter" does not work with Jaeger.

// See the License for the specific language governing permissions and
// limitations under the License.

package elasticexporter
Member

Suggested change:
-package elasticexporter
+package elasticapmexporter

Member

As I have explained in #2198 (comment), please note that

  • The elasticexporter has been here for 8 months and relates to Elastic as a company and to the Elastic Observability solution
  • We have not seen any misunderstanding from people who use a tracing solution that stores its traces in Elasticsearch (e.g. Jaeger or Zipkin) and who would expect the Elastic exporter to be what they need. We will be happy to update the Elastic exporter docs if we see confusion.

@cyrille-leclerc
Member

@yurishkuro I didn't catch the potential risk of confusion for Jaeger or Zipkin users trying to use an Elasticsearch exporter to visualise their traces.
Please note that the elasticexporter has been around for 8 months and we have not seen any misunderstanding; I think the documentation of the Elastic exporter is clear enough about the fact that it relates to Elastic as a company and to the Elastic Observability solution.

As we have said above, we are happy to close this Pull Request and to discuss the question of an exporter of logs to Elasticsearch in a separate thread. I'm sure that we can find a solution to prevent risks of misunderstandings for people who use a tracing solution that stores its traces in Elasticsearch (e.g. Jaeger or Zipkin).
Would this approach work for you?

@yurishkuro
Member

You can see the confusion already in the question from @jpkrohling about whether contrib exporters are a place for storage exporters.

Please note that the elasticexporter has been around for 8 months and we have not seen any misunderstanding

OpenTelemetry is not widely used yet, so this is not indicative.

I think that the documentation of the Elastic exporter is clear enough

Mostly yes, I would still make a couple of small tweaks to clearly indicate that it's an exporter to Elastic APM (or Elastic Observability, whichever is the official product name), not to "Elastic Stack" that can be easily confused with Elasticsearch or ELK.

@cyrille-leclerc
Member

cyrille-leclerc commented Feb 9, 2021

Thanks @yurishkuro. I have submitted PR #2316 to clarify the documentation page of the Elastic exporter, following your suggestions.

@github-actions github-actions bot removed the Stale label Feb 10, 2021
@jpkrohling
Member

Sorry, but what confused me (and still confuses me) is that there seems to be an exporter for Elasticsearch in this PR, as hinted by the title of this PR, by the statement that the "elasticsearch exporter is based on the official go-elasticsearch client", and by the change to the testdata:

# Elasticsearch settings
elasticsearch_urls: [https://elastic.example.com:9200]

Despite that, does it mean that I can't use a local Elasticsearch instance as the destination for (some of) my data?

@cyrille-leclerc
Member

@jpkrohling you are right: in this PR we proposed to introduce the exporter of logs to Elasticsearch within this elasticexporter, following the conversation in #1630 (comment); we had initially proposed, in a different PR, to create a dedicated elasticsearchexporter.

Based on the recent conversation with @anuraaga (#2198 (comment)), we have come back to the initial proposal: we will no longer implement the logs exporter in the elasticexporter, we will close this PR, and we will create a new elasticsearchexporter that talks exclusively to Elasticsearch, not also to Elastic APM Server. Once this PR is closed, there will be no references to Elasticsearch in this elasticexporter.

Does this clarify things?

@jpkrohling
Member

It does, but before opening the PR for the elasticsearchexporter, it would be good to have confirmation from the @open-telemetry/collector-maintainers whether this is in the spirit of the collector. I always understood that we'd not have concrete storage mechanisms as part of the collector or contrib. Perhaps something for today's SIG meeting?

@cyrille-leclerc
Member

Thanks @jpkrohling. I'll join the SIG meeting today to contribute to the conversation.
Please note that I still have a different point of view regarding the phrasing "concrete storage mechanisms". In some solutions' architectures, the "concrete storage mechanism" is the endpoint to which data is shipped, and that's an "implementation detail" for end users.
Some examples that come to mind are Elasticsearch-Kibana, Loki-Grafana, InfluxDB-Grafana...

@yurishkuro
Member

Some examples that come to my mind are Elasticsearch-Kibana, Loki-Grafana, InfluxDB-Grafana...

I think the point is that the OTEL collector is envisioned as a pipeline, not a destination. Storing data in a concrete DB like Elasticsearch makes it look like a destination, but it only makes sense if there's something else that can read that data (and there is no one way of representing the data; it very much depends on who/what will be reading it).

Having said that, some storage backends (like ES) allow querying them via general purpose UIs (Kibana), and if it's all that the user wants then I don't see why we should prevent them from developing the exporters in the contrib. I think we just need to make sure that the documentation clearly states how that exporter is meant to be used.

@jpkrohling that ^ does not mean that Jaeger storage exporters belong in OTEL contrib, because they are closely integrated with the query service & UI in terms of the data model and should be part of the Jaeger code base.

@urso
Author

urso commented Feb 10, 2021

I'm closing this PR in favour of #2324

Regarding the general storage discussion, I would recommend opening a separate issue and bringing it to the SIG Agent/Collector meeting, so as not to bury it in a closed PR. There was a good discussion today about exporters like this one, and about exporters in general.

@urso urso closed this Feb 10, 2021
@jpkrohling
Member

I think the point is that OTEL collector is envisioned as a pipeline, not a destination.

I thought the same, but yesterday's discussion during the SIG call clarified that this isn't the case. I'm still not quite sure if it's a good idea in the long term, though.

that ^ does not mean that Jaeger storage exporters belong in OTEL contrib

Absolutely, Jaeger does need its own storage exporters.

@urso urso deleted the init-elastic-to-es branch February 21, 2021 13:40
punya referenced this pull request in punya/opentelemetry-collector-contrib Jul 21, 2021