Elastic Exporter: Init publishing to Elasticsearch #2198
Conversation
This is the first step in modifying the Elastic exporter, such that it can publish to APM Server or Elasticsearch. For now only the Logs exporter interface will support publishing to Elasticsearch. It is not possible yet to create an instance of the Elastic Logs exporter using the factory. We will do the final hook-up when the exporter is completed and we have tests in place. This change does not yet implement the event transformation and publishing. Documentation for the new settings (see config.go) will be provided once it is possible for users to configure the elasticsearch exporter.

The elasticsearch exporter is based on the official [go-elasticsearch](https://github.com/elastic/go-elasticsearch) client. We will use the BulkIndexer provided by the client for event publishing. The client and BulkIndexer already provide some support for retrying. The Elasticsearch Bulk API can report errors at the HTTP level, but uses selective ACKs for individual events. This allows us to retry only failed events and/or reject events that can not be indexed (e.g. due to a mapping error). The 429 error code might even indicate that we should back off a little before retrying.

In order to prepare for the change I split the original exporter.go into exporter.go, apmserverexporter.go, and elasticsearchexporter.go. The exporter.go will decide which of the concrete exporters shall be used in the future. Besides the "split" of exporter.go, most of the changes are related to configuration and elasticsearch.Client setup.
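For illustration, the Bulk API consumes a newline-delimited JSON body in which each document is preceded by an action line. The exporter delegates this to the client's BulkIndexer, so the following hand-rolled sketch is only illustrative (the function and index names are hypothetical, not part of this PR):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// buildBulkBody renders documents as the newline-delimited JSON body
// expected by the Elasticsearch Bulk API: an action line followed by
// the document source, one pair per event.
func buildBulkBody(index string, docs []map[string]interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, doc := range docs {
		// Action line, e.g. {"index":{"_index":"logs"}}
		action := map[string]map[string]string{"index": {"_index": index}}
		if err := enc.Encode(action); err != nil { // Encode appends '\n'
			return nil, err
		}
		// Document source line.
		if err := enc.Encode(doc); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	body, _ := buildBulkBody("logs", []map[string]interface{}{
		{"message": "hello"},
	})
	fmt.Print(string(body))
}
```

Because each item in the body is ACKed individually in the Bulk response, the client can resubmit only the failed action/source pairs on retry.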
Codecov Report
@@ Coverage Diff @@
## main #2198 +/- ##
==========================================
- Coverage 90.49% 90.07% -0.42%
==========================================
Files 397 399 +2
Lines 19577 19671 +94
==========================================
+ Hits 17716 17719 +3
- Misses 1401 1490 +89
- Partials 460 462 +2
Left a high level comment on the package structure. Generally looks good otherwise, though would be good if we can address some of the codecov comments.
var errs []error
var count int
instrumentationLibrarySpansSlice := rs.InstrumentationLibrarySpans()
for i := 0; i < instrumentationLibrarySpansSlice.Len(); i++ {
Perhaps this loop could be moved into the translator package too.
e.logger.Debug("sending events", zap.ByteString("events", w.Bytes()))

var buf bytes.Buffer
zw, err := zlib.NewWriterLevel(&buf, zlib.DefaultCompression)
Ok for compression to not be configurable?
type Config struct {
	configmodels.ExporterSettings `mapstructure:",squash"`
	configtls.TLSClientSetting `mapstructure:",squash"`

	APMServerConfig `mapstructure:",squash"`
	ElasticsearchConfig `mapstructure:",squash"`
I'm a bit confused by having both apm server and elasticsearch here. It seems like they are quite different APIs - since we already have translation in an internal shared package anyways, does it make sense to split into two packages, elasticapmexporter and elasticsearchexporter?
@anuraaga this decision to have one single exporter whose implementation pushes data to two different endpoints is the result of a conversation with @tigrannajaryan (see #1630 (comment)).
We changed our point of view and now subscribe to Tigran's; we feel this will be more straightforward for users.
The "one exporter for multiple signals" approach is a recommendation. That's what the OTLP exporter does. In many cases it results in fewer config settings for the end user to specify (e.g. you likely need to specify access tokens and other credentials, so you do it once, in one exporter config section).
It is not a hard rule, feel free to deviate if you feel there is not much common between the exporters of different signals.
multiple signals
My understanding of this is that this isn't quite multiple signals but two different backends, elasticsearch and elastic apm. While logs are here only exported to elasticsearch I guess we can assume elastic apm would also ingest them at some point? And maybe traces would be exported to Elasticsearch similar to Zipkin's support.
So if it's two backends I guess it's natural to have two exporter packages.
Thanks for the feedback @tigrannajaryan and @anuraaga .
We liked Tigran's suggestion and we got positive feedback from users on the UX we would have, it was perceived as pretty intuitive.
So if it's two backends I guess it's natural to have to exporter packages.
We would like to get the feedback of a few users on the user experience with two exporters; we should be able to share our findings on this thread very soon.
In parallel, we continue the implementation.
Hello, sorry for the delay. We have confirmed with a bunch of users that the user experience is intuitive enough with a split of exporters, introducing an elasticsearch exporter.
We will move forward creating a PR for the new Elasticsearch exporter and we will close this PR. Does it make sense @anuraaga and @tigrannajaryan?
If you feel that works best for you then it works for me.
Thank you @tigrannajaryan
Signed-off-by: Bogdan Drutu <[email protected]>
This PR was marked stale due to lack of activity. It will be closed in 7 days.
I have a conceptual question about this: we have avoided connecting the collector to concrete storage mechanisms, as we deemed this to be the responsibility of the "final destination" (Jaeger, Zipkin, SaaS vendors, ...). Has this changed? Are we accepting storage mechanisms directly under contrib? This might change a few things for us downstream at Jaeger.
@jpkrohling I'm wondering if the difference here is between two dimensions.
The scope of the Elasticsearch exporter we are discussing at the moment is "Elasticsearch as a logs & metrics solution". We didn't consider, and don't plan to consider, the other dimension, "Elasticsearch being the storage used by some distributed tracing solutions". Does this separation of concerns make sense?
I understand that Elasticsearch isn't a logs and metrics solution, but rather, a data store:
https://www.elastic.co/elasticsearch/. Independent of the signal, I believe the question remains: will this repository accept, say, a postgresqlexporter?
I meant Elasticsearch combined with Kibana. This stack is broadly used as a logs and metrics solution. Anyway, regarding your question on the addition of more exporters (I was thinking more of Timescale than a PostgreSQL exporter), I'll defer to the maintainers of the OpenTelemetry Collector. I personally see a lot of value in these contrib exporters that help users adopt OpenTelemetry without waiting for the different observability solutions to provide native support for the OTLP protocol.
I think this should be renamed to "Elastic APM Exporter". That removes the confusion that it exports to "raw storage": it does not, it uses a trace format that is ONLY understood by Elastic APM. Jaeger also has an "ES exporter" but with a different span format, and we don't want to have to answer questions in the future like why this "elasticsearch exporter" does not work with Jaeger.
// See the License for the specific language governing permissions and
// limitations under the License.

package elasticexporter
Suggested change:
- package elasticexporter
+ package elasticapmexporter
As I have explained in #2198 (comment), please note that:
- The elasticexporter has been here for 8 months and relates to Elastic as a company and to the Elastic Observability solution.
- We have not seen any misunderstanding from people who use a tracing solution that stores its traces in Elasticsearch (e.g. Jaeger or Zipkin) and who would expect the Elastic exporter to be what they need. We will be happy to update the Elastic exporter docs if we see confusion.
@yurishkuro I didn't catch the potential risk of confusion of Jaeger or Zipkin users trying to use an Elasticsearch exporter to visualise their traces. As we have said above, we are happy to close this Pull Request and to discuss the question of an exporter of logs to Elasticsearch in a separate thread. I'm sure that we can find a solution to prevent risks of misunderstandings for people who use a tracing solution that stores its traces in Elasticsearch (e.g. Jaeger or Zipkin).
You can see the confusion already in the question from @jpkrohling about whether contrib exporters are a place for storage exporters.
OpenTelemetry is not widely used yet, so this is not indicative.
Mostly yes, I would still make a couple of small tweaks to clearly indicate that it's an exporter to Elastic APM (or Elastic Observability, whichever is the official product name), not to "Elastic Stack" that can be easily confused with Elasticsearch or ELK.
Thanks @yurishkuro. I have submitted the PR #2316 to clarify the documentation page of the Elastic exporter following your suggestions.
Sorry, but what confused me (and still confuses me) is that there seems to be an exporter for Elasticsearch in this PR, as hinted by the title of this PR, by the statement that the "elasticsearch exporter is based on the official go-elasticsearch client", and by the change to the testdata (exporter/elasticexporter/testdata/config.yaml, lines 15 to 16 at 163000d).
Despite that, does it mean that I can't use a local Elasticsearch instance as the destination for (some of) my data?
@jpkrohling you are right, in this PR we proposed to introduce the exporter of logs to Elasticsearch within this elasticexporter. Based on the recent conversation with @anuraaga (#2198 (comment)), we conclude that we come back to the initial proposal: we will no longer implement the logs exporter in the elasticexporter, and will instead propose a separate elasticsearchexporter. Does this clarify things?
It does, but before opening the PR for the elasticsearchexporter, it would be good to have confirmation from the @open-telemetry/collector-maintainers whether this is in the spirit of the collector. I always understood that we'd not have concrete storage mechanisms as part of the collector or contrib. Perhaps something for today's SIG meeting?
Thanks @jpkrohling. I'll join the SIG meeting today to contribute to the conversation.
I think the point is that the OTEL collector is envisioned as a pipeline, not a destination. Storing data in a concrete DB like Elasticsearch makes it look like a destination, but it only makes sense if there's something else that can read that data (and there is no one way of representing the data; it's very much dependent on who/what will be reading it).

Having said that, some storage backends (like ES) allow querying them via general-purpose UIs (Kibana), and if that's all the user wants then I don't see why we should prevent them from developing the exporters in contrib. I think we just need to make sure that the documentation clearly states how that exporter is meant to be used.

@jpkrohling that ^ does not mean that Jaeger storage exporters belong in OTEL contrib, because they are closely integrated with the query service & UI in terms of the data model and should be part of the Jaeger code base.
I'm closing this PR in favour of #2324. Regarding the general storage discussion, I would recommend opening a separate issue and bringing it to the SIG Agent/Collector meeting, so as not to bury it in a closed PR. There was a good discussion about exporters like this, and in general, today.
I thought the same, but yesterday's discussion during the SIG call clarified that this isn't the case. I'm still not quite sure if it's a good idea in the long term, though.
Absolutely, Jaeger does need its own storage exporters.
Signed-off-by: Bogdan Drutu <[email protected]>
Description:
This is the first step in modifying the Elastic exporter, such that it can publish to APM Server or Elasticsearch. For now only the Logs exporter interface will support publishing to Elasticsearch.
It is not possible yet to create an instance of the Elastic Logs exporter using the factory. We will do the final hook up when the exporter is completed and we have tests in place. This change does not yet implement the event transformation and publishing.
The elasticsearch exporter is based on the official go-elasticsearch client. We will use the BulkIndexer provided by the client for event publishing. The client and BulkIndexer already provide some support for retrying. The Elasticsearch Bulk API can report errors at the HTTP level, but uses selective ACKs for individual events. This allows us to retry only failed events and/or reject events that can not be indexed (e.g. due to a mapping error). The 429 error code might even indicate that we should back off a little before retrying.
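The selective-ACK behaviour described above amounts to a per-item retry decision on the status code in each Bulk response item. The sketch below is only illustrative: the function name and the exact set of retriable status codes are assumptions, not the exporter's actual policy.

```go
package main

import "fmt"

// shouldRetryItem decides, per Bulk response item, whether to resubmit
// the event: 429 (too many requests) and transient gateway errors are
// treated as retriable, while e.g. a 400 mapping error is a permanent
// rejection and the event is dropped. Illustrative policy only.
func shouldRetryItem(status int) bool {
	switch status {
	case 429, 502, 503, 504:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldRetryItem(429)) // true: back off and retry the item
	fmt.Println(shouldRetryItem(400)) // false: reject the item
}
```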
In order to prepare for the change I split the original exporter.go into exporter.go, apmserverexporter.go, and elasticsearchexporter.go. The exporter.go will decide which of the concrete exporters shall be used in the future.
Besides the "split" of exporter.go, most of the changes are related to configuration and elasticsearch.Client setup.

Link to tracking Issue: #1800
Testing: The config loading tests have been modified to check that the new settings can be read.
Documentation: None so far. Documentation for the new settings (see config.go) will be provided once
it is possible for users to configure the elasticsearch exporter.