Skip to content

Conversation

@cachedout
Copy link
Contributor

This PR closes #10119

It incorporates the following changes:

@cachedout cachedout requested review from jsvd and ycombinator March 14, 2019 19:24
Copy link
Member

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this looks sensible and complies with our intent. Yay for bringing more code over into the OSS-license.

@yaauie
Copy link
Member

yaauie commented Mar 19, 2019

jenkins test this again please

@cachedout
Copy link
Contributor Author

Tests are all fixed up now. :)

@yaauie and @ycombinator This is ready for a re-review please.

@ycombinator
Copy link
Contributor

Looking at the response from the API (using this PR), I see this structure:

{
  ...,
  "pipelines": {
    "main": {
      "graph": {
        "hash": "...",
        "type": "lir",
        "version": "...",
        "graph": {
           "vertices": { ... },
           "edges": { ... }
         }
       }
     }
  }
}

How difficult would it be to rename the outer graph key to representation instead? That would match the example JSON shown in the issue description. If it's not trivial to do, we can rename it on the consumer (beats) side.

@cachedout
Copy link
Contributor Author

@ycombinator Sadly, this isn't trivial. I think it would be better to change this on the beats side.

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally LGTM. I'll defer to Logstash team folks for the code review. Thanks!

@ycombinator
Copy link
Contributor

ycombinator commented Mar 27, 2019

@cachedout I just noticed that while this PR does indeed implement the requirements of #10119 (to expose additional fields for a pipeline via the GET _node/pipelines/:id API), it has the side effect of exposing the additional pipeline fields in GET _node/pipelines as well, for every pipeline listed in that API's response.

Personally I'm okay with that but it could potentially make the API response quite heavy, if there are several pipelines being run on a node. I'm not sure if it would be desirable to keep the GET _node/pipelines API leaner (as it is currently). Thoughts?

@cachedout
Copy link
Contributor Author

jenkins, test this

@cachedout
Copy link
Contributor Author

Hi @yaauie and @jsvd. We're about ready to start getting these merged. Would you mind please reviewing this when you have a moment? It will help our team stay on track. Thanks!

@ycombinator
Copy link
Contributor

@cachedout I just noticed this point in a comment from @jsvd on the issue for this PR:

the representation is only returned if the query parameter ?representation=true is passed

If its not a lot of work, could we implement this as part of this PR as well?

def resolve_cluster_uuids
cluster_uuids = []
outputs.each do |output|
if LogStash::SETTINGS.registered?(output.id + ".cluster_uuid")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind pointing me to the place where these settings are dynamically registered? I'm not finding cluster_uuid anywhere on current master.

IIRC, LogStash::SETTINGS#validate_all will cause a failure when a setting is provided in the yaml or at command-line and the relevant setting has not been registered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaauie Certainly. This requires a little bit of background knowledge on a discussion that @jsvd had. The setting is registered by the Elasticsearch plugin. The PR which introduces this functionality is here: logstash-plugins/logstash-output-elasticsearch#857

That said, it's totally possible that I don't fully grok how settings should be registered so if this is not the correct manner for doing so, I'd appreciate any advice. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in other threads, I consider routing through LogStash::SETTINGS to stash plugin metadata to be a hack, and while it can be made to work I'd much rather provide you with the tools you need to support this feature in a way that is less fragile.

@ycombinator
Copy link
Contributor

ycombinator commented Apr 3, 2019

I tested this PR along with logstash-plugins/logstash-output-elasticsearch#857, and all works as expected. The pipelines.{id}.cluster_uuids field in the GET _node/pipelines/{id} response shows the expected Elasticsearch cluster UUID(s).

Having cluster_uuids as a top-level field (within a pipeline object) is certainly very convenient from a Metricbeat consumption/parsing POV. However, purely from a data model POV, it feels a bit hacky; perhaps the field should only show up inside pipelines.{id}.graph.graph.vertices objects that represent Elasticsearch output plugin vertices? And even within each such vertex object, the cluster_uuid field should perhaps live inside the meta object field? This will definitely make parsing on the Metricbeat end more involved but I do think it's the right place for the field, since it's specific to a particular type of plugin. WDYT?

BTW, similar to the cluster_uuid lookup enhancement you've done for the Elasticsearch output plugin in logstash-plugins/logstash-output-elasticsearch#857, we'll need to enhance the Elasticsearch filter and input plugins as well. This way we can associate the Logstash pipeline/node with any Elasticsearch clusters it touches in the Stack Monitoring UI. I would not consider these further enhancements as a blocker for this PR here.

@cachedout
Copy link
Contributor Author

@ycombinator The parameter to display the graph is now added. Because the key in the return is called graph and because we also refer to the graph key inside the implementation, I went with ?graph=true as the argument to display the full output. Let me know what you think. It is trivial to change it back to representation of course.

@ycombinator
Copy link
Contributor

The parameter to display the graph is now added.

Thanks @cachedout. I noticed that this parameter works as expected on GET _node/pipleines/{id}. Could we also make it work on GET _node/pipelines? The latter is what Metricbeat actually ended up calling, since it was more efficient to make a single API call and get all pipelines than making an API call per pipeline.

@cachedout cachedout self-assigned this Apr 8, 2019
Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally LGTM. Just re-tested this PR with and without graph=true query parameter on both GET _node/pipelines and GET _node/pipelines/:id calls. All combinations work as expected.

@cachedout
Copy link
Contributor Author

@danhermann Yes, I agree that this approach is sub-optimal, though doing it a different way potentially required some changes to Logstash that I wasn't comfortable making. Are you all right with us getting this in and re-visiting those concerns in a follow-up PR?

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retested this PR, along with elastic/beats#11506 and it works as expected. Still LGTM!

Nice work @cachedout and thanks @danhermann, @yaauie, and @jsvd for the reviews ❤️.

@cachedout cachedout merged commit cc3c5ec into elastic:master May 1, 2019
@elasticsearch-bot
Copy link

Shaunak Kashyap merged this into the following branches!

Branch Commits
7.x fbd1908, 3428383, 9d5bec9, 2dbe5c1, 30adb7a, 0da3d0f, 9c0dc0f, 06d50da, 6bc3ba4, d4a915b, 4e5321a, a875d9e, a8806b1, 8fd6e58, 592591b, dbea951, 42bbdd7, 98f0670

elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
elasticsearch-bot pushed a commit that referenced this pull request May 1, 2019
@jsvd
Copy link
Member

jsvd commented May 6, 2019

Hi folks it seems this change introduces a new warning during startup:

[2019-05-06T15:20:52,142][WARN ][org.logstash.instrument.metrics.gauge.LazyDelegatingGauge] A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization.  It is recommended to log an issue to the responsible developer/development team.

@danhermann should we add the Ruby Array to LazyDelegatingGauge.java?

@danhermann
Copy link
Contributor

The Ruby types that are currently wrapped for metrics (RubyHash, RubyTimestamp, etc.) are deprecated for a variety of reasons, so I'm not sure that we want to add another Ruby type. It's possible that we could coerce it into a Java type.

@jsvd
Copy link
Member

jsvd commented May 6, 2019

@danhermann since this is a new metric, should we just replace it with a java type then before it ships in a release?

@danhermann
Copy link
Contributor

@jsvd, that's what I would suggest.

@cachedout cachedout mentioned this pull request Jul 17, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance GET _node/pipelines/:id API for Metricbeat monitoring

6 participants