Skip to content

Fetches control group resource information#10402

Merged
tylersmalley merged 10 commits intoelastic:masterfrom
tylersmalley:status-api-cgroup
Apr 11, 2017
Merged

Fetches control group resource information#10402
tylersmalley merged 10 commits intoelastic:masterfrom
tylersmalley:status-api-cgroup

Conversation

@tylersmalley
Copy link
Copy Markdown
Member

These are made available to the status API api/status and kbnServer.metrics.

@jbudz
Copy link
Copy Markdown
Contributor

jbudz commented Feb 16, 2017

Test failure looks legitimate, I'm guessing we need to wait for the promise to resolve on previous status tests

Comment thread src/server/status/cgroup.js Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to update this to match? elastic/elasticsearch@00a8b87

Comment thread src/server/status/metrics.js Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to check for the files once, and if they don't exist stop checking?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now preventing lookups when we determine there is no cgroups information. 655f54b

Copy link
Copy Markdown
Member

@tsullivan tsullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this grabs the cgroup info from /sys/fs/cgroup, unfortunately I think this does need to be tested from inside a Docker container as the ES team found there are inconsistencies in where Docker publishes the cgroup statistics: elastic/elasticsearch#22757

@tylersmalley tylersmalley force-pushed the status-api-cgroup branch 3 times, most recently from 1ae1a6a to ac70577 Compare March 27, 2017 15:24
@tylersmalley
Copy link
Copy Markdown
Member Author

@tsullivan - I pushed a fix for the docker cgroup path.

Here is a docker image on Dockerhub to help with testing:

tylersmalley/kibana:6.0.0-alpha1-SNAPSHOT-1ae1a6a

You can limit the CPU to exposed throttled metrics by passing --cpus=0.1 to docker run

And here are snapshots of the PR:

https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-amd64.deb
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-darwin-x86_64.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-i386.deb
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-i686.rpm
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-linux-x86.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-windows-x86.zip
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-1ae1a6a/kibana/kibana-6.0.0-alpha1-SNAPSHOT-x86_64.rpm

When running, you can apply cgroup constraints to a process/group. To do so, run systemd-cgls to find the user slice which contains kibana. You can then use it to set CPUQuota (example: systemctl set-property user.slice CPUQuota=100%)

Comment thread src/server/status/cgroup.js Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using the radix parameter of parseInt, because without it behavior could be unpredictable.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt

Comment thread src/server/status/cgroup.js Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

radix parameter on these parseInts

@tsullivan
Copy link
Copy Markdown
Member

Thanks for providing the Docker images to test with - that was extremely helpful.

I left just one comment about something that popped up in a few places. I'm surprised that radix for parseInt thing isn't part of our lint rules 😺

@jbudz
Copy link
Copy Markdown
Contributor

jbudz commented Mar 28, 2017

Rebasing should fix the test failures

Comment thread src/server/status/metrics.js Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think there's any need for the path to be configurable? ie instead of a bool, cpu.cgroup.path = Joi.string()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These configuration options align with logstash/ES outlined here: elastic/logstash#6797 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for consistency. it's not clear to me from the thread if override is meant to be a bool, the current (eventually deprecated) implementation in es looks like a string. if i'm missing something feel free to ignore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/server/status/cgroup.js Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you feel about logging this error instead of throwing? if something comes up we can still return non c-group stats

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
This was addressed in elastic/elasticsearch#23219

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Docker cgroups are mounted in the wrong place (i.e., inconsistently with /proc/self/cgroup). This commit adds an undocumented hack for working around, for now.

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
@tylersmalley
Copy link
Copy Markdown
Member Author

tylersmalley commented Apr 5, 2017

Addressed feedback and pushed new builds for testing:

DockerHub: tylersmalley/kibana:6.0.0-alpha1-SNAPSHOT-5812d7b

Example: docker run -p 5601:5601 -e "ELASTICSEARCH_URL=http://172.16.0.63:9200" --cpus=0.1 tylersmalley/kibana:6.0.0-alpha1-SNAPSHOT-5812d7b

https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-amd64.deb
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-darwin-x86_64.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-i386.deb
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-i686.rpm
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-linux-x86.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-windows-x86.zip
https://download.elastic.co/kibana/staging/6.0.0-alpha1-SNAPSHOT-5812d7b/kibana/kibana-6.0.0-alpha1-SNAPSHOT-x86_64.rpm

Copy link
Copy Markdown
Contributor

@jbudz jbudz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. One inline question below. And can we add docs for the new configuration values?

Otherwise LGTM:

cgroup: {
  cpuacct: {
    control_group: "/",
    usage_nanos: 42023428539
  },
  cpu: {
    control_group: "/",
    cfs_period_micros: 100000,
    cfs_quota_micros: -1,
    stat: {
      number_of_elapsed_periods: 0,
      number_of_times_throttled: 0,
      time_throttled_nanos: 0
    }
  }
}

export function getMetrics({ event, config }) {
export async function getMetrics(event, config, server) {
const port = config.get('server.port');
const timestamp = new Date().toISOString();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we should set timestamp after cgroup stats have been read? along the same lines, do you think we should only getMetrics once the previous getMetrics call is done so they don't queue up for frequent ops events?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it right at the ops event is probably best as it's actually involved in aggregations (request count, etc). Capturing cgroups is also a very quick operation in that it's reading most of the files in parallel and they aren't actually on disk. We should revisit when we add CPU percentage.

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
@tylersmalley
Copy link
Copy Markdown
Member Author

@jbudz documented configuration options - placed near existing status config.

screenshot 2017-04-05 13 00 38

@tylersmalley
Copy link
Copy Markdown
Member Author

@tsullivan - mind giving another look?

@tylersmalley tylersmalley removed the request for review from pickypg April 6, 2017 15:23
@pickypg pickypg self-requested a review April 10, 2017 19:32
Copy link
Copy Markdown
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this successfully on Windows (no Cgroup data, but it didn't explode either). I just have some questions about the settings.

Comment thread docs/setup/settings.asciidoc Outdated
The minimum value is 100.
`status.allowAnonymous`:: *Default: false* If authentication is enabled, setting this to `true` allows
unauthenticated users to access the Kibana server status API and status page.
`cpu.cgroup.path.override`:: Override for cgroup cpu path
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we show an example of this being overridden?


cpu: Joi.object({
cgroup: Joi.object({
path: Joi.object({
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel like we normally specific "override" as the field to override, rather just cpu.cgroup.path would be set or unset (default). Specifically I'm looking at examples like server.basePath, plugins.paths, and path.data.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name was discussed here for Logstash/ES. While I agree we could omit "override" it maintains consistency with the other projects.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's unfortunate, but I guess it's intentional to really show that you're going against defaults.

Copy link
Copy Markdown
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Tyler Smalley <tyler.smalley@elastic.co>
@tylersmalley tylersmalley merged commit 006fae0 into elastic:master Apr 11, 2017
@tylersmalley tylersmalley deleted the status-api-cgroup branch April 11, 2017 17:20
tylersmalley added a commit to tylersmalley/kibana that referenced this pull request Apr 11, 2017
Adds control group data to status API and kbnServer.metrics
@tylersmalley
Copy link
Copy Markdown
Member Author

tylersmalley commented Apr 11, 2017

5.x: b8bb212

tylersmalley added a commit that referenced this pull request Apr 11, 2017
Adds control group data to status API and kbnServer.metrics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants