Add geoip database metrics to /node/stats API#13004

Merged
kaisecheng merged 7 commits into elastic:master from
kaisecheng:geoip_expose_metric
Jun 23, 2021
Conversation

@kaisecheng kaisecheng commented Jun 18, 2021

Release notes

Add geoip database metrics to /node/stats

What does this PR do?

This PR exposes geoip database metrics through the Logstash node stats API at localhost:9600/_node/stats.

Sample output

"geoip" : {
    "database" : {
        "ASN" : {
            "status" : "healthy",
            "fail_check_in_days" : 0,
            "download_at" : "2021-06-21T16:06:54+02:00"
        },
        "City" : {
            "status" : "healthy",
            "fail_check_in_days" : 0,
            "download_at" : "2021-06-21T16:06:54+02:00"
        }
    },
    "download" : {
        "successes" : 15,
        "failures" : 1,
        "last_check_at" : "2021-06-21T16:07:03+02:00",
        "status" : "succeeded"
    }
}

Schema

geoip
  database
    ASN/City
      status:
        - init, initial CC database status
        - healthy, using an up-to-date EULA database
        - to_be_expired, 25 days without a successful call to the service
        - expired, 30 days without a successful call to the service
      fail_check_in_days: number of days Logstash has failed to call the service since the last success
      download_at: the timestamp of the last database download
  download
    successes: the number of successful checks and downloads. The counter is incremented by 1 when all databases are updated or pass the check
    failures: the number of failed attempts. The counter is incremented by 1 when any of the databases fails to check or update
    last_check_at: the timestamp of the last check
    status:
      - checking, a check and download is in progress
      - succeeded, the last download succeeded
      - failed, the last download failed
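
The day thresholds in the schema can be read as a mapping from fail_check_in_days to a database status. The following is a minimal sketch under that assumption only; `database_status` and both constants are hypothetical names for illustration, not code from this PR, and the init status is left out because it does not depend on the day count.

```ruby
# Thresholds as listed in the schema (constant names are hypothetical).
TO_BE_EXPIRED_DAYS = 25
EXPIRED_DAYS = 30

# Maps the number of days without a successful service call to a status.
# Assumption: the status depends only on this day count.
def database_status(fail_check_in_days)
  if fail_check_in_days >= EXPIRED_DAYS
    :expired
  elsif fail_check_in_days >= TO_BE_EXPIRED_DAYS
    :to_be_expired
  else
    :healthy
  end
end
```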

Why is it important/What is the impact to the user?

It allows system administrators to monitor database status through the metric API instead of scanning the logs.
When a database reports the to_be_expired status, they should give Logstash internet access so it can download the latest database before the status becomes expired, which causes the plugin to fail.
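
A monitoring check along these lines could look like the sketch below. It assumes the `geoip` object sits at the top level of the stats response, as in the sample output fragment; `databases_needing_attention` and `ALERT_STATUSES` are hypothetical names, and the function works on the response body so it can be tried without a running Logstash.

```ruby
require 'json'

# Statuses from the schema that call for operator attention.
ALERT_STATUSES = %w[to_be_expired expired].freeze

# Takes the JSON body of a node stats response and returns the names of the
# databases whose status is to_be_expired or expired.
# Assumption: "geoip" is a top-level key of the response.
def databases_needing_attention(stats_json)
  stats = JSON.parse(stats_json)
  databases = stats.dig('geoip', 'database') || {}
  databases.select { |_name, info| ALERT_STATUSES.include?(info['status']) }.keys
end
```

The body could be fetched with, for example, `Net::HTTP.get(URI('http://localhost:9600/_node/stats'))` from Ruby's standard `net/http` library.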

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Please check the geoip section of curl -XGET 'localhost:9600/_node/stats'

Related issues

Use cases

Screenshots

Logs

@kaisecheng kaisecheng force-pushed the geoip_expose_metric branch from c6f8cbc to d885e1b Compare June 21, 2021 11:45
@kaisecheng kaisecheng requested a review from andsel June 21, 2021 15:39

@andsel andsel left a comment

LGTM

Left just a couple of notes on possible improvements for readability

pipeline_id = ThreadContext.get("pipeline.id")
ThreadContext.put("pipeline.id", nil)

@metric.namespace([:download]).gauge(:status, :checking)

I think this assignment could be extracted into a method update_download_status(:checking). That way the intention is more evident, without the reader having to dive immediately into the details of metrics.

ensure
  check_age
  clean_up_database
  set_download_metric(success_cnt)

Here we could also use update_download_status, so that the points where the status changes become more evident in the database download flow.

Comment on lines +247 to +258
def set_download_metric(success_cnt)
  @metric.namespace([:download]).tap do |n|
    n.gauge(:last_check_at, Time.now.iso8601)

    if success_cnt == DB_TYPES.size
      n.increment(:successes, 1)
      n.gauge(:status, :succeeded)
    else
      n.increment(:failures, 1)
      n.gauge(:status, :failed)
    end
  end

Maybe this could be split into two methods, one for the counter and the other for the status, so that the status update points become evident in the download flow.
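
One way the suggested split might look is sketched below. This is an illustration only, not the code that was merged: `FakeMetric` is a stand-in for the Logstash metric object, and `DownloadMetrics` and `increment_download_counter` are hypothetical names; only `update_download_status` comes from the review comments.

```ruby
require 'time'

# Stand-in for the Logstash metric object, used only in this sketch.
class FakeMetric
  attr_reader :gauges, :counters

  def initialize
    @gauges = {}
    @counters = Hash.new(0)
  end

  # The real object scopes metrics by namespace; the stand-in ignores it.
  def namespace(_keys)
    self
  end

  def gauge(key, value)
    @gauges[key] = value
  end

  def increment(key, by = 1)
    @counters[key] += by
  end
end

# Hypothetical holder for the two methods suggested in the review.
class DownloadMetrics
  def initialize(metric, db_count)
    @metric = metric
    @db_count = db_count
  end

  # Every status change goes through this method, so the change points
  # are easy to spot in the download flow.
  def update_download_status(status)
    @metric.namespace([:download]).gauge(:status, status)
  end

  # Updates the counters and the check timestamp, separately from the status.
  # Returns true when every database succeeded.
  def increment_download_counter(success_cnt)
    all_succeeded = success_cnt == @db_count
    ns = @metric.namespace([:download])
    ns.gauge(:last_check_at, Time.now.iso8601)
    ns.increment(all_succeeded ? :successes : :failures, 1)
    all_succeeded
  end
end
```

With this shape, the download flow would call `update_download_status(:checking)` at the start and `update_download_status(:succeeded)` or `update_download_status(:failed)` in the ensure block, next to the counter update.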

@roaksoax

After discussion, we decided to improve the wording to reflect the following:

geoip -> geoip_download_manager (7.14)
database status:

  • init -> remains for the time being
  • cc - add a new state that reflects that we are using the bundled db because it never was able to download from the service.
  • healthy -> up-to-date (7.14)
  • download_at -> last_updated (7.14)

download -> download_stats ? (7.14)

  • last_failed_at -> adds a date with the time it last failed.
  • last_error_msg -> adds the last error message, if any.
  • status: checking -> updating (for the future we can consider checking, downloading, replacing/updating). (7.14)

@kaisecheng kaisecheng merged commit 5a209ba into elastic:master Jun 23, 2021
kaisecheng added a commit to kaisecheng/logstash that referenced this pull request Jun 23, 2021
This PR adds geoip database status, last update timestamp, download stats counter to Node Stats API
kaisecheng added a commit that referenced this pull request Jun 23, 2021
This PR adds geoip database status, last update timestamp, download stats counter to Node Stats API
kares added a commit to kares/logstash that referenced this pull request Jul 1, 2021
* master: (41 commits)
  Test: resolve integration failure due ECS mode (elastic#13044)
  Feat: event factory support (elastic#13017)
  Doc: Add geoip database API to node stats (elastic#13019)
  Add geoip database metrics to /node/stats API (elastic#13004)
  ecs: on-by-default plus docs (elastic#12830)
  ispec: fix cross-spec leak from fatal error integration specs (elastic#13002)
  Fix UBI source URL (elastic#13008)
  update fpm to allow pkg creation on jdk11+jruby 9.2 (elastic#13005)
  Add unit test to grant that production aliases correspond to a published RubyGem (elastic#12993)
  Fix logstash.bat not setting exit code (elastic#12948)
  Use the OS separator to invoke gradlew from Rake script (elastic#13000)
  Allow per-pipeline config of ECS Compatibility mode via Central Management (elastic#12861)
  Update jinja2 dependency in docker build (elastic#12994)
  fix database manager with multiple pipelines (elastic#12862)
  Fix Reflections stack traces when process yml files in classpath and debug is enabled (elastic#12991)
  Fix/log4j routing to avoid create spurious file (elastic#12965)
  Deps: update JRuby to 9.2.19.0 (elastic#12989)
  Doc: Add tip for checking for existing field (elastic#12899)
  Added test to cover the installation of aliased plugins (elastic#12967)
  CI: Update logstash_release.json after 7.3.12 (elastic#12986)
  ...
@karenzone

Docs added: #13019
