Update FPC invoker health reporting logic #5464

Merged: 2 commits merged into apache:master on Feb 14, 2024

bdoyle0182 (Contributor)

Description

With FPC, the set of managed invoker IDs is determined by what is registered in etcd. However, the metric reporting and the invokers API still assume that invokers 0...n exist, up to the maximum ID found in etcd, and automatically fill in any missing IDs as offline. Filling gaps is reasonable within the range that starts at the current minimum ID, but assuming the range always starts at 0 adds no value. A cluster may assign monotonically increasing IDs, so if the original nodes are lost, the missing low IDs should not be assumed to be filled back in starting at 0.
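A minimal Python sketch of the pre-change fill-in behavior, assuming the IDs read from etcd are available as a dict from invoker ID to status. The function name `old_invoker_view` and the status strings are hypothetical illustrations, not OpenWhisk's actual Scala code.

```python
def old_invoker_view(registered: dict[int, str]) -> dict[int, str]:
    """Hypothetical model of the pre-change reporting logic.

    `registered` maps invoker IDs found in etcd to their status. The report
    always spans 0..max(registered); any ID in that range without an etcd
    entry is filled in as "offline".
    """
    if not registered:
        return {}
    top = max(registered)
    return {i: registered.get(i, "offline") for i in range(top + 1)}


# A cluster whose IDs have grown monotonically: only 10-12 are registered now.
view = old_invoker_view({10: "healthy", 11: "healthy", 12: "healthy"})
print(sum(1 for s in view.values() if s == "offline"))  # 10 phantom offline invokers (IDs 0-9)
```

Every ID below the lowest live invoker is reported as offline, even though no such node is expected to return.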

Furthermore, the current implementation already omits invokers that are offline at the high end of the pool, because the API and metrics assume that the max ID stored in etcd is the max ID of the cluster. For example, in a ten-node cluster with IDs 0-9:

  • nodes 8 and 9 are down
  • once the leases for those nodes expire, the API returns only 0-7, since 7 is the highest ID remaining in etcd.
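The high-end behavior follows from how the upper bound of the range is computed. The sketch below models etcd registrations as entries with a lease deadline, a deliberate simplification of etcd leases; the `Registration` class, its `lease_expires_at` field, and the `live_ids` and `reported_range` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    invoker_id: int
    lease_expires_at: float  # hypothetical: absolute time at which the etcd lease lapses


def live_ids(entries: list[Registration], now: float) -> list[int]:
    """Return the invoker IDs whose leases are still valid at `now`."""
    return sorted(e.invoker_id for e in entries if e.lease_expires_at > now)


def reported_range(entries: list[Registration], now: float) -> range:
    """Pre-change report: always 0..max live ID, inclusive."""
    ids = live_ids(entries, now)
    return range(max(ids) + 1) if ids else range(0)


# Ten-node cluster 0-9; nodes 8 and 9 stopped renewing their leases at t=100.
entries = [Registration(i, 100.0 if i in (8, 9) else 1_000.0) for i in range(10)]
print(list(reported_range(entries, now=50.0)))   # covers IDs 0-9 while all leases are valid
print(list(reported_range(entries, now=200.0)))  # covers only IDs 0-7 after the leases expire
```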

This change applies the same behavior to the low end, to account for clusters that do not always re-populate IDs starting from 0. In the same ten-node cluster:

  • nodes 0 and 1 are down
  • the API now returns 2-9, since 2 is the lowest ID remaining in etcd
  • if node 5, between the min and max, is down, it is still filled in and reported as down.
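Continuing the same hypothetical dict-based model, a sketch of the post-change range: it is bounded by the lowest and highest IDs in etcd, and gaps inside that range are still filled in. `invoker_view` is an illustrative name, not the function changed in the PR.

```python
def invoker_view(registered: dict[int, str]) -> dict[int, str]:
    """Hypothetical model of the post-change reporting logic.

    The report spans min(registered)..max(registered). IDs inside that range
    with no etcd entry are still filled in as "offline", but nothing below the
    lowest registered ID or above the highest one is reported.
    """
    if not registered:
        return {}
    lo, hi = min(registered), max(registered)
    return {i: registered.get(i, "offline") for i in range(lo, hi + 1)}


# Ten-node cluster 0-9 with nodes 0 and 1 down (leases expired) and node 5 down.
alive = {i: "healthy" for i in range(10) if i not in (0, 1, 5)}
view = invoker_view(alive)
print(sorted(view))  # [2, 3, 4, 5, 6, 7, 8, 9]
print(view[5])       # offline
```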

Note that this change applies only to FPC, because the 0-n expectation of invoker IDs matters more for the original load balancer algorithm with co-prime hashing.
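For context on the co-prime hashing scheme: an action's hash selects a home index and a step size co-prime with the pool size `n`, and the balancer probes `home, home + step, ...` modulo `n`. Because the step is co-prime with `n`, the walk visits every index exactly once. The Python below is a simplified, hypothetical illustration of that idea, not the actual OpenWhisk `ShardingContainerPoolBalancer` code (which derives its step sizes differently); `coprime_steps` and `probe_order` are invented names.

```python
from math import gcd


def coprime_steps(n: int) -> list[int]:
    """Step sizes in 1..n that are co-prime with the pool size n (n >= 1)."""
    return [s for s in range(1, n + 1) if gcd(s, n) == 1]


def probe_order(action_hash: int, n: int) -> list[int]:
    """Order in which invoker indices 0..n-1 are tried for a given hash.

    Simplified model: the hash selects both the home index and one of the
    co-prime step sizes; stepping n times modulo n then covers every index.
    """
    steps = coprime_steps(n)
    home = action_hash % n
    step = steps[action_hash % len(steps)]
    return [(home + k * step) % n for k in range(n)]


order = probe_order(action_hash=12345, n=10)
print(order)  # [5, 8, 1, 4, 7, 0, 3, 6, 9, 2]
```

The walk is defined over positions 0..n-1, which is why this style of balancer leans on a contiguous, 0-based invoker list.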

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

codecov-commenter commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e20ab17) 76.86% compared to head (bb22e7a) 75.87%.

❗ Current head bb22e7a differs from pull request most recent head 85df554. Consider uploading reports for the commit 85df554 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5464      +/-   ##
==========================================
- Coverage   76.86%   75.87%   -0.99%     
==========================================
  Files         241      241              
  Lines       14649    14650       +1     
  Branches      629      644      +15     
==========================================
- Hits        11260    11116     -144     
- Misses       3389     3534     +145     
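The percentages above can be reproduced from the Hits and Lines rows as hits / lines. The sketch below assumes Codecov truncates, rather than rounds, to two decimal places; that rounding mode is an inference from these particular numbers, not something the report states.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, truncated to two decimals (assumed rounding mode)."""
    return math.floor(hits / lines * 10_000) / 100


base = coverage_pct(11260, 14649)  # master (e20ab17)
head = coverage_pct(11116, 14650)  # PR head (bb22e7a)
print(base, head, round(head - base, 2))  # 76.86 75.87 -0.99
```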


bdoyle0182 changed the title from "Update FPC invoker health logic" to "Update FPC invoker health reporting logic" on Feb 13, 2024

style95 (Member) left a comment:

LGTM

bdoyle0182 merged commit aea3a88 into apache:master on Feb 14, 2024
6 checks passed

3 participants