[GOBBLIN- 1856] Add flow trigger handler leasing metrics #3717
// TODO: add a log event or metric for each of these cases
if (leaseAttemptStatus instanceof MultiActiveLeaseArbiter.LeaseObtainedStatus) {
  MultiActiveLeaseArbiter.LeaseObtainedStatus leaseObtainedStatus = (MultiActiveLeaseArbiter.LeaseObtainedStatus) leaseAttemptStatus;
  this.metricContext.contextAwareCounter(ServiceMetricNames.FLOW_TRIGGER_HANDLER_LEASE_OBTAINED_COUNT);
The overall flow of creating and using these metrics should be:
- create the counter at class level
- register the metric with the metric context
- call counter.inc() (or meter.mark() for a meter) every time you want it to increase
Here you are creating a new counter every time you mean to increment it, and the counter is never actually incremented. For example:
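The class-level pattern described above can be sketched as follows. This is a self-contained illustration, not Gobblin code: `SimpleMetricContext` is a hypothetical stand-in for Gobblin's `MetricContext`, `java.util.concurrent.atomic.LongAdder` stands in for `ContextAwareCounter`, and the status types and metric name strings only mirror the ones in the diff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for Gobblin's MetricContext: one counter per metric name.
class SimpleMetricContext {
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  // Registers a counter under the given name, or returns the one already registered.
  LongAdder counter(String name) {
    return counters.computeIfAbsent(name, n -> new LongAdder());
  }

  // Reads the current value of a counter (0 if it was never registered).
  long count(String name) {
    LongAdder c = counters.get(name);
    return c == null ? 0L : c.sum();
  }
}

// Hypothetical status types mirroring the lease arbiter's statuses.
interface LeaseAttemptStatus {}
final class LeaseObtainedStatus implements LeaseAttemptStatus {}
final class LeasedToAnotherStatus implements LeaseAttemptStatus {}

class FlowTriggerHandlerSketch {
  // Hypothetical metric names; the real ones are defined in ServiceMetricNames.
  static final String LEASE_OBTAINED = "flowTriggerHandler.leaseObtained";
  static final String LEASED_TO_ANOTHER = "flowTriggerHandler.leasedToAnother";

  // Created and registered once, at class level.
  private final LongAdder leaseObtainedCount;
  private final LongAdder leasedToAnotherCount;

  FlowTriggerHandlerSketch(SimpleMetricContext metricContext) {
    this.leaseObtainedCount = metricContext.counter(LEASE_OBTAINED);
    this.leasedToAnotherCount = metricContext.counter(LEASED_TO_ANOTHER);
  }

  // Only increments here; no lookup or creation on the hot path.
  void handle(LeaseAttemptStatus status) {
    if (status instanceof LeaseObtainedStatus) {
      leaseObtainedCount.increment();
    } else if (status instanceof LeasedToAnotherStatus) {
      leasedToAnotherCount.increment();
    }
  }
}
```

The key difference from the diff is that the counter lookup happens once in the constructor and `handle` only increments. On a Dropwizard (Codahale) `Counter`, which Gobblin's counters build on, the increment call is `inc()`.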
I had a different idea in mind, since I only wanted to check whether the FlowTriggerHandler receives the different types of statuses at all. But I guess initializing the counters at class level and calling inc() on them will also help gauge whether traffic is distributed uniformly between the hosts!
Discussed offline and agreed that the counters will provide more information about load balancing and about the relative frequency of each case.
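Once each case has its own counter, the relative frequency of a case is its count divided by the total across all cases. A minimal sketch, assuming the counter values have already been read out; the `LeaseCase` enum and its labels are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

class LeaseStatusShares {
  // Hypothetical labels for the lease-attempt cases tracked by separate counters.
  enum LeaseCase { LEASE_OBTAINED, LEASED_TO_ANOTHER, NO_LONGER_LEASING }

  // Returns each case's fraction of all recorded attempts; empty if nothing was recorded.
  static Map<LeaseCase, Double> shares(Map<LeaseCase, Long> counts) {
    long total = 0;
    for (long c : counts.values()) {
      total += c;
    }
    Map<LeaseCase, Double> result = new EnumMap<>(LeaseCase.class);
    if (total == 0) {
      return result;
    }
    for (Map.Entry<LeaseCase, Long> e : counts.entrySet()) {
      result.put(e.getKey(), (double) e.getValue() / total);
    }
    return result;
  }
}
```

Comparing the same counter (for example, leases obtained) across hosts in this way gives a rough view of how evenly the work is spread.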
    eventTimeMillis);
  return;
} else if (leaseAttemptStatus instanceof MultiActiveLeaseArbiter.LeasedToAnotherStatus) {
  this.metricContext.contextAwareCounter(ServiceMetricNames.FLOW_TRIGGER_HANDLER_LEASED_TO_ANOTHER_COUNT);
Do you want to add logs here for the other cases as well, for now? Maybe they can be debug-level logs.
I wanted to, but then realized that we already emit some logs as part of the method scheduleReminderForEvent and didn't want to make it noisy by emitting similar logs. However, I have added a debug-level log for when we no longer attempt to obtain a lease, since I found that to be missing :)
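A debug-level log for the "no longer leasing" case could look like the sketch below. It uses `java.util.logging` (where `Level.FINE` plays the role of debug) only to stay self-contained; Gobblin code would more likely log through SLF4J. The `NoLongerLeasingStatus` name, the method name and the message text are assumptions, not taken from the PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LeaseLoggingSketch {
  private static final Logger LOG = Logger.getLogger(LeaseLoggingSketch.class.getName());

  // Hypothetical status types; only the "no longer leasing" case is logged here.
  interface LeaseAttemptStatus {}
  static final class NoLongerLeasingStatus implements LeaseAttemptStatus {}
  static final class LeasedToAnotherStatus implements LeaseAttemptStatus {}

  // Returns true if the debug message was issued, so the behavior is easy to check.
  static boolean logIfNoLongerLeasing(LeaseAttemptStatus status, String flowEvent, long eventTimeMillis) {
    if (status instanceof NoLongerLeasingStatus) {
      // With the default JDK logging configuration (INFO), FINE messages are not emitted.
      LOG.log(Level.FINE, "No longer attempting to lease flow event {0} with event time {1}",
          new Object[] {flowEvent, String.valueOf(eventTimeMillis)});
      return true;
    }
    return false;
  }
}
```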
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #3717 +/- ##
============================================
- Coverage 46.96% 46.94% -0.03%
+ Complexity 10818 10811 -7
============================================
Files 2143 2143
Lines 84600 84555 -45
Branches 9404 9390 -14
============================================
- Hits 39733 39694 -39
- Misses 41241 41245 +4
+ Partials 3626 3616 -10
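The totals in the coverage diff are internally consistent. Assuming Codecov's usual definitions, where every tracked line is a hit, a miss or a partial, and coverage is hits divided by tracked lines, the base and head columns work out as:

```latex
\begin{aligned}
\text{Lines} &= \text{Hits} + \text{Misses} + \text{Partials} \\
\text{master:}\quad 39733 + 41241 + 3626 &= 84600, \qquad \frac{39733}{84600} \approx 46.966\% \\
\text{\#3717:}\quad 39694 + 41245 + 3616 &= 84555, \qquad \frac{39694}{84555} \approx 46.945\%
\end{aligned}
```

Rounding these down to two decimals gives the 46.96% and 46.94% shown in the table.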
* upstream/master:
  - Fix bug with total count watermark whitelist (apache#3724)
  - [GOBBLIN-1858] Fix logs relating to multi-active lease arbiter (apache#3720)
  - [GOBBLIN-1838] Introduce total count based completion watermark (apache#3701)
  - Correct num of failures (apache#3722)
  - [GOBBLIN- 1856] Add flow trigger handler leasing metrics (apache#3717)
  - [GOBBLIN-1857] Add override flag to force generate a job execution id based on gobbl… (apache#3719)
  - [GOBBLIN-1855] Metadata writer tests do not work in isolation after upgrading to Iceberg 1.2.0 (apache#3718)
  - Remove unused ORC writer code (apache#3710)
  - [GOBBLIN-1853] Reduce # of Hive calls during schema related updates (apache#3716)
  - [GOBBLIN-1851] Unit tests for MysqlMultiActiveLeaseArbiter with Single Participant (apache#3715)
  - [GOBBLIN-1848] Add tags to dagmanager metrics for extensibility (apache#3712)
  - [GOBBLIN-1849] Add Flow Group & Name to Job Config for Job Scheduler (apache#3713)
  - [GOBBLIN-1841] Move disabling of current live instances to the GobblinClusterManager startup (apache#3708)
  - [GOBBLIN-1840] Helix Job scheduler should not try to replace running workflow if within configured time (apache#3704)
  - [GOBBLIN-1847] Exceptions in the JobLauncher should try to delete the existing workflow if it is launched (apache#3711)
  - [GOBBLIN-1842] Add timers to GobblinMCEWriter (apache#3703)
  - [GOBBLIN-1844] Ignore workflows marked for deletion when calculating container count (apache#3709)
  - [GOBBLIN-1846] Validate Multi-active Scheduler with Logs (apache#3707)
  - [GOBBLIN-1845] Changes parallelstream to stream in DatasetsFinderFilteringDecorator to avoid classloader issues in spark (apache#3706)
  - [GOBBLIN-1843] Utility for detecting non optional unions should convert dataset urn to hive compatible format (apache#3705)
  - [GOBBLIN-1837] Implement multi-active, non blocking for leader host (apache#3700)
  - [GOBBLIN-1835]Upgrade Iceberg Version from 0.11.1 to 1.2.0 (apache#3697)
  - Update CHANGELOG to reflect changes in 0.17.0
  - Reserving 0.18.0 version for next release
  - [GOBBLIN-1836] Ensuring Task Reliability: Handling Job Cancellation and Graceful Exits for Error-Free Completion (apache#3699)
  - [GOBBLIN-1805] Check watermark for the most recent hour for quiet topics (apache#3698)
  - [GOBBLIN-1825]Hive retention job should fail if deleting underlying files fail (apache#3687)
  - [GOBBLIN-1823] Improving Container Calculation and Allocation Methodology (apache#3692)
  - [GOBBLIN-1830] Improving Container Transition Tracking in Streaming Data Ingestion (apache#3693)
  - [GOBBLIN-1833]Emit Completeness watermark information in snapshotCommitEvent (apache#3696)
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
As part of this PR, I want to add metrics for FlowTriggerHandler to help understand whether we are able to obtain, handle, and transition through the different lease-acquisition statuses when we switch to multi-active scheduler mode, where each scheduler attempts to lease a flow event and processes it further based on the status of that attempt.
Tests
Commits