-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: CheckpointLogger v2: cleaner usage, reliability counters, more UploadFlow metrics #100
Conversation
Codecov Report
@@ Coverage Diff @@
## main #100 +/- ##
========================================
Coverage 98.46% 98.46%
========================================
Files 369 371 +2
Lines 27233 27495 +262
========================================
+ Hits 26814 27073 +259
- Misses 419 422 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of defining the success/failure events and the subflows in the metrics definition directly. Quite the complex piece of code. Exquisite job making it so flexible and easy to use 👏👏👏 .
Looks good in general, just some points I'd like to get your opinion on first
Some (small) comments I would add here would be:
- I think
UploadFlow
should be defined in a separate file. Maybehelpers/flows/upload.py
or something. - It looks like that
@subflows
should be defined after@failure_events
and@success_events
. If I'm wrong just skip the rest. The docs and examples hint as much, but I wonder if you should add checks to the success/failure if sub flows have been defined already and at least emit a warning that sub flows might be missing - Do
checkpoints
have a type that we can maybe type hint?... Looks like they're just strings tho :E probably just my own biases towards types talking here, but maybe we can define an explicitFlow
type (in lieu of(str, Enum)
)
you're right, i was lazy lol. your way also makes it clearer people can write their own flows
great catch! i should at least make this explicit in the doc comments so i don't ruin anyone's day if they miss it. thanks!
this question made me finally start reading about python typing lol. thinking out loud: enum values are instances of the enum class. apparently python has generics now (???) so i can maybe do something like:
i can play around with it. idk if there's any weirdness using it from an otherwise untyped file |
so static typing is awkward when i insert new functions at runtime :P i'll keep playing with it but i might file an issue to revisit it later if i can't figure it out |
Codecov Report
@@ Coverage Diff @@
## main #100 +/- ##
==========================================
+ Coverage 93.25% 93.30% +0.04%
==========================================
Files 346 347 +1
Lines 26889 27091 +202
==========================================
+ Hits 25075 25276 +201
- Misses 1814 1815 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #100 +/- ##
==========================================
+ Coverage 93.25% 93.30% +0.04%
==========================================
Files 346 347 +1
Lines 26889 27091 +202
==========================================
+ Hits 25075 25276 +201
- Misses 1814 1815 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
this PR adds some features to
CheckpointLogger
:@subflows()
decorator@success_events()
and@failure_events()
decorators@reliability_counters
decorator{flow}.events.{checkpoint}
){flow}.total.begun
when the first checkpoint is logged{flow}.total.failed
when any@failure_events()
checkpoint is logged{flow}.total.succeeded
when any@success_events()
checkpoint is logged{flow}.total.ended
when any terminal checkpoint (success of failure) is loggedsome reliability metrics this enables:
{flow}.total.ended / {flow}.total.begun
: should be 1, but if it isn't that suggests there are exit conditions that you haven't instrumented{flow}.total.succeeded / {flow}.total.begun
: success rate. denominator maybe could beended
instead{flow}.total.failed / {flow}.total.begun
: failure rate. denominator maybe could beended
instead{flow}.events.SPECIFIC_ERROR / {flow}.total.failed
: % of failures caused bySPECIFIC_ERROR
this PR also instruments more of the exit conditions of the upload flow. there are more non-error exit conditions (no pending jobs, not sending notifs, stale PR head) as well as error exit conditions (no valid bot, git service errors, too many retries...)
sentry is still the backend for subflow durations and statsd is still the backend for the reliability counters, but i'm looking to change both in future iterations.
before
the checkpoints are defined in one place, but the relationships between them (the subflows) are peppered throughout the code.
now
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.