Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Alert implementation #198

Merged
merged 19 commits into from
Jan 23, 2025
Merged

feat: Alert implementation #198

merged 19 commits into from
Jan 23, 2025

Conversation

alexluong
Copy link
Collaborator

@alexluong alexluong commented Jan 16, 2025

initial comment partial implementation of #105; this PR only implements an `internal/alert` package which is the logic for the alert feature. I'll open another PR shortly where this package is used within the `deliverymq` flow and full e2e test cases.

Note: I just went over the spec and noticed the ALERT_FAILURE_RATE feature. Will get around to that next too.

Alert System Implementation

Implements a Redis-backed alert system for monitoring destination health and auto-disabling failing destinations.

Core Interfaces

  • AlertMonitor: Main interface for handling delivery attempts

    • HandleAttempt(ctx context.Context, attempt DeliveryAttempt) error
  • AlertStore: Redis persistence layer for failure counts and alert states

    • IncrementAndGetAlertState(ctx context.Context, tenantID, destinationID string) (AlertState, error)
    • ResetAlertState(ctx context.Context, tenantID, destinationID string) error
    • UpdateLastAlert(ctx context.Context, tenantID, destinationID string, t time.Time, level int) error
  • AlertEvaluator: Determines when to trigger alerts based on thresholds

    • GetAlertLevel(failures int64) int
    • ShouldAlert(failures int64, lastAlertTime time.Time, lastAlertLevel int) (level int, shouldAlert bool)
  • AlertNotifier: Sends HTTP alerts to configured callback URL

    • Notify(ctx context.Context, alert Alert) error

Redis Schema

Key Structure:

  • alert:{tenant_id}:{destination_id}:failures - Counter for consecutive failures
  • alert:{tenant_id}:{destination_id}:last_alert - Hash storing:
    • time: Last alert timestamp
    • level: Last alert level (percentage)

Key Features

  • Configurable alert thresholds (e.g. 50%, 70%, 90%, 100%)
  • Alert debouncing with configurable interval
  • Auto-disable destinations at 100% threshold
  • Stateless evaluation with Redis-backed persistence

Copy link

vercel bot commented Jan 16, 2025

@alexluong is attempting to deploy a commit to the Hookdeck team on Vercel, but is not a member of this team. To resolve this issue, you can:

  • Make your repository public. Collaboration is free for open source and public repositories.
  • Add @alexluong as a member. A Pro subscription is required to access Vercel's collaborative features.
    • If you're the owner of the team, click here and add @alexluong as a member.
    • If you're the user who initiated this build request, click here to request access.
    • If you're already a member of the Hookdeck team, make sure that your Vercel account is connected to your GitHub account.

To read more about collaboration on Vercel, click here.

}

// Create request
req, err := http.NewRequestWithContext(ctx, http.MethodPost, n.callbackURL, bytes.NewReader(body))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we say we'd publish to alert to a MQ instead? If we stick to a callback URL we need some auth mecanishm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I recall, we briefly discussed publishing alerts to an MQ but I don't think we made a final call on that. With that said, the way the notifier is implemented we can update the mechanism pretty easily.

I wonder if it makes sense to send the alert to the deliverymq and let the operator register their internal destinations somehow?

Let me know if you'd like us to send the alert to an mq here. I think it's a bit trickier but totally doable too.

Anyway, I did implement the auth callback using API key as the spec, please check it out!

@alexluong alexluong merged commit 6c00a60 into main Jan 23, 2025
1 check failed
@alexluong alexluong deleted the alert branch January 23, 2025 04:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants