Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
69f7c35
Fix bug in opsgenie plugin mixing up notify-services and schedules
EdwardDowling Feb 8, 2024
c7ae046
Update docs for opsgenie plugin to use correct schedules
EdwardDowling Feb 8, 2024
cb75dd9
Remove accidental change to opsgenie client
EdwardDowling Feb 9, 2024
38ad63b
Update opsgenie client tests
EdwardDowling Feb 12, 2024
d0798ef
Update opsgneie plugin behaviour for determining recipients
EdwardDowling Feb 20, 2024
c37e82b
Update docs with correct screenshot of role creation
EdwardDowling Mar 11, 2024
0daf76c
Stop Opsgenie plugin sending alerts when /schedules label is set
EdwardDowling Mar 13, 2024
759d35e
Change rawRecipients initialisation in opsgenie bot
EdwardDowling Mar 13, 2024
92187ff
Remove opsgenie add role example image from docs
EdwardDowling Mar 13, 2024
bf21892
Update integrations/access/accessrequest/app.go
EdwardDowling Mar 19, 2024
a7df8ff
Update docs/pages/access-controls/access-request-plugins/opsgenie.mdx
EdwardDowling Mar 19, 2024
7b87ee9
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 19, 2024
0462903
Make variable Opsgenie plugin var names more consistent
EdwardDowling Mar 19, 2024
1d41ac3
Update changelog with opsgenie breaking changes
EdwardDowling Mar 19, 2024
6bb8aca
Rename ReqAnnotationSchedulesLabel for consistency
EdwardDowling Mar 20, 2024
a029ebd
Fix opsgenie getOnCall
EdwardDowling Mar 20, 2024
6dc6241
Add opsgenie approval tests + add approval user configuration
hugoShaka Mar 20, 2024
33e83a1
Appease linter
EdwardDowling Mar 25, 2024
443703f
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 25, 2024
57a5a7a
Add back in opsgenie client fix
EdwardDowling Mar 25, 2024
170d55c
Appease linter
EdwardDowling Mar 25, 2024
fe4d8e8
Add in clarifying comment for alert request retry behaviour
EdwardDowling Mar 26, 2024
2598ccc
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 27, 2024
293f65b
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 27, 2024
27de3fc
Add missing opsgenie endpoint mocks
EdwardDowling Mar 27, 2024
0f3ff90
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 27, 2024
1a8c5b4
Merge branch 'master' into edwarddowling/opsgenie-plugin-fix
EdwardDowling Mar 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,17 @@
# Changelog

## 16.0.0 (xx/xx/xx)

### Breaking changes

#### Opsgenie plugin annotations

Opsgenie plugin users, role annotations must now contain
`teleport.dev/notify-services` to receive notification on Opsgenie.
`teleport.dev/schedules` is now the label used to determine auto approval flow.
See [the Opsgenie plugin documentation](docs/pages/access-controls/access-request-plugins/opsgenie.mdx)
for setup instructions.

## 15.0.0 (xx/xx/24)

### New features
Expand Down
8 changes: 4 additions & 4 deletions api/types/constants.go
Original file line number Diff line number Diff line change
Expand Up @@ -727,10 +727,10 @@ const (
// DiscoveryAppIgnore specifies if a Kubernetes service should be ignored by discovery service.
DiscoveryAppIgnore = TeleportNamespace + "/ignore"

// ReqAnnotationSchedulesLabel is the request annotation key at which schedules are stored for access plugins.
ReqAnnotationSchedulesLabel = "/schedules"
// ReqAnnotationNotifyServicesLabel is the request annotation key at which notify services are stored for access plugins.
ReqAnnotationNotifyServicesLabel = "/notify-services"
// ReqAnnotationApproveSchedulesLabel is the request annotation key at which schedules are stored for access plugins.
ReqAnnotationApproveSchedulesLabel = "/schedules"
// ReqAnnotationNotifySchedulesLabel is the request annotation key at which notify schedules are stored for access plugins.
ReqAnnotationNotifySchedulesLabel = "/notify-services"

// CloudAWS identifies that a resource was discovered in AWS.
CloudAWS = "AWS"
Expand Down
Binary file not shown.
11 changes: 6 additions & 5 deletions docs/pages/access-controls/access-request-plugins/opsgenie.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,6 @@ To create a user first navigate to Management -> Access -> Roles

Then select 'Create New Role' and create the requester role.

![Add user one](../../../img/enterprise/plugins/opsgenie/add-requester-role.png)

```
kind: role
version: v5
Expand All @@ -75,10 +73,13 @@ spec:
- approve: 1
deny: 1
annotations:
teleport.dev/schedules: ['teleport-access-request-notifications']
teleport.dev/notify-services: ['teleport-access-request-notifications']
teleport.dev/schedules: ['teleport-access-alert-schedules']
```

The `teleport.dev/schedules` annotation specifies the schedule the alert will be be created for.
The `teleport.dev/notify-services` annotation specifies the schedules the alert will be created for.
The `teleport.dev/schedules` annotation specifies the schedules the alert will check, and auto approve the
Access Request if the requesting user is on-call.
Comment thread
EdwardDowling marked this conversation as resolved.

### Create a user who will request access

Expand Down Expand Up @@ -121,7 +122,7 @@ As the Teleport user `myuser`, create an Access Request for the `editor` role:

In Opsgenie, you will see a new alert containing information about the
Access Request in either the default schedule specified when enrolling the plugin,
or in the schedules specified by `teleport.dev/schedules` annotation in the requester's role.
or in the schedules specified by `teleport.dev/notify-services` annotation in the requester's role.

### Resolve the request

Expand Down
19 changes: 11 additions & 8 deletions integrations/access/accessrequest/app.go
Original file line number Diff line number Diff line change
Expand Up @@ -352,16 +352,19 @@ func (a *App) getMessageRecipients(ctx context.Context, req types.AccessRequest)
recipientSet.Add(common.Recipient{})
return recipientSet.ToSlice()
case types.PluginTypeOpsgenie:
if recipients, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]; ok {
for _, recipient := range recipients {
rec, err := a.bot.FetchRecipient(ctx, recipient)
if err != nil {
log.Warning(err)
}
recipientSet.Add(*rec)
}
recipients, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]
if !ok {
return recipientSet.ToSlice()
}
for _, recipient := range recipients {
rec, err := a.bot.FetchRecipient(ctx, recipient)
if err != nil {
log.Warningf("Failed to fetch Opsgenie recipient: %v", err)
continue
}
recipientSet.Add(*rec)
Comment thread
EdwardDowling marked this conversation as resolved.
}
return recipientSet.ToSlice()
}

validEmailSuggReviewers := []string{}
Expand Down
38 changes: 24 additions & 14 deletions integrations/access/opsgenie/app.go
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,6 @@ import (
tp "github.com/gravitational/teleport"
"github.com/gravitational/teleport/api/client/proto"
"github.com/gravitational/teleport/api/types"
"github.com/gravitational/teleport/integrations/access/common"
"github.com/gravitational/teleport/integrations/access/common/teleport"
"github.com/gravitational/teleport/integrations/lib"
"github.com/gravitational/teleport/integrations/lib/backoff"
Expand All @@ -45,9 +44,9 @@ const (
// minServerVersion is the minimal teleport version the plugin supports.
minServerVersion = "6.1.0"
// initTimeout is used to bound execution time of health check and teleport version check.
initTimeout = time.Second * 10
initTimeout = time.Second * 30
// handlerTimeout is used to bound the execution time of watcher event handler.
handlerTimeout = time.Second * 5
handlerTimeout = time.Second * 30
// modifyPluginDataBackoffBase is an initial (minimum) backoff value.
modifyPluginDataBackoffBase = time.Millisecond
// modifyPluginDataBackoffMax is a backoff threshold
Expand Down Expand Up @@ -141,10 +140,9 @@ func (a *App) init(ctx context.Context) error {
defer cancel()

var err error
if a.teleport == nil {
if a.teleport, err = common.GetTeleportClient(ctx, a.conf.Teleport); err != nil {
return trace.Wrap(err)
}
a.teleport, err = a.conf.GetTeleportClient(ctx)
if err != nil {
return trace.Wrap(err, "getting teleport client")
}

if _, err = a.checkTeleportVersion(ctx); err != nil {
Expand All @@ -155,6 +153,13 @@ func (a *App) init(ctx context.Context) error {
if err != nil {
return trace.Wrap(err)
}

log := logger.Get(ctx)
log.Debug("Starting API health check...")
if err = a.opsgenie.CheckHealth(ctx); err != nil {
return trace.Wrap(err, "API health check failed")
}
log.Debug("API health check finished ok")
return nil
}

Expand Down Expand Up @@ -269,15 +274,15 @@ func (a *App) onDeletedRequest(ctx context.Context, reqID string) error {
}

func (a *App) getNotifyServiceNames(req types.AccessRequest) ([]string, error) {
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifyServicesLabel]
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]
if !ok {
return nil, trace.NotFound("notify services not specified")
}
return services, nil
}

func (a *App) getOnCallServiceNames(req types.AccessRequest) ([]string, error) {
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]
services, ok := req.GetSystemAnnotations()[types.TeleportNamespace+types.ReqAnnotationApproveSchedulesLabel]
if !ok {
return nil, trace.NotFound("on-call schedules not specified")
}
Expand All @@ -294,11 +299,16 @@ func (a *App) tryNotifyService(ctx context.Context, req types.AccessRequest) (bo
}

reqID := req.GetName()
annotations := types.Labels{}
for k, v := range req.GetSystemAnnotations() {
annotations[k] = v
}
reqData := RequestData{
User: req.GetUser(),
Roles: req.GetRoles(),
Created: req.GetCreationTime(),
RequestReason: req.GetRequestReason(),
User: req.GetUser(),
Roles: req.GetRoles(),
Created: req.GetCreationTime(),
RequestReason: req.GetRequestReason(),
SystemAnnotations: annotations,
}

// Create plugin data if it didn't exist before.
Expand Down Expand Up @@ -429,7 +439,7 @@ func (a *App) tryApproveRequest(ctx context.Context, req types.AccessRequest) er
if _, err := a.teleport.SubmitAccessReview(ctx, types.AccessReviewSubmission{
RequestID: req.GetName(),
Review: types.AccessReview{
Author: tp.SystemAccessApproverUserName,
Author: a.conf.TeleportUserName,
Comment thread
r0mant marked this conversation as resolved.
ProposedState: types.RequestState_APPROVED,
Reason: fmt.Sprintf("Access requested by user %s who is on call on service(s) %s",
tp.SystemAccessApproverUserName,
Expand Down
19 changes: 12 additions & 7 deletions integrations/access/opsgenie/bot.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,13 +60,17 @@ func (b Bot) SendReviewReminders(ctx context.Context, recipients []common.Recipi
}

// BroadcastAccessRequestMessage creates an alert for the provided recipients (schedules)
func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipients []common.Recipient, reqID string, reqData pd.AccessRequestData) (data accessrequest.SentMessages, err error) {
schedules := []string{}
for _, recipient := range recipients {
schedules = append(schedules, recipient.Name)
func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipientSchedules []common.Recipient, reqID string, reqData pd.AccessRequestData) (data accessrequest.SentMessages, err error) {
notificationSchedules := make([]string, 0, len(recipientSchedules))
for _, notifySchedule := range recipientSchedules {
notificationSchedules = append(notificationSchedules, notifySchedule.Name)
}
if len(recipients) == 0 {
schedules = append(schedules, b.client.DefaultSchedules...)
autoApprovalSchedules := []string{}
if annotationAutoApprovalSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationApproveSchedulesLabel]; ok {
autoApprovalSchedules = annotationAutoApprovalSchedules
}
if len(autoApprovalSchedules) == 0 {
autoApprovalSchedules = append(autoApprovalSchedules, b.client.DefaultSchedules...)
}
opsgenieReqData := RequestData{
User: reqData.User,
Expand All @@ -79,7 +83,8 @@ func (b *Bot) BroadcastAccessRequestMessage(ctx context.Context, recipients []co
Reason: reqData.ResolutionReason,
},
SystemAnnotations: types.Labels{
types.TeleportNamespace + types.ReqAnnotationSchedulesLabel: schedules,
types.TeleportNamespace + types.ReqAnnotationApproveSchedulesLabel: autoApprovalSchedules,
types.TeleportNamespace + types.ReqAnnotationNotifySchedulesLabel: notificationSchedules,
},
}
opsgenieData, err := b.client.CreateAlert(ctx, reqID, opsgenieReqData)
Expand Down
66 changes: 57 additions & 9 deletions integrations/access/opsgenie/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,17 +30,24 @@ import (
"github.com/aws/aws-sdk-go/aws/defaults"
"github.com/go-resty/resty/v2"
"github.com/gravitational/trace"
"github.com/jonboulle/clockwork"

"github.com/gravitational/teleport/api/types"
"github.com/gravitational/teleport/integrations/access/common"
"github.com/gravitational/teleport/integrations/lib"
"github.com/gravitational/teleport/integrations/lib/backoff"
"github.com/gravitational/teleport/integrations/lib/logger"
)

const (
// alertKeyPrefix is the prefix for Alert's alias field used when creating an Alert.
alertKeyPrefix = "teleport-access-request"
heartbeatName = "teleport-access-heartbeat"
alertKeyPrefix = "teleport-access-request"
heartbeatName = "teleport-access-heartbeat"
ResponderTypeSchedule = "schedule"
ResponderTypeUser = "user"

ResolveAlertRequestRetryInterval = time.Second * 10
ResolveAlertRequestRetryTimeout = time.Minute * 2
)

var alertBodyTemplate = template.Must(template.New("alert body").Parse(
Expand Down Expand Up @@ -135,11 +142,11 @@ func (og Client) CreateAlert(ctx context.Context, reqID string, reqData RequestD
Message: fmt.Sprintf("Access request from %s", reqData.User),
Alias: fmt.Sprintf("%s/%s", alertKeyPrefix, reqID),
Description: bodyDetails,
Responders: og.getResponders(reqData),
Responders: og.getScheduleResponders(reqData),
Priority: og.Priority,
}

var result AlertResult
var result CreateAlertResult
resp, err := og.client.NewRequest().
SetContext(ctx).
SetBody(body).
Expand All @@ -153,20 +160,60 @@ func (og Client) CreateAlert(ctx context.Context, reqID string, reqData RequestD
if resp.IsError() {
return OpsgenieData{}, errWrapper(resp.StatusCode(), string(resp.Body()))
}

// If this fails, Teleport request approval and auto-approval will still work,
// but incident in Opsgenie won't be auto-closed or updated as the alertID won't be available.
alertRequestResult, err := og.tryGetAlertRequestResult(ctx, result.RequestID)
Copy link
Copy Markdown
Collaborator

@r0mant r0mant Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens if this fails for some reason?

I wonder if it would be more robust to move this logic to request resolution time. So we would store RequestID here, and then the plugin would attempt to query AlertID when it needs to resolve the incident (whether it's by a reviewer or by auto-approval). WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have been trying out implementing this this morning and I think it should work but it gets a bit messy now having to double check everywhere we interact with an alert whether or not we have the alert or the request id and resolving it if appropriate. I think we might end up having to change some of the shared plugin code as well if we go this route to allow the accessrequest/MessageData stuff to represent a 'message' that we dont have the ID of yet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the event this fails it could go one of two ways depending on if the request ever does end up completing successfully after the retries fail.

  • 1: The alert is never created and that is why the retries failed - in this case the plugin just never interacts with that access request again.
  • 2: The alert is eventually created after the retries all fail - in this case the alert is created but the plugin wont be able to send updates or resolve it

Copy link
Copy Markdown
Collaborator

@r0mant r0mant Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auto-approval will still work in both of these cases, right? As long as we're able to determine if requester is on-call. Cause we never interact with alert/request ID to actually approve Teleport's request - just the alert will not be automatically closed in Opsgenie, correct?

Copy link
Copy Markdown
Contributor Author

@EdwardDowling EdwardDowling Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the notify error in onPendingRequest in opsgenie/app.go and the error from trying to resolve the request are put into an aggregate error instead of returning early.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case, let's do this:

  • Add a comment here explaining that if this fails, Teleport request approval and auto-approval will still work but incident in Opsgenie won't be auto-closed.
  • Increase the timeout to something more than 30s, maybe a couple minutes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the comment and upped the timeout, does this look okay now?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I already approved so let's get this merged and do the backports.

if err != nil {
return OpsgenieData{}, trace.Wrap(err)
}

return OpsgenieData{
AlertID: result.Alert.ID,
AlertID: alertRequestResult.Data.AlertID,
}, nil
}

func (og Client) getResponders(reqData RequestData) []Responder {
func (og Client) tryGetAlertRequestResult(ctx context.Context, reqID string) (GetAlertRequestResult, error) {
backoff := backoff.NewDecorr(ResolveAlertRequestRetryInterval, ResolveAlertRequestRetryTimeout, clockwork.NewRealClock())
for {
alertRequestResult, err := og.getAlertRequestResult(ctx, reqID)
if err == nil {
logger.Get(ctx).Debugf("Got alert request result: %+v", alertRequestResult)
return alertRequestResult, nil
}
logger.Get(ctx).Debug("Failed to get alert request result:", err)
if err := backoff.Do(ctx); err != nil {
return GetAlertRequestResult{}, trace.Wrap(err)
}
}
}

func (og Client) getAlertRequestResult(ctx context.Context, reqID string) (GetAlertRequestResult, error) {
var result GetAlertRequestResult
resp, err := og.client.NewRequest().
SetContext(ctx).
SetResult(&result).
SetPathParams(map[string]string{"requestID": reqID}).
Get("v2/alerts/requests/{requestID}")
if err != nil {
return GetAlertRequestResult{}, trace.Wrap(err)
}
defer resp.RawResponse.Body.Close()
if resp.IsError() {
return GetAlertRequestResult{}, errWrapper(resp.StatusCode(), string(resp.Body()))
}
return result, nil
}

func (og Client) getScheduleResponders(reqData RequestData) []Responder {
schedules := og.DefaultSchedules
if reqSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationSchedulesLabel]; ok {
if reqSchedules, ok := reqData.SystemAnnotations[types.TeleportNamespace+types.ReqAnnotationNotifySchedulesLabel]; ok {
schedules = reqSchedules
}
responders := make([]Responder, 0, len(schedules))
for _, s := range schedules {
responders = append(responders, Responder{
Type: "schedule",
Type: ResponderTypeSchedule,
ID: s,
})
}
Expand Down Expand Up @@ -231,12 +278,13 @@ func (og Client) GetOnCall(ctx context.Context, scheduleName string) (Responders
SetContext(ctx).
SetPathParams(map[string]string{"scheduleName": scheduleName}).
SetQueryParams(map[string]string{
// This is required to lookup schedules by name (as opposed to lookup by ID)
"scheduleIdentifierType": "name",
// When flat is enabled it returns the email addresses of on-call participants.
"flat": "true",
}).
SetResult(&result).
Post("v2/schedules/{scheduleName}/on-calls")
Get("v2/schedules/{scheduleName}/on-calls")
if err != nil {
return RespondersResult{}, trace.Wrap(err)
}
Expand Down
5 changes: 4 additions & 1 deletion integrations/access/opsgenie/client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ import (
func TestCreateAlert(t *testing.T) {
recievedReq := ""
testServer := httptest.NewServer(http.HandlerFunc(func(res http.ResponseWriter, req *http.Request) {
if req.URL.Path != "/v2/alerts" {
return
}
bodyBytes, err := io.ReadAll(req.Body)
if err != nil {
log.Fatal(err)
Expand All @@ -56,7 +59,7 @@ func TestCreateAlert(t *testing.T) {
Roles: []string{"role1", "role2"},
RequestReason: "someReason",
SystemAnnotations: types.Labels{
types.TeleportNamespace + types.ReqAnnotationSchedulesLabel: {"responder@teleport.com"},
types.TeleportNamespace + types.ReqAnnotationNotifySchedulesLabel: {"responder@teleport.com"},
},
})
assert.NoError(t, err)
Expand Down
Loading