Skip to content

athena audit logs - query rate limiter#24918

Merged
tobiaszheller merged 1 commit intomasterfrom
tobiaszheller/auditevents-athena-ratelimit-query
Apr 28, 2023
Merged

athena audit logs - query rate limiter#24918
tobiaszheller merged 1 commit intomasterfrom
tobiaszheller/auditevents-athena-ratelimit-query

Conversation

@tobiaszheller
Copy link
Copy Markdown
Contributor

Part of https://github.com/gravitational/teleport.e/issues/894
RFD: https://github.com/gravitational/teleport/blob/master/rfd/0118-scalable-audit-logs.md

This PR adds rate limit feature to athena audit logs on searching events.
Rate limit is needed to allow throttling of tenants in Cloud.

@github-actions github-actions Bot requested review from lxea and smallinsky April 20, 2023 18:07
@github-actions github-actions Bot added audit-log Issues related to Teleports Audit Log size/md labels Apr 20, 2023
Comment thread lib/events/athena/querier.go Outdated
Comment thread lib/events/athena/querier.go Outdated
Comment thread lib/events/athena/querier.go Outdated
Comment thread lib/events/athena/querier.go Outdated
@smallinsky smallinsky self-requested a review April 21, 2023 08:37
Comment thread lib/events/athena/querier.go Outdated
Comment thread lib/events/athena/querier_test.go Outdated
@tobiaszheller
Copy link
Copy Markdown
Contributor Author

@smallinsky @rosstimothy in dfc304c I have reworked it to use wrap logger inside limiter. It turned out quite a lot of additional code. Let me know what do you think?

Comment thread lib/events/search_limiter.go
Comment thread lib/events/search_limiter.go Outdated
Comment thread lib/events/search_limiter.go Outdated
if cfg.LimiterBurst <= 0 {
return trace.BadParameter("limiterBurst cannot be less or equal to 0")
}
if math.Abs(cfg.LimiterRate) < 1e-9 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 1e-9?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we can't just change the cfg.LimiterRate type to uint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for missing docs for that config. LimiterRate defines how many tokens and added per seconds.
I am checking it with 1e-9 just to have very small delta when comparing float against 0. On one hand it should be equal to 0 because if it's not present in url params, it will be default, so 0. @rosstimothy do you think I should just better document it?

@smallinsky i don't think you can define rps using uint, someone may want to set it 0.5 for example. And defining it in miliseconds will be even more confusing.

One thing that we can do is use helper fn from rate package Every. You define interval (time.Duration) at which token is filled. Do you think it will be easier to understand?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5 RPS or 10.1 RPS values as rate limit should be quite rate but I don't have strong about it so please free to ignore my comment.

I don't thing that we can do is use helper fn from rate package Every. You define interval (time.Duration) at which token is filled. Do you think it will be easier to understand?

I think that current behaviour with 1s time unit is fine and readable. It should be quite easy to extend in the future but if we don't have case for Athena we should skip it for now and keep it simple.

Comment thread lib/events/search_limiter.go
Comment thread lib/events/search_limiter.go Outdated
Comment thread lib/events/search_limiter_test.go Outdated
Comment thread lib/events/search_limiter_test.go
Comment thread lib/events/search_limiter.go Outdated
if cfg.LimiterBurst <= 0 {
return trace.BadParameter("limiterBurst cannot be less or equal to 0")
}
if math.Abs(cfg.LimiterRate) < 1e-9 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we can't just change the cfg.LimiterRate type to uint.

Comment thread lib/events/search_limiter.go Outdated
}

type SearchEventsLimiterConfig struct {
LimiterRate float64
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We we need float64 type ? Making ing uint will allow to simplify the CheckAndSetDefaults logic

@tobiaszheller
Copy link
Copy Markdown
Contributor Author

@rosstimothy 4fa09ec
addresses all comments from PR. I think only #24918 (comment) remains open and would like to hear your opinions.

@tobiaszheller
Copy link
Copy Markdown
Contributor Author

@smallinsky @rosstimothy in 26b6d79
I have reworked config, it accepts now 3 params:

	// RefillTime is amount of time between refills.
	RefillTime time.Duration
	// RefillAmount is the number of tokens that are added to the bucket during a refill.
	RefillAmount int
	// Burst defines number of available tokens. It's initially full and refilled
	// based on refillAmount and refillDuration.
	Burst int

similary to how our per IP limiter does: https://github.com/gravitational/oxy/blob/c59990dc8c641ec79a7f6eee48a20c7e8f4c69c6/ratelimit/tokenlimiter.go#L31, I just used more descriptive names.

Let me know what do you think?

Comment thread lib/events/search_limiter.go Outdated
Comment thread lib/events/search_limiter.go Outdated
Comment thread lib/events/athena/athena.go Outdated
Copy link
Copy Markdown
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but seems like there is now a broken test case:

=== RUN   TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter
{"caller":"athena/consumer.go:506","component":"athena","error":"operation error SQS: ReceiveMessage, failed to resolve service endpoint, an AWS region is required, but was not found","level":"error","message":"Failure processing SQS messages","timestamp":"2023-04-27T10:16:33Z"}
    service_test.go:322: 
        	Error Trace:	/__w/teleport/teleport/lib/service/service_test.go:322
        	            				/__w/teleport/teleport/lib/service/service_test.go:338
        	Error:      	Received unexpected error:
        	            	LimiterRefillAmount must be greater than 0 if LimiterBurst is used
        	Test:       	TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter
{"caller":"athena/consumer.go:506","component":"athena","error":"operation error SQS: ReceiveMessage, failed to resolve service endpoint, an AWS region is required, but was not found","level":"error","message":"Failure processing SQS messages","timestamp":"2023-04-27T10:16:33Z"}
--- FAIL: TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter (0.00s)

@tobiaszheller
Copy link
Copy Markdown
Contributor Author

LGTM but seems like there is now a broken test case:

=== RUN   TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter
{"caller":"athena/consumer.go:506","component":"athena","error":"operation error SQS: ReceiveMessage, failed to resolve service endpoint, an AWS region is required, but was not found","level":"error","message":"Failure processing SQS messages","timestamp":"2023-04-27T10:16:33Z"}
    service_test.go:322: 
        	Error Trace:	/__w/teleport/teleport/lib/service/service_test.go:322
        	            				/__w/teleport/teleport/lib/service/service_test.go:338
        	Error:      	Received unexpected error:
        	            	LimiterRefillAmount must be greater than 0 if LimiterBurst is used
        	Test:       	TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter
{"caller":"athena/consumer.go:506","component":"athena","error":"operation error SQS: ReceiveMessage, failed to resolve service endpoint, an AWS region is required, but was not found","level":"error","message":"Failure processing SQS messages","timestamp":"2023-04-27T10:16:33Z"}
--- FAIL: TestAthenaAuditLogSetup/config_with_rate_limit_-_should_use_events.SearchEventsLimiter (0.00s)

fixed in d7e491e

@public-teleport-github-review-bot public-teleport-github-review-bot Bot removed the request for review from lxea April 28, 2023 13:23
@tobiaszheller tobiaszheller force-pushed the tobiaszheller/auditevents-athena-ratelimit-query branch from d7e491e to d6c775c Compare April 28, 2023 13:30
@tobiaszheller tobiaszheller enabled auto-merge April 28, 2023 13:30
@tobiaszheller tobiaszheller added this pull request to the merge queue Apr 28, 2023
Merged via the queue into master with commit 29497f5 Apr 28, 2023
@tobiaszheller tobiaszheller deleted the tobiaszheller/auditevents-athena-ratelimit-query branch April 28, 2023 14:02
@public-teleport-github-review-bot
Copy link
Copy Markdown

@tobiaszheller See the table below for backport results.

Branch Result
branch/v13 Failed

@public-teleport-github-review-bot
Copy link
Copy Markdown

@tobiaszheller See the table below for backport results.

Branch Result
branch/v13 Create PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

audit-log Issues related to Teleports Audit Log size/md

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants