Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CapMan visibility] Increase the default throttle/warning threshold for allocation policies #6189

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

xurui-c
Copy link
Member

@xurui-c xurui-c commented Aug 8, 2024

Allocation policies are our mechanism for doing traffic management for Snuba queries. Currently, we see a lot of warnings (Warning: Query from referrer ... is throttled) on Sentry Issues. This is because ~17% of queries to Snuba are throttled by capacity management/allocation policies. This is an overkill since we actually have a lot of Clickhouse capacity.

This PR increases the default throttle/warning threshold to be higher than half of the rejection threshold. I will also increase the thresholds in Snuba Admin.

@xurui-c xurui-c requested a review from a team as a code owner August 8, 2024 21:08
DEFAULT_BYTES_THROTTLE_DIVIDER,
),
AllocationPolicyConfig(
"threads_throttle_divider",
"max threads divided by this number is the number of threads we use to execute queries for a throttled (project_id|organization_id, referrer)",
int,
DEFAULT_BYTES_THROTTLE_DIVIDER,
DEFAULT_THREADS_THROTTLE_DIVIDER,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixes an existing bug

Copy link
Member

@MeredithAnya MeredithAnya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's update the test as well

policy.set_config_value("bytes_throttle_divider", 2)

Copy link

codecov bot commented Aug 8, 2024

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

❌ Failed Test Results:

Completed 2518 tests with 1 failed, 2513 passed and 4 skipped.

View the full list of failed tests

pytest

  • Class name: tests.web.test_db_query
    Test name: test_db_query_success

    Traceback (most recent call last):
    File ".../tests/web/test_db_query.py", line 302, in test_db_query_success
    assert stats["quota_allowance"] == {
    AssertionError: assert {'details': {'BytesScannedRejectingPolicy': {'can_run': True,\n 'explanation': {'reason': 'within_limit '\n 'but '\n 'throttled',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': True,\n 'max_threads': 5,\n 'quota_unit': 'bytes',\n 'quota_used': 1560000000000,\n 'rejection_threshold': 2560000000000,\n 'suggestion': 'The feature, '\n 'organization/project '\n 'is scanning too '\n 'many bytes, this '\n 'usually means they '\n 'are abusing that '\n 'API',\n 'throttle_threshold': 1706666666666},\n 'BytesScannedWindowAllocationPolicy': {'can_run': True,\n 'explanation': {'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'bytes',\n 'quota_used': 0,\n 'rejection_threshold': 1000000000000,\n 'suggestion': 'The '\n 'feature, '\n 'organization/project '\n 'is scanning '\n 'too many '\n 'bytes, this '\n 'usually '\n 'means they '\n 'are abusing '\n 'that API',\n 'throttle_threshold': 10000000},\n 'ConcurrentRateLimitAllocationPolicy': {'can_run': True,\n 'explanation': {'overrides': {},\n 'reason': 'within '\n 'limit',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'concurrent_queries',\n 'quota_used': 1,\n 'rejection_threshold': 22,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 22},\n 'CrossOrgQueryAllocationPolicy': {'can_run': True,\n 'explanation': {'reason': 'pass_through',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'no_units',\n 'quota_used': 0,\n 'rejection_threshold': 1000000000000,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 1000000000000},\n 'ReferrerGuardRailPolicy': {'can_run': True,\n 'explanation': {'policy': 'referrer_guard_rail_policy',\n 'reason': 'within '\n 'limit',\n 'referrer': 'something',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'concurrent_queries',\n 'quota_used': 1,\n 'rejection_threshold': 100,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 66}},\n 'summary': {'rejected_by': {},\n 'threads_used': 5,\n 'throttled_by': {'policy': 'BytesScannedRejectingPolicy',\n 'quota_unit': 'bytes',\n 'quota_used': 1560000000000,\n 'suggestion': 'The feature, organization/project '\n 'is scanning too many bytes, this '\n 'usually means they are abusing '\n 'that API',\n 'throttle_threshold': 1706666666666}}} == {'details': {'BytesScannedRejectingPolicy': {'can_run': True,\n 'explanation': {'reason': 'within_limit '\n 'but '\n 'throttled',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': True,\n 'max_threads': 5,\n 'quota_unit': 'bytes',\n 'quota_used': 1560000000000,\n 'rejection_threshold': 2560000000000,\n 'suggestion': 'The feature, '\n 'organization/project '\n 'is scanning too '\n 'many bytes, this '\n 'usually means they '\n 'are abusing that '\n 'API',\n 'throttle_threshold': 1706666666666},\n 'BytesScannedWindowAllocationPolicy': {'can_run': True,\n 'explanation': {'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'bytes',\n 'quota_used': 0,\n 'rejection_threshold': 1000000000000,\n 'suggestion': 'The '\n 'feature, '\n 'organization/project '\n 'is scanning '\n 'too many '\n 'bytes, this '\n 'usually '\n 'means they '\n 'are abusing '\n 'that API',\n 'throttle_threshold': 10000000},\n 'ConcurrentRateLimitAllocationPolicy': {'can_run': True,\n 'explanation': {'overrides': {},\n 'reason': 'within '\n 'limit',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'concurrent_queries',\n 'quota_used': 1,\n 'rejection_threshold': 22,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 22},\n 'CrossOrgQueryAllocationPolicy': {'can_run': True,\n 'explanation': {'reason': 'pass_through',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'no_units',\n 'quota_used': 0,\n 'rejection_threshold': 1000000000000,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 1000000000000},\n 'ReferrerGuardRailPolicy': {'can_run': True,\n 'explanation': {'policy': 'referrer_guard_rail_policy',\n 'reason': 'within '\n 'limit',\n 'referrer': 'something',\n 'storage_key': 'StorageKey.ERRORS_RO'},\n 'is_throttled': False,\n 'max_threads': 10,\n 'quota_unit': 'concurrent_queries',\n 'quota_used': 1,\n 'rejection_threshold': 100,\n 'suggestion': 'no_suggestion',\n 'throttle_threshold': 50}},\n 'summary': {'rejected_by': {},\n 'threads_used': 5,\n 'throttled_by': {'policy': 'BytesScannedRejectingPolicy',\n 'quota_unit': 'bytes',\n 'quota_used': 1560000000000,\n 'suggestion': 'The feature, organization/project '\n 'is scanning too many bytes, this '\n 'usually means they are abusing '\n 'that API',\n 'throttle_threshold': 1706666666666}}}
    Common items:
    {'summary': {'rejected_by': {},
    'threads_used': 5,
    'throttled_by': {'policy': 'BytesScannedRejectingPolicy',
    'quota_unit': 'bytes',
    'quota_used': 1560000000000,
    'suggestion': 'The feature, organization/project '
    'is scanning too many bytes, this '
    'usually means they are abusing '
    'that API',
    'throttle_threshold': 1706666666666}}}
    Differing items:
    {'details': {'BytesScannedRejectingPolicy': {'can_run': True, 'explanation': {'reason': 'within_limit but throttled', ...'reason': 'pass_through', 'storage_key': 'StorageKey.ERRORS_RO'}, 'is_throttled': False, 'max_threads': 10, ...}, ...}} != {'details': {'BytesScannedRejectingPolicy': {'can_run': True, 'explanation': {'reason': 'within_limit but throttled', ...'reason': 'pass_through', 'storage_key': 'StorageKey.ERRORS_RO'}, 'is_throttled': False, 'max_threads': 10, ...}, ...}}
    Full diff:
    {
    'details': {'BytesScannedRejectingPolicy': {'can_run': True,
    'explanation': {'reason': 'within_limit '
    'but '
    'throttled',
    'storage_key': 'StorageKey.ERRORS_RO'},
    'is_throttled': True,
    'max_threads': 5,
    'quota_unit': 'bytes',
    'quota_used': 1560000000000,
    'rejection_threshold': 2560000000000,
    'suggestion': 'The feature, '
    'organization/project '
    'is scanning too '
    'many bytes, this '
    'usually means they '
    'are abusing that '
    'API',
    'throttle_threshold': 1706666666666},
    'BytesScannedWindowAllocationPolicy': {'can_run': True,
    'explanation': {'storage_key': 'StorageKey.ERRORS_RO'},
    'is_throttled': False,
    'max_threads': 10,
    'quota_unit': 'bytes',
    'quota_used': 0,
    'rejection_threshold': 1000000000000,
    'suggestion': 'The '
    'feature, '
    'organization/project '
    'is scanning '
    'too many '
    'bytes, this '
    'usually '
    'means they '
    'are abusing '
    'that API',
    'throttle_threshold': 10000000},
    'ConcurrentRateLimitAllocationPolicy': {'can_run': True,
    'explanation': {'overrides': {},
    'reason': 'within '
    'limit',
    'storage_key': 'StorageKey.ERRORS_RO'},
    'is_throttled': False,
    'max_threads': 10,
    'quota_unit': 'concurrent_queries',
    'quota_used': 1,
    'rejection_threshold': 22,
    'suggestion': 'no_suggestion',
    'throttle_threshold': 22},
    'CrossOrgQueryAllocationPolicy': {'can_run': True,
    'explanation': {'reason': 'pass_through',
    'storage_key': 'StorageKey.ERRORS_RO'},
    'is_throttled': False,
    'max_threads': 10,
    'quota_unit': 'no_units',
    'quota_used': 0,
    'rejection_threshold': 1000000000000,
    'suggestion': 'no_suggestion',
    'throttle_threshold': 1000000000000},
    'ReferrerGuardRailPolicy': {'can_run': True,
    'explanation': {'policy': 'referrer_guard_rail_policy',
    'reason': 'within '
    'limit',
    'referrer': 'something',
    'storage_key': 'StorageKey.ERRORS_RO'},
    'is_throttled': False,
    'max_threads': 10,
    'quota_unit': 'concurrent_queries',
    'quota_used': 1,
    'rejection_threshold': 100,
    'suggestion': 'no_suggestion',
    - 'throttle_threshold': 50}},
    & ^^
    + 'throttle_threshold': 66}},
    & ^^
    'summary': {'rejected_by': {},
    'threads_used': 5,
    'throttled_by': {'policy': 'BytesScannedRejectingPolicy',
    'quota_unit': 'bytes',
    'quota_used': 1560000000000,
    'suggestion': 'The feature, organization/project '
    'is scanning too many bytes, this '
    'usually means they are abusing '
    'that API',
    'throttle_threshold': 1706666666666}},
    }

@xurui-c xurui-c force-pushed the rachel/noWarnings branch 2 times, most recently from 3fae75d to 123a334 Compare August 8, 2024 22:31
@onewland
Copy link
Contributor

onewland commented Aug 9, 2024

IMO the title for this PR (and merge commit message, when it comes time for that) should be changed.

This doesn't "get rid of sentry warnings" for throttled queries, it adjusts the threshold in the hopes of reducing the quantity of throttles and, downstream of that, the quantity of warnings we emit.

The code change itself is fine, though.

@xurui-c xurui-c changed the title Get rid of Sentry warnings for throttled queries Increase the default throttle/warning threshold for allocation policies Aug 9, 2024
@xurui-c xurui-c merged commit 52cd1ed into master Aug 9, 2024
30 checks passed
@xurui-c xurui-c deleted the rachel/noWarnings branch August 9, 2024 16:54
@xurui-c xurui-c changed the title Increase the default throttle/warning threshold for allocation policies [CapMan visibility] Increase the default throttle/warning threshold for allocation policies Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants