Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(global-filters): Introduce global generic filters #3161

Merged
merged 20 commits into from
Mar 1, 2024

Conversation

iker-barriocanal
Copy link
Contributor

@iker-barriocanal iker-barriocanal commented Feb 23, 2024

This PR adds support for global generic filters.

Generic filters are a new schema of inbound filters with generic conditions to avoid creating a new filter for each use case.
Project configs already support project-specific generic filters. Global generic filters have been designed to be extendable, future-compatible, independent, and flexible to work well in isolation but also with project generic filters.

For event processing, Relay uses both the project config of the event and the global config. Before running the filters, Relay will merge the generic filters from the two configs and apply these to the event. If there's a match relay will drop the event and generate an outcome, or forward the event to the next processing step if there isn't.

Why

Currently, adding a new generic filter results in an increase in the size of project configs (config_size * number_of_filters). This can result in a significant increase in memory usage in all the services that store project configs (increase_per_project * number_of_projects), especially Relay. Incorporating generic filters into global configs allows Relay to apply these filters similarly with minimal overhead.

There are additional, less important goals that this approach also achieves: not increasing network usage, not impacting project config computation time, etc.

Previous work

Generic inbound filters were introduced in project configs in ee3036c. They are currently not enabled on sentry as supporting these filters in the global config was prioritized first.

Protocol

Global generic filters have the same protocol as project generic filters, but they exist in the global config. A new optional filters key is added as a top-level key in global configs. This object expects the following properties:

  • version: the version of global configs, for future-compatibility, as a u16.
  • filters: a list of generic filters sorted by descending priority, with the same protocol as with the project generic filters.

Example of a full protocol:

{
	// ... other global config keys ...
	"filters": {
		"version": 1,
		"filters": [
			{
				"id": "sample-errors",
				"isEnabled": true,
				"condition": {
					"op": "eq",
					"name": "event.exceptions",
					"value": "Sample Error"
				}
			}
		]
	}
}

Decisions and trade-offs

Protocol

Supporting external Relays with future compatibility to make updates safely:

  • Introduced version to skip filters on a version not supported by Relay.
  • When filters are unsupported, Relay forwards events upstream without extracting metrics. This allows the next updated Relay in the chain supporting the filters to run filtering and, in case there's a decision to keep the event, extract metrics from it.

Relay merges the configs prioritizing the project config. Each filter has an isEnabled flag to maximize flexibility and allow:

  • Defining a filter for a single project in its config.
  • Defining a filter for all projects in the global config.
  • Defining a filter for most projects by enabling it in the global config and disabling it in the applicable project configs.
  • Defining a filter for some projects by disabling it in the global config and enabling it in the applicable project configs.
    This is the simplest alternative to support all the use cases.

Relay implementation details

Relay types the filters internally as an ordered map with custom (de)serialization, keeping the order in the protocol, to benefit from:

  • O(1) retrieval of filters given the ID.
  • Removing duplicated filters.
    Removing duplicates is a prerequisite to merge the filters from both configs properly. Since the list of filters in the payload is expected to be sorted by descending priority, only the first occurrence of a filter is selected and the rest are discarded during deserialization. Discarded filters are not forwarded downstream.

filters is typed as an ErrorBoundary instead of an Option:

  • Errors during deserialization indicate breaking changes. Relay skips metric extraction and dynamic sampling in these cases, and the next relay in chain deals with them.
    • Note: processing relays will run dynamic sampling and extract metrics even if filters are broken.
  • An empty entry in a payload is deserialized as an empty list of filters, effectively a noop.
  • A list of filters in a supported version is appropriately deserialized and applied.

@iker-barriocanal iker-barriocanal self-assigned this Feb 23, 2024
/// ```
#[derive(Clone, Debug, Default, Deserialize, Serialize, PartialEq)]
#[serde(rename_all = "camelCase")]
pub struct GenericFiltersConfig {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is used in should_filter, which is public, should also be public.

relay-filter/src/config.rs Outdated Show resolved Hide resolved
Comment on lines 1918 to 1919
tx, _ = tx_consumer.get_event()
assert tx is not None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_event already asserts the transaction is None, but that's not super clear so I added the check here for readability.

@@ -1851,8 +1852,6 @@ def test_metric_bucket_encoding_dynamic_global_config_option(
namespace: "array"
}

print(mini_sentry.global_config)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: investigate why this is not caught by automation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think that was me 🤣

/// Deserialization:
///
/// ```
/// # use relay_filter::GenericFiltersConfig;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to add the (de)serialization tests as doctests as the description above may not be 100% clear and I was going to add tests anyway.

Copy link
Contributor

@olksdr olksdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like all your tests have max 2 filters in the config. Can we add at least 5 to each of the location and make sure to test that they are applied in according to the rules describes in this PR?

Comment on lines 356 to 364
/// Map of generic filters, sorted by the order in the payload from upstream.
///
/// The map contains unique filters, meaning there are no two filters with
/// the same id. See struct docs for more details.
#[serde(
deserialize_with = "deserialize_generic_filters",
serialize_with = "serialize_generic_filters"
)]
pub filters: IndexMap<String, GenericFilterConfig>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be already answered: Do we definitely want to keep the order of the rules? Does the order of the rules have some special meaning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rules are sorted by priority, so the order of the rules is preserved, and IndexMap ensures that (see docs):

The key-value pairs have a consistent order that is determined by the sequence of insertion and removal calls on the map.

relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
Comment on lines 653 to 669
if self.project_index < self.config.project_filters.filters.len() {
let filter = self.next_project_filter();
if filter.is_some() {
return filter;
}
continue;
}

if let Some(global_filters) = self.config.global_filters {
if self.global_index < global_filters.filters.len() {
let filter = self.next_global_filter();
if filter.is_some() {
return filter;
}
continue;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we have one filter from project config and then from global config, and next iteration the same?
Don't we want to iterate first all project config filters and only then all global config filters?

Or maybe I"m getting this wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop iterates through all project filters first, until project_index is too big and thus has evaluated all project filters. Then, the iterator moves on to global filters, and repeats the same approach. I added a small comment in 950325d to provide a bit more visibility.

I acknowledge this piece of code is not straightforward to understand, so I'm open to suggestions on how to make it more readable!

relay-filter/src/config.rs Outdated Show resolved Hide resolved
Changes include:
- Rename `CombinedFiltersConfig` -> `DynamicFiltersConfig` (same for the
  iterator) to align with `DynamicMeasurementsConfig`.
- Add unit test covering multiple filters from project and global
  filters.
- Add a brief comment in the iterator's loop to provide a bit more
  clarity into what the `loop` does.
@@ -26,6 +26,7 @@ insta = { version = "1.31.0", features = ["json", "redactions", "ron"] }
hash32 = "0.3.1"
hashbrown = "0.14.3"
itertools = "0.10.5"
indexmap = "2.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could bump the dependency to 2.2.3 (latest)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do that in a follow-up if the new version introduces some nice features. I don't want to increase the scope of this PR as it's already big and complex enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would using the latest version increase the scope of this PR? It's a new dependency anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0.0 -> 2.2.3 should have no breaking changes, but there could be e.g. uncaught panics. If this is the case and we identify this PR as the root cause, narrowing the cause further to the dependency is more complicated. Separating it in a different PR is easier to spot, and I don't mind the extra bit of work.

relay-dynamic-config/src/global.rs Show resolved Hide resolved
relay-dynamic-config/src/global.rs Show resolved Hide resolved
relay-dynamic-config/src/global.rs Outdated Show resolved Hide resolved
relay-dynamic-config/src/global.rs Outdated Show resolved Hide resolved
tests/integration/test_filters.py Show resolved Hide resolved
@@ -1851,8 +1852,6 @@ def test_metric_bucket_encoding_dynamic_global_config_option(
namespace: "array"
}

print(mini_sentry.global_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think that was me 🤣

tests/integration/test_metrics.py Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor.rs Outdated Show resolved Hide resolved
relay-dynamic-config/src/global.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
impl<'a> Iterator for DynamicGenericFiltersConfigIter<'a> {
type Item = MergedFilter<'a>;

fn next(&mut self) -> Option<Self::Item> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating two iterators before impl Iterator makes the iterator implementation easier to understand, but adds unneeded complexity on typing these iterators in the struct: exact type of the iterator including the closure of the filtering that must be known at compile time and creating yet another struct is not worth it.

fn next(&mut self) -> Option<Self::Item> {
if let Some((id, filter)) = self.project.get_index(self.project_index) {
self.project_index += 1;
let merged = merge_filters(filter, self.global.and_then(|gf| gf.get(id)));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest simplification from the previous revision is to not check is_enabled or condition. This increases the complexity of the iterator implementation enough to make it hard to understand, and dealing with these outside the iterator is not too hard. Thus, I'm not adding these checks now, and we can consider extending the abstraction in the future to deal with these, if necessary.

relay-filter/src/generic.rs Outdated Show resolved Hide resolved
relay-dynamic-config/src/global.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
skip_serializing_if = "IndexMap::is_empty",
with = "generic_filters_custom_serialization"
)]
pub filters: IndexMap<String, GenericFilterConfig>,
Copy link
Member

@Dav1dde Dav1dde Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make this a newtype instead and hide the IndexMap which should make it impossible to construct an invalid index map.

Can also implement Serialize/Deserialize directly on the newtype (cannot forget to add the annotation)

Would also simplify tests, could just be constructed from a list/vector of filter configs again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d7790ea.

relay-filter/src/generic.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
@@ -26,6 +26,7 @@ insta = { version = "1.31.0", features = ["json", "redactions", "ron"] }
hash32 = "0.3.1"
hashbrown = "0.14.3"
itertools = "0.10.5"
indexmap = "2.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would using the latest version increase the scope of this PR? It's a new dependency anyway?

Cargo.toml Outdated Show resolved Hide resolved
/// Returns the generic inbound filters.
pub fn filters(&self) -> Option<&GenericFiltersConfig> {
match &self.filters {
ErrorBoundary::Err(_) => None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we filter nothing, right?

Not a strong opinion on this, because we can always bump the metrics extraction version, but Isn't the following scenario still relevant:

  1. We make a breaking change to the filter config.
  2. Outdated external relays stop filtering, and extract metrics from everything that was previously filtered out.

I'm also on board with just deciding that we'll never make a breaking change, but in that case, why do we need the error boundary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be addressed in bfb8657.

Copy link
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An integration test which tests an invalid global config would be nice to make sure DS is skipped even with a valid project config.

relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/generic.rs Outdated Show resolved Hide resolved
relay-server/src/services/processor/event.rs Outdated Show resolved Hide resolved
// events.
// In processing relays, always extract metrics to avoid losing them.
let supported_generic_filters = self.inner.global_config.current().filters.is_ok()
&& relay_filter::are_generic_filters_supported(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, passing the global config here creates a circular dependency with crates.

relay-dynamic-config/src/global.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Outdated Show resolved Hide resolved
relay-filter/src/config.rs Show resolved Hide resolved
relay-filter/src/generic.rs Outdated Show resolved Hide resolved
relay-filter/src/generic.rs Outdated Show resolved Hide resolved
relay-filter/src/generic.rs Show resolved Hide resolved
relay-server/src/services/processor.rs Show resolved Hide resolved
tests/integration/test_metrics.py Outdated Show resolved Hide resolved
assert tx is not None
else:
assert mini_sentry.captured_events.get() is not None
metrics_consumer.assert_empty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought processing relays always extract metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do! The test was inaccurate and there's something wrong with assert_empty. I adapted the test in 28ed597 to reflect the reality better.

relay-server/src/services/processor.rs Show resolved Hide resolved
// need to drop the event, and there should not be metrics from dropped
// events.
// In processing relays, always extract metrics to avoid losing them.
let supported_generic_filters = self.inner.global_config.current().filters.is_ok()
Copy link
Member

@Dav1dde Dav1dde Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think event::filter should return this, this is filter related logic and just hard to read here.

or hide it behind some function in the event module, then you can also pass the global config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll address this in a follow-up so that this PR can already land.

relay-server/src/services/processor.rs Show resolved Hide resolved
Copy link
Member

@Dav1dde Dav1dde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted before I think we can move the supported_generic_filters to a better place. Other than that :shipit:

@iker-barriocanal iker-barriocanal merged commit 2800e95 into master Mar 1, 2024
21 checks passed
@iker-barriocanal iker-barriocanal deleted the iker/feat/generic-inbound-filters branch March 1, 2024 11:55
iker-barriocanal added a commit that referenced this pull request Mar 4, 2024
Follow-up to
#3161 (comment).

Bumps workspace's `indexmap` from `2.0.0` to `2.2.5` (latest).

Note:
- `2.2.1` introduces a [breaking
change](https://github.com/indexmap-rs/indexmap/blob/master/RELEASES.md#221)
that should have no impact (the struct `RawOccupiedEntryMut` is not
used).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants