
Enforcing allowed features #7917

Merged
DMallare merged 36 commits into dev from
dmallare/enforcing-allowed-features-plugins
Aug 20, 2025

Conversation

@DMallare
Contributor

@DMallare DMallare commented Jul 17, 2025


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

(aaron while danielle is ooo) We feel confident in this work, given the integration tests and some manual e2e testing that's still underway (see here for the first round), but we want to make sure that:

  • our changes to the plugin ecosystem seem judicious (we introduced a new macro, eg, but overall this file felt unwieldy--special attention here to make sure we didn't F anything up would be much appreciated)
  • that the new jwks endpoint doesn't seem like a mistake (enables integration testing but might be unobvious to folks)
  • that the allowed features integration tests seem sensible, comprehensive, and appropriate (ie, de-risks this work by making you/us feel confident--I feel pretty confident in them, but maybe I'm missing something)

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian

apollo-librarian bot commented Jul 17, 2025

✅ Docs preview ready

The preview is ready to be viewed. View the preview

File Changes

0 new, 1 changed, 0 removed
* graphos/routing/(latest)/observability/telemetry/instrumentation/standard-instruments.mdx

Build ID: 506da8e589f9b712b544cc15
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/506da8e589f9b712b544cc15

@DMallare DMallare requested a review from aaronArinder July 17, 2025 21:58
@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch from 8e1bcde to 1a7bdb9 Compare July 18, 2025 18:58
@DMallare DMallare mentioned this pull request Jul 18, 2025
10 tasks
Contributor Author

Note: there is a TODO to finish this

Member

What are we doing with Other here? Is the String necessary for metrics? Or in other words, what's the expectation for features that the router doesn't know about, and how many of those do we expect from a typical response?

Contributor Author

@DMallare DMallare Jul 22, 2025

I saw the String primarily for use with logging so that, for example, if a customer tried to use a new feature with an older version of the router and didn’t see it take effect, the log message could guide them to the appropriate action - updating their router.

newer jwt - older router
If the router encounters a feature without its own variant in AllowedFeature, it maps it to Other. The router ignores the feature and logs at the warn level that it does not recognize it and that the customer should consider updating their version of the router.

older jwt - newer router
If the jwt's allowed_features claim does not contain the feature they want to use with their router, then that feature will show up in the license report as a restricted feature and the router will either fail to start up or fail to reload.

The metrics piece & how many we expect are good call outs:

Do we want any metrics around this?
My first thought is no - this variant will only be created when a customer is using a feature unsupported by their version of the router.
On the other hand, if we want to know what features customers are trying to use, albeit unsuccessfully, this might be something we want to record and keep on our radar to ensure we are communicating correctly to customers what min version of the router they need to use said feature(s).

How many Other's do we expect to see?

  • This is dependent on how many new features we plan to introduce and how well we document the min version of the router they need (older router - newer jwt above)
  • We may also see this if a customer has an outdated jwt (older jwt - newer router above)
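A minimal sketch of the Other fallback described above, assuming a trimmed-down AllowedFeature enum. The variant names, the claim strings, and the parse_claim helper are hypothetical, and eprintln! stands in for the warn-level log; the real enum in the PR is larger and may convert differently.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down version of the enum discussed in this thread.
/// The variants and their claim strings are assumptions for illustration.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub enum AllowedFeature {
    Coprocessors,
    Subscriptions,
    /// Any feature name this router build does not recognize.
    Other(String),
}

impl From<&str> for AllowedFeature {
    fn from(name: &str) -> Self {
        match name {
            "coprocessors" => Self::Coprocessors,
            "subscriptions" => Self::Subscriptions,
            // Newer jwt, older router: keep the raw name so it can be logged.
            other => Self::Other(other.to_string()),
        }
    }
}

/// Splits an allowed_features claim into recognized features and the raw
/// names of unrecognized ones. Unrecognized names are ignored for
/// enforcement and only reported (eprintln! stands in for a warn log).
pub fn parse_claim(names: &[&str]) -> (HashSet<AllowedFeature>, Vec<String>) {
    let mut known = HashSet::new();
    let mut unknown = Vec::new();
    for name in names {
        match AllowedFeature::from(*name) {
            AllowedFeature::Other(raw) => {
                eprintln!("unrecognized feature `{raw}`; consider updating the router");
                unknown.push(raw);
            }
            feature => {
                known.insert(feature);
            }
        }
    }
    (known, unknown)
}
```

Returning the unknown names separately would also give a natural place to hang a metric, if one is wanted later.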

Member

if a customer tried to use a new feature with an older version of the router and didn’t see it take effect, the log message could guide them to the appropriate action - updating their router.

Are they manually opting in to allowed_features in the GraphOS UI? If not, it might be a bit confusing to read a message saying "new_feature you're not using nor have you opted into is not available on your version of the router". We will also not be able to add a message in the router saying in which version this new feature is available, unless we literally backport to every version of the router after this change lands. So that information (the minimum router version) will then have to live somewhere in GraphOS and make its way to the router along with the actual feature name. If they are opting into allowed_features, then this makes a bit more sense.

If the jwt allowed_features's claim does not contain the feature they want to use with their router then that feature will show up in the license report as a restricted feature and the router either won't start up or reload.

I did not yet see the logic for this in the code here. Right now it seems there is only a warn, and a plugin does not load (which is a nice experience - you can still use the router - but I am not sure if that's the intention). Or is that more for configuration based entitlements? Speaking of, while I am usually a proponent of separating PRs into smallest possible units of work, it might be easier to bring this PR and #7939 together, especially when you're testing. You are going to want to test the intersection of what happens when there isn't any config for a licensed feature (i.e. default config) but the plugin still shouldn't load because there isn't a license etc.

On the other hand, if we want to know what features customers are trying to use, albeit unsuccessfully, this might be something we want to record and keep on our radar to ensure we are communicating correctly to customers what min version of the router they need to use said feature(s).

That sounds like we should have a metric recording the Other and the current router version. This could form a helpful UI in GraphOS letting users know they should update their router and would also just be helpful information for us to know.

Contributor Author

Are they manually opting in to allowed_features in GraphOS UI?

We may have this eventually, but for now we don't have the ability to do this in the GraphOS UI. I am thinking we can add the warning to the jwt.

I did not yet see the logic for this in the code here

My apologies, as you had mentioned it is in #7939. I brought the contents of #7939 to this PR to make it easier to understand everything as a whole

That sounds like we should have a metric recording the Other and the current router version.

This sounds very interesting! I will think about how we can add this and what it would look like in terms of UI for our users

@DMallare DMallare mentioned this pull request Jul 20, 2025
10 tasks
@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch from 1a7bdb9 to 8bb665b Compare July 21, 2025 18:30
@lrlna lrlna self-requested a review July 22, 2025 11:29
Member

@lrlna lrlna left a comment

I had a few small comments, but overall I think the direction is good. We will definitely need a bunch of tests, especially integration tests, for this though.

Comment on lines +638 to +663
if let Some(allowed_features) = $license.get_allowed_features() {
    if allowed_features.contains(&AllowedFeature::from($name)) {
        add_plugin!(name.to_string(), factory, plugin_config, full_config);
    } else {
        tracing::warn!(
            "{name} is a restricted feature that requires a license"
        );
    }
Member

When testing, we should check the behaviour for an Other (undefined) feature. It might be strange to send a warn for a feature that very clearly doesn't exist in the router.

Comment on lines +518 to +640
/// Allowed features for a License, representing what's available to a particular pricing tier
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, Hash)]
pub enum AllowedFeature {
Member

Parts of telemetry are a license-only feature (custom instruments). How is this going to be handled by the allowlist?

Contributor Author

Thanks for calling this out, I was not aware of this. One idea is to add another item to this enum. I will explore this more to understand what parts of telemetry are license-only

@lrlna
Member

lrlna commented Jul 23, 2025

@DMallare @aaronArinder I'll let you all continue working on this for the time being, but give me a shout once this is out of draft and ready for review again.

@DMallare DMallare changed the title poc: enforcing allowed features - plugins poc: enforcing allowed features Jul 23, 2025
Comment on lines +638 to +663
if let Some(allowed_features) = $license.get_allowed_features() {
    if allowed_features.contains(&AllowedFeature::from($name)) {
        add_plugin!(name.to_string(), factory, plugin_config, full_config);
    } else {
        tracing::warn!(
            "{name} plugin is not registered, {name} is a restricted feature that requires a license"
        );
    }
Contributor

this will come out in testing, but the allowed features are probably named differently than their plugins (some get prefixed with an apollo., for example); so, we'll want to make sure there's some mapping between the two that's testable

Contributor Author

@DMallare DMallare Jul 28, 2025

Are we sure they get prefixed with apollo? We add the prefix in our macro here so after that step we would have apollo.apollo.some-plugin-name.

I agree with the mapping part. My plan was to write unit tests to ensure each plugin associated with an allowed feature is added when it is included in the config and in the allowed_features set, and is not added when it is missing from the config and/or the allowed_features set. I also want to note in the doc comments for the AllowedFeature enum which plugin, config, etc. each variant maps to, for documentation purposes.
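One way such a mapping could be made testable is sketched below. It is not the PR's macro: feature_for_plugin and should_register are hypothetical helpers, the plugin and feature names are placeholders, and tolerating an optional apollo. prefix is an assumption about how the naming question above might be resolved. The registration rule mirrors the diff under review: with no allowed_features claim every plugin is registered, otherwise only plugins whose feature is in the set.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down enum; the real one has more variants.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub enum AllowedFeature {
    Coprocessors,
    Other(String),
}

/// Hypothetical mapping from a plugin name to its allowed feature.
/// Assumption: an optional `apollo.` prefix is stripped first, so both
/// spellings of a plugin name resolve to the same feature.
pub fn feature_for_plugin(plugin_name: &str) -> AllowedFeature {
    let bare = plugin_name.strip_prefix("apollo.").unwrap_or(plugin_name);
    match bare {
        "coprocessor" => AllowedFeature::Coprocessors,
        other => AllowedFeature::Other(other.to_string()),
    }
}

/// Registration rule as described in the diff under review:
/// - no allowed_features claim: the plan enables the plugin regardless
/// - claim present: register only if the plugin's feature is in the set
pub fn should_register(
    plugin_name: &str,
    allowed: Option<&HashSet<AllowedFeature>>,
) -> bool {
    match allowed {
        Some(features) => features.contains(&feature_for_plugin(plugin_name)),
        None => true,
    }
}
```

Keeping the rule in a plain function like this lets each plugin-to-feature pair be covered by a table-driven unit test without spinning up a router.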

    // If the license has no allowed_features claim, we're using a pricing plan
    // that should have the plugin enabled regardless
} else {
    add_plugin!(name.to_string(), factory, plugin_config, full_config);
Contributor

we'll also want to spend some time (probably through tests) in figuring out which plugins are potentially added in other ways; the telemetry plugin gets some special handling, but is that the only one?

@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch 10 times, most recently from 2cf618c to 58ad54b Compare August 5, 2025 13:41
@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch from 6f0253a to 4b6918b Compare August 5, 2025 16:28
@DMallare DMallare changed the title poc: enforcing allowed features Enforcing allowed features Aug 7, 2025
@aaronArinder aaronArinder marked this pull request as ready for review August 8, 2025 21:00
@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch from 5adad91 to 7c7b53c Compare August 18, 2025 15:40
Member

@goto-bus-stop goto-bus-stop left a comment

Great to see so many tests, I think it all looks good. I do think we should find a different solution than the feature flag, as it will affect everyone doing anything on the router going forward.

#[cfg(not(feature = "test-jwks"))]
let jwks = re.replace(include_str!("license.jwks.json"), "");
#[cfg(feature = "test-jwks")]
let jwks = re.replace(include_str!("testdata/license.jwks.json"), "");
Member

Integration tests spawn a router subprocess. Could this use an environment variable at runtime instead of a separate feature? eg. TEST_INTERNAL_APOLLO_UPLINK_JWKS=path/to/testdata/license.jwks.json?

I think running different test suites based on feature flagging is going to be brittle to maintain going forward.
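A rough sketch of the runtime alternative suggested above, under stated assumptions: the env var name is the one floated in this comment, EMBEDDED_JWKS is a placeholder for the key set the real code embeds with include_str!, and load_jwks / load_jwks_from are hypothetical helper names rather than functions from the router.

```rust
use std::{env, fs, io};

/// Placeholder for the production key set; the real code embeds
/// `license.jwks.json` with `include_str!`.
const EMBEDDED_JWKS: &str = r#"{"keys":[]}"#;

/// Env var name taken from the suggestion in this thread (an assumption,
/// not necessarily the name that was merged).
pub const TEST_JWKS_ENV: &str = "TEST_INTERNAL_APOLLO_UPLINK_JWKS";

/// Returns the JWKS document, reading it from `override_path` when one is
/// given and falling back to the embedded key set otherwise.
pub fn load_jwks_from(override_path: Option<&str>) -> io::Result<String> {
    match override_path {
        Some(path) => fs::read_to_string(path),
        None => Ok(EMBEDDED_JWKS.to_string()),
    }
}

/// Runtime entry point: the override comes from the process environment,
/// so a single binary serves both production and integration tests.
pub fn load_jwks() -> io::Result<String> {
    let override_path = env::var(TEST_JWKS_ENV).ok();
    load_jwks_from(override_path.as_deref())
}
```

Splitting out load_jwks_from keeps the selection logic testable without mutating the process environment.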

Member

@goto-bus-stop goto-bus-stop Aug 19, 2025

Here is where the integration testing harness can set env variables for the router subprocess. It doesn't have methods to do so right now, but you could add env: HashMap<String, String> to the IntegrationTest constructor, or even have a hardcoded .with_test_jwks() method if it's easier
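The harness change described here might look roughly like the following. This IntegrationTest is a hypothetical stand-in, not the router's actual test harness: the fields, the with_env method, and the JWKS path are assumptions, and only std::process::Command is a real API.

```rust
use std::collections::HashMap;
use std::process::Command;

/// Hypothetical, minimal stand-in for the integration test harness.
pub struct IntegrationTest {
    router_binary: String,
    env: HashMap<String, String>,
}

impl IntegrationTest {
    pub fn new(router_binary: impl Into<String>) -> Self {
        Self {
            router_binary: router_binary.into(),
            env: HashMap::new(),
        }
    }

    /// General-purpose variant: any env var for the router subprocess.
    pub fn with_env(mut self, key: &str, value: &str) -> Self {
        self.env.insert(key.to_string(), value.to_string());
        self
    }

    /// Hardcoded convenience variant. Both the variable name and the path
    /// are assumptions taken from the suggestion in this thread.
    pub fn with_test_jwks(self) -> Self {
        self.with_env(
            "TEST_INTERNAL_APOLLO_UPLINK_JWKS",
            "path/to/testdata/license.jwks.json",
        )
    }

    /// Builds the subprocess command with the configured env applied.
    /// The variables are set only on the child, not on the test process.
    pub fn command(&self) -> Command {
        let mut cmd = Command::new(&self.router_binary);
        cmd.envs(&self.env);
        cmd
    }
}
```

Because the variable is attached to the child process only, tests that need the test JWKS can opt in individually and everything can still run in a single test invocation.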

Contributor

good call out; and, good thinking for isolating this to just the relevant test; pushed a commit that I think does the dirt, but let us know

Member

@goto-bus-stop goto-bus-stop left a comment

as discussed with @aaronArinder, will pre-approve so you can merge it after europe EOD, after changing the feature flag thing 😇 thanksss

cargo xtask test --workspace --locked --features ci,snapshot

# run just the allowed_features integration tests with a dummy JWKS endpoint
APOLLO_TEST_INTERNAL_UPLINK_JWKS=true cargo xtask test --workspace --locked --features ci,snapshot,test-jwks integration_tests
Member

@goto-bus-stop goto-bus-stop Aug 19, 2025

tell me if i'm missing something, but it should be possible to do everything in a single test run by configuring the env var only inside the integration tests that need it

@DMallare DMallare force-pushed the dmallare/enforcing-allowed-features-plugins branch from 1a5a06a to 6194192 Compare August 20, 2025 03:32
@DMallare DMallare merged commit 4e47dea into dev Aug 20, 2025
15 checks passed
@DMallare DMallare deleted the dmallare/enforcing-allowed-features-plugins branch August 20, 2025 04:53
@lrlna lrlna mentioned this pull request Aug 25, 2025
4 participants