Conversation
✅ Docs preview ready. The preview is ready to be viewed.
File changes: 0 new, 1 changed, 0 removed
Build ID: 506da8e589f9b712b544cc15
URL: https://www.apollographql.com/docs/deploy-preview/506da8e589f9b712b544cc15
Force-pushed 8e1bcde to 1a7bdb9
Note: there is a TODO to finish this
What are we doing with Other here? Is the String necessary for metrics? Or in other words, what's the expectation for features the router doesn't know about, and how many of those do we expect from a typical response?
I saw the String primarily for use with logging so that, for example, if a customer tried to use a new feature with an older version of the router and didn't see it take effect, the log message could guide them to the appropriate action: updating their router.
newer jwt - older router
If the router encounters a feature without its own variant in AllowedFeature, it will map it to Other, ignore it, and log at the warn level that it does not recognize the feature and that the customer should consider updating their version of the router.
older jwt - newer router
If the JWT's allowed_features claim does not contain the feature they want to use with their router, that feature will show up in the license report as a restricted feature, and the router will either fail to start up or fail to reload.
The metrics piece & how many we expect are good call outs:
Do we want any metrics around this?
My first thought is no - this variant will be created when a customer is using a feature unsupported by their version of the router
On the other hand, if we want to know what features customers are trying to use, albeit unsuccessfully, this might be something we want to record and keep on our radar to ensure we are communicating correctly to customers what min version of the router they need to use said feature(s).
How many Other values do we expect to see?
- This is dependent on how many new features we plan to introduce and how well we document the min version of the router they need (older router - newer jwt above)
- We may also see this if a customer has an outdated jwt (older jwt - newer router above)
if a customer tried to use a new feature with an older version of the router and didn't see it take effect, the log message could guide them to the appropriate action: updating their router.
Are they manually opting in to allowed_features in the GraphOS UI? If not, it might be a bit confusing to read a message saying "new_feature, which you're not using and haven't opted into, is not available on your version of the router". We will also not be able to add a message in the router saying which version this new feature is available in, unless we literally backport to every version of the router after this change lands. So that information (the minimum router version) will then have to live somewhere in GraphOS and make its way to the router along with the actual feature name. If they are opting into allowed_features, then this makes a bit more sense.
If the JWT's allowed_features claim does not contain the feature they want to use with their router, that feature will show up in the license report as a restricted feature, and the router will either fail to start up or fail to reload.
I did not yet see the logic for this in the code here. Right now it seems there is only a warn, and a plugin does not load (which is a nice experience - you can still use the router - but I am not sure if that's the intention). Or is that more for configuration based entitlements? Speaking of, while I am usually a proponent of separating PRs into smallest possible units of work, it might be easier to bring this PR and #7939 together, especially when you're testing. You are going to want to test the intersection of what happens when there isn't any config for a licensed feature (i.e. default config) but the plugin still shouldn't load because there isn't a license etc.
On the other hand, if we want to know what features customers are trying to use, albeit unsuccessfully, this might be something we want to record and keep on our radar to ensure we are communicating correctly to customers what min version of the router they need to use said feature(s).
That sounds like we should have a metric recording the Other and the current router version. This could form a helpful UI in GraphOS letting users know they should update their router and would also just be helpful information for us to know.
Are they manually opting in to allowed_features in GraphOS UI?
We may have this eventually, but for now we don't have the ability to do this in the GraphOS UI. I am thinking we can add the warning to the jwt.
I did not yet see the logic for this in the code here
My apologies; as you had mentioned, it is in #7939. I brought the contents of #7939 into this PR to make it easier to understand everything as a whole.
That sounds like we should have a metric recording the Other and the current router version.
This sounds very interesting! I will think about how we can add this and what it would look like in terms of UI for our users
Force-pushed 1a7bdb9 to 8bb665b
lrlna
left a comment
I had a few small comments, but overall I think the direction is good. We will definitely need a bunch of tests, especially integration tests, for this though.
apollo-router/src/router_factory.rs
Outdated
if let Some(allowed_features) = $license.get_allowed_features() {
    if allowed_features.contains(&AllowedFeature::from($name)) {
        add_plugin!(name.to_string(), factory, plugin_config, full_config);
    } else {
        tracing::warn!(
            "{name} is a restricted feature that requires a license"
        );
    }
When testing, we should check the behaviour for Other (Undefined) feature. It might be strange to send a warn for a feature that very clearly doesn't exist in the router.
What are we doing with Other here? Is the String necessary for metrics? Or in other words, what's the expectation for features the router doesn't know about, and how many of those do we expect from a typical response?
/// Allowed features for a License, representing what's available to a particular pricing tier
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, Hash)]
pub enum AllowedFeature {
Parts of telemetry are a license-only feature (custom instruments). How is this going to be handled by the allowlist?
Thanks for calling this out, I was not aware of this. One idea is to add another item to this enum. I will explore this more to understand what parts of telemetry are license-only
@DMallare @aaronArinder I'll let you all continue working on this for the time being, but give me a shout once this is out of draft and ready for review again.
apollo-router/src/router_factory.rs
Outdated
if let Some(allowed_features) = $license.get_allowed_features() {
    if allowed_features.contains(&AllowedFeature::from($name)) {
        add_plugin!(name.to_string(), factory, plugin_config, full_config);
    } else {
        tracing::warn!(
            "{name} plugin is not registered, {name} is a restricted feature that requires a license"
        );
    }
this will come out in testing, but the allowed features are probably named differently than their plugins (some get prefixed with apollo., for example); so, we'll want to make sure there's some mapping between the two that's testable
Are we sure they get prefixed with apollo? We add the prefix in our macro here so after that step we would have apollo.apollo.some-plugin-name.
I agree with the mapping part. My plan was to write unit tests to ensure each plugin associated with an allowed feature was added when included in the config + allowed_features set, and also not added if not included in the config and/or not in the allowed_features set. I also want to add in the doc comments for the AllowedFeature enum what plugin or config etc. it maps to for documentation purposes.
apollo-router/src/router_factory.rs
Outdated
// If the license has no allowed_features claim, we're using a pricing plan
// that should have the plugin enabled regardless
} else {
    add_plugin!(name.to_string(), factory, plugin_config, full_config);
we'll also want to spend some time (probably through tests) in figuring out which plugins are potentially added in other ways; the telemetry plugin gets some special handling, but is that the only one?
Force-pushed 2cf618c to 58ad54b
Force-pushed 6f0253a to 4b6918b
Force-pushed 5adad91 to 7c7b53c
* product has a set of features they'd like to be free; this pr introduces them and fixes tests
…ederationContextArguments from AllowedFeature
goto-bus-stop
left a comment
Great to see so many tests, I think it all looks good. I do think we should find a different solution than the feature flag, as it will affect everyone doing anything on the router going forward.
#[cfg(not(feature = "test-jwks"))]
let jwks = re.replace(include_str!("license.jwks.json"), "");
#[cfg(feature = "test-jwks")]
let jwks = re.replace(include_str!("testdata/license.jwks.json"), "");
Integration tests spawn a router subprocess. Could this use an environment variable at runtime instead of a separate feature? eg. TEST_INTERNAL_APOLLO_UPLINK_JWKS=path/to/testdata/license.jwks.json?
I think running different test suites based on feature flagging is going to be brittle to maintain going forward.
Here is where the integration testing harness can set env variables for the router subprocess. It doesn't have methods to do so right now, but you could add env: HashMap<String, String> to the IntegrationTest constructor, or even have a hardcoded .with_test_jwks() method if it's easier
router/apollo-router/tests/common.rs
Line 713 in 8ac0bd9
good call out; and, good thinking for isolating this to just the relevant test; pushed a commit that I think does the dirt, but let us know
goto-bus-stop
left a comment
as discussed with @aaronArinder, will pre-approve so you can merge it after europe EOD, after changing the feature flag thing 😇 thanksss
.circleci/config.yml
Outdated
cargo xtask test --workspace --locked --features ci,snapshot

# run just the allowed_features integration tests with a dummy JWKS endpoint
APOLLO_TEST_INTERNAL_UPLINK_JWKS=true cargo xtask test --workspace --locked --features ci,snapshot,test-jwks integration_tests
tell me if i'm missing something, but it should be possible to do everything in a single test run by configuring the env var only inside the integration tests that need it
Force-pushed 1a5a06a to 6194192
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
(aaron while danielle is ooo) We feel confident in this work (given the integration tests and some manual e2e testing that's still underway; see here for the first round), but we want to make sure that:
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩