Skip to content

jwt_authn: RemoteJwks to support RetryPolicy config #16319#16924

Merged
lizan merged 12 commits intoenvoyproxy:mainfrom
alichnewsky:16319-jwks-async-fetcher-retries-signed-off
Jul 14, 2021
Merged

jwt_authn: RemoteJwks to support RetryPolicy config #16319#16924
lizan merged 12 commits intoenvoyproxy:mainfrom
alichnewsky:16319-jwks-async-fetcher-retries-signed-off

Conversation

@alichnewsky
Copy link
Contributor

Signed-off-by: Anthony Lichnewsky alichnewsky@users.noreply.github.com

Commit Message: jwt_authn: RemoteJwks to support RetryPolicy config #16319 ( only for background fetches failing )
Additional Description: following up on comments and code pruned from background jwks refresh mechanism from #16298.
Risk Level: Low, default number of retries set to zero. truncated exponential backoff requires explicit configuration.
Testing: Only mock-based unit tests so far. Haven't been able to get it ( or the background jwks fetch option, for that matter ) built in esp-v2 so far.
Docs Changes:
Release Notes:
Platform Specific Features:
Fixes #16319

@alichnewsky alichnewsky requested a review from lizan as a code owner June 10, 2021 09:15
@repokitteh-read-only
Copy link

Hi @alichnewsky, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #16924 was opened by alichnewsky.

see: more, trace.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #16924 was opened by alichnewsky.

see: more, trace.

@alichnewsky
Copy link
Contributor Author

this is really code that @qiwzhang was already looking at.
I am not married to this implementation, my goal is to get the ball running.

The issue of missing retries was encountered during load tests on services proxied behind Google Cloud Platform's ESPv2

@markdroth
Copy link
Contributor

/lgtm api

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm api
/wait for impl

qiwzhang
qiwzhang previously approved these changes Jun 24, 2021
Copy link
Contributor

@qiwzhang qiwzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good. Thank you very much!

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being late on this.

While this shares same config from router's RetryPolicy, but we're not on same implementations. The HTTP Async Client specifies a NullRetryPolicy here, the optimal way should be allow setting a RetryPolicy in HTTP Async Client, instead of having another implementation and test.

@lizan lizan added the waiting label Jun 30, 2021
Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
@alichnewsky alichnewsky force-pushed the 16319-jwks-async-fetcher-retries-signed-off branch from 6d19fcf to 0d297b1 Compare July 5, 2021 13:10
@alichnewsky
Copy link
Contributor Author

I force-pushed into this PR the new branch implementing this PR the way @lizan and @qiwzhang suggested.
It looks like it broke CI since the logs from the azure devops pipeline are consistent with the wrong code being checked out.
Most likely some relying on a cache ...

can you please advise?
should I just create another PR ?

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
…amespace as per code style guidelines

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
@alichnewsky
Copy link
Contributor Author

I force-pushed into this PR the new branch implementing this PR the way @lizan and @qiwzhang suggested.
It looks like it broke CI since the logs from the azure devops pipeline are consistent with the wrong code being checked out.
Most likely some relying on a cache ...

can you please advise?
should I just create another PR ?

the last requested changes have been pushed on top of the same branch after the forced update, and the azure pipeline still fails for similar reasons ( most likely the ci scripts can't figure out where to rebase because a force-push occurred? )
clearly it complains about an error in a file I modified with a line number that would make sense only if the file didn't contain my modifications.

Please advise.

qiwzhang
qiwzhang previously approved these changes Jul 8, 2021
Copy link
Contributor

@qiwzhang qiwzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This LGTM with just one nit

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
@lizan
Copy link
Member

lizan commented Jul 13, 2021

@alichnewsky The CI failure is real, can you try merge main and see if it fails locally? It might be another merged PR affect it.

…ies-via-async-client-route-retry-policy

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
@alichnewsky
Copy link
Contributor Author

@alichnewsky The CI failure is real, can you try merge main and see if it fails locally? It might be another merged PR affect it.

you are indeed correct. my bad. correction is trivial, but a some code outside the scope of this PR does not pass spellchecking, formatting, ?clang-tidy? pre-push checks.

@alichnewsky
Copy link
Contributor Author

@alichnewsky The CI failure is real, can you try merge main and see if it fails locally? It might be another merged PR affect it.

you are indeed correct. my bad. correction is trivial, but a some code outside the scope of this PR does not pass spellchecking, formatting, ?clang-tidy? pre-push checks.

it may take me a couple of days to figure out how to fix these ...
stuff like this:

Checking format for test/extensions/filters/http/wasm/test_data/test_body_cpp.cc - ERROR: From ./test/extensions/filters/http/wasm/test_data/test_body_cpp.cc
ERROR: ./test/extensions/filters/http/wasm/test_data/test_body_cpp.cc:16: Don't use std::string_view or toStdStringView; use absl::string_view instead
ERROR: check format failed. run 'tools/code_format/check_format.py fix'
error: failed to push some refs to 'https://github.com/alichnewsky/envoy'

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
@alichnewsky
Copy link
Contributor Author

at this point, I have pushed all the changes I intended to make, and the azure pipeline still fails ( various timeouts, after 3h).

I am not authorized to re-run the failed jobs... The only thing I can do is add a comment somewhere push that and rebuild everything again, which hardly sounds correct.

Please advise.

@lizan
Copy link
Member

lizan commented Jul 14, 2021

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #16924 (comment) was created by @lizan.

see: more, trace.

@alichnewsky
Copy link
Contributor Author

alichnewsky commented Jul 14, 2021

still the same issue it takes more than 3h just to pass the unit tests on mac.. so the tests get cancelled.
I wouldn't mind adding a timeout=300 parameter here, but I am not even sure that would do the trick.

Alternatively, I could just try to retest as you did every now and then and just hope some ci agent host gets less busy at some point ?

@lizan lizan merged commit b62dae2 into envoyproxy:main Jul 14, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
…nvoyproxy#16924)

Commit Message: jwt_authn: RemoteJwks to support RetryPolicy config envoyproxy#16319 ( only for background fetches failing )
Additional Description: following up on comments and code pruned from background jwks refresh mechanism from envoyproxy#16912

Risk Level: Low, default number of retries set to zero. truncated exponential backoff requires explicit configuration.
Testing: Only mock-based unit tests so far.  Haven't been able to get it ( or the background jwks fetch option, for that matter )  built in [esp-v2](https://github.com/GoogleCloudPlatform/espv-2) so far.
Docs Changes:
Release Notes:
Platform Specific Features:
Fixes envoyproxy#16319

Signed-off-by: Anthony Lichnewsky <alichnewsky@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

jwt_authn: RemoteJwks to support RetryPolicy config

4 participants