PermissionDenied error causes traffic loss #270

Closed
rpiceage opened this issue Oct 18, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@rpiceage

Hi,

During a 24-hour stability run we found that 2 of the 4 NSC pods had traffic loss.
In the stability test we basically run the kernel2kernel example with 4 NSCs.

The NSC logs contain errors like this:
[ERRO] (15.1) rpc error: code = Unknown desc = Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Unknown desc = Error returned from api/pkg/api/networkservice/networkServiceClient.Request: rpc error: code = Unknown desc = Error returned from sdk/pkg/networkservice/common/authorize/authorizeServer.Request: rpc error: code = PermissionDenied desc = no sufficient privileges

In most cases these errors caused minimal traffic disturbances that resolved within a few seconds. However, after the error at 21:07:31.802 in the attached logs, traffic continued to be dropped and did not recover in the next 7 minutes (after that, our ctraffic tool exited in that pod because too many connections were lost).
I could not find any related error messages in the forwarder (the nsmgr logs had unfortunately rotated out).
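
To make it easier to spot these entries in a long NSC log, here is a minimal sketch that pulls the innermost gRPC status code and the chain of wrapping components out of a line shaped like the one above. The function names and the log file argument are hypothetical, and the regular expressions assume the exact `code = ... desc = Error returned from ...:` layout shown in this report.

```python
import re
import sys

# Matches every "code = <Name>" field in a nested gRPC error string.
CODE_RE = re.compile(r"code = (\w+)")
# Matches every "Error returned from <component>:" wrapper.
SOURCE_RE = re.compile(r"Error returned from (\S+):")


def summarize_rpc_error(line):
    """Return (innermost_code, wrapping_components) for one log line.

    Returns None when the line carries no gRPC status code.
    """
    codes = CODE_RE.findall(line)
    if not codes:
        return None
    # The last code in the chain is the one raised by the innermost component.
    return codes[-1], SOURCE_RE.findall(line)


def permission_denied_lines(lines):
    """Yield (line_number, components) for lines rooted in PermissionDenied."""
    for number, line in enumerate(lines, start=1):
        summary = summarize_rpc_error(line)
        if summary and summary[0] == "PermissionDenied":
            yield number, summary[1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_nsc_log.py <path-to-nsc-log>
    with open(sys.argv[1], errors="replace") as handle:
        for number, components in permission_denied_lines(handle):
            print(number, " <- ".join(components))
```

The last component in each chain is the one that actually produced the status code; for the line quoted above that is the `authorizeServer.Request` wrapper.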

endpoint-nsc-d6c848c87-lhj4k.txt

@denis-tingaikin
Member

Hello!

Could you share the deployment versions?

@rpiceage
Author

Sorry, I forgot.

nsmgr=d9c0b2c
registry-memory=f63cca1
forwarder-vpp=85e5a4f
nse-icmp-responder-vpp=a60ed79
nsc-vpp=fd917e3

@edwarnicke
Member

@denis-tingaikin Any thoughts on this?

@edwarnicke
Member

@rpiceage Could we retest this as soon as P2MP finishes landing?

@denis-tingaikin added the bug label on Nov 10, 2021
@denis-tingaikin
Member

@NikitaSkrynnik Could you have a look when you are able?

@NikitaSkrynnik
Contributor

@rpiceage Hello! Could you provide more logs from other components?

@rpiceage
Author

Hi, unfortunately the logs had rotated out by the end of the test, and it's hard to reproduce, as the error occurred after quite a few hours. I'll do what I can.

@edwarnicke
Member

@rpiceage Is it possible to retain more logs, given that it's a long-term soak test?

@NikitaSkrynnik
Contributor

@rpiceage Hello. From the logs I can see that the server rejects the client's request because the client has an old certificate. We discussed it, and it looks like this Pull Request should fix the problem.
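
As an illustration of the "old certificate" condition, here is a minimal sketch of a freshness check on a credential's validity window. It is not how NSM handles certificates: the thread does not say how the client obtains or rotates its certificate, so the function names and the half-lifetime renewal threshold are assumptions made purely for the example.

```python
from datetime import datetime, timedelta, timezone


def is_expired(not_after, now):
    """True once the validity window has closed."""
    return now >= not_after


def should_renew(not_before, not_after, now, fraction=0.5):
    """True once `fraction` of the validity window has elapsed.

    The 0.5 default is an assumed policy, not one taken from this thread.
    """
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


if __name__ == "__main__":
    issued = datetime.now(timezone.utc) - timedelta(minutes=40)
    expires = issued + timedelta(hours=1)
    now = datetime.now(timezone.utc)
    print("renew:", should_renew(issued, expires, now))
    print("expired:", is_expired(expires, now))
```

A client that keeps presenting a credential after `is_expired` turns true would be rejected by a server that validates it, which matches the rejection described in this comment.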

@rpiceage
Author

Thanks, will try to re-run as soon as possible, with periodic log collection.
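
One way to do periodic log collection during a long soak test is sketched below, assuming `kubectl` is on the PATH and has access to the cluster. The pod names, namespace, output directory, and five-minute interval are hypothetical placeholders; the script only uses the standard `kubectl logs` flags `--namespace`, `--since`, and `--timestamps`.

```python
import subprocess
import time
from pathlib import Path


def build_logs_command(pod, namespace, window_seconds):
    """Build a kubectl command that fetches the last `window_seconds` of logs."""
    return [
        "kubectl", "logs", pod,
        "--namespace", namespace,
        f"--since={window_seconds}s",
        "--timestamps",
    ]


def collect_forever(pods, namespace, out_dir, interval_seconds=300):
    """Snapshot each pod's recent logs to a timestamped file, then sleep."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    while True:
        stamp = time.strftime("%Y%m%dT%H%M%S")
        for pod in pods:
            # Fetch slightly more than one interval so consecutive
            # snapshots overlap instead of leaving gaps.
            command = build_logs_command(pod, namespace, interval_seconds + 30)
            result = subprocess.run(command, capture_output=True, text=True)
            (out / f"{pod}-{stamp}.log").write_text(result.stdout)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical pod names and namespace; replace with the real ones.
    collect_forever(["nsmgr-example", "forwarder-vpp-example"], "nsm-system", "soak-logs")
```

Because each snapshot is written to its own file, the logs around a failure survive even if the container's own log has rotated by the end of the run.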

@edwarnicke
Member

@rpiceage Is this still an issue?

@rpiceage
Author

We haven't seen the issue in the last run. I think we can close this. Thanks for the effort.

@denis-tingaikin
Member

@rpiceage Feel free to reopen if you face this again :)

Projects
Status: Done
4 participants