Description
Hey! We've run into a strange issue with SSL initialization after upgrading from .NET 5.0 to .NET 6.0.
We have multiple microservices deployed on our own Kubernetes cluster. There are 4 physical nodes running Debian Buster, and each of them hosts several virtual machines that act as Kubernetes nodes. The microservice images are based on mcr.microsoft.com/dotnet/aspnet:6.0 with a few debugging/servicing additions such as apt-get install curl. We've also tried :6.0-focal, and it seemed to help; I explain why below.
The microservices mainly exchange data with each other over plain HTTP using their internal hostnames, but for specific reasons they sometimes connect to each other over the public network via HTTPS (in general the remote side may be an external data provider that is not under our control).
The problem is that sometimes, when an HTTPS connection is being established, the OpenSSL library fails to initialize with strange errors: error:04067084:rsa routines:rsa_ossl_public_decrypt:data too large for modulus and/or error:04067072:rsa routines:rsa_ossl_public_decrypt:padding check failed. As far as I know, rsa_ossl_public_decrypt is not called anywhere during OpenSSL initialization. The failure happens randomly on different pods, different nodes, etc. If the first connection on a pod succeeds, that pod never fails afterwards. If a connection on a pod fails, it keeps failing again and again until the pod is restarted, and after the restart the outcome is random again: it may or may not fail. It looks like a threading issue, but that is only a guess. I've tried to figure out the SSL initialization process by reading the sources of System.Net.Security and the Crypto.Native wrapper, but no luck so far.
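For anyone reading the codes, here is a minimal sketch of how the two hex values split into their fields. It assumes the OpenSSL 1.1.x packing (8-bit library, 12-bit function, 12-bit reason), which matches the error strings above because they still carry a function name; OpenSSL 3.x packs codes differently. The helper name is made up for illustration.

```python
def decode_openssl_1_1_error(code):
    """Split a packed OpenSSL 1.1.x error code into (library, function, reason).

    Assumes the 1.1.x layout: bits 24-31 library, bits 12-23 function,
    bits 0-11 reason. This does not apply to OpenSSL 3.x codes.
    """
    lib = (code >> 24) & 0xFF
    func = (code >> 12) & 0xFFF
    reason = code & 0xFFF
    return lib, func, reason


if __name__ == "__main__":
    # The two codes from the errors reported above.
    for text in ("04067084", "04067072"):
        lib, func, reason = decode_openssl_1_1_error(int(text, 16))
        print(f"{text}: lib={lib} func={func} reason={reason}")
```

Both codes decode to library 4 (the RSA routines) and the same function number, and differ only in the reason field, which is consistent with the two messages coming from the same call.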
I still cannot reproduce this with a simple application, environment, or test. It does not reproduce on Windows or in local Docker on WSL2, only in our Kubernetes environment so far (we have other environments, but the one mentioned is our dev environment, and we cannot upgrade the others while dev is failing).
The other thing I've found is that it happens only if /etc/ssl/openssl.cnf contains non-empty system default settings (system_default_sect, as described in #46271). If we leave system_default_sect empty or provide no config file at all (which is the case for :6.0-focal, since it has no OpenSSL config file by default), everything works fine. If we put in just the ciphers that are already the defaults described here (https://docs.microsoft.com/en-us/dotnet/core/compatibility/cryptography/5.0/default-cipher-suites-for-tls-on-linux), it starts failing. And we need these config defaults (just like in the :6.0 image) because we still need to connect to sites that use obsolete weak ciphers.
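To tell quickly whether a given pod is in the "non-empty system_default_sect" state, something like the following sketch could be run inside the container. It is a simplified assumption, not a full openssl.cnf parser: it looks for a section literally named system_default_sect instead of following the openssl_conf / ssl_conf / system_default chain, and it treats everything after # as a comment. The SAMPLE text is illustrative only and is not a copy of our actual config.

```python
import os


def system_default_settings(text, section="system_default_sect"):
    """Return the key/value pairs found in [section] of openssl.cnf-style text."""
    settings = {}
    current = None  # keys before the first header belong to the unnamed default section
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
        elif current == section and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


# Illustrative config text (hypothetical values, not our real file).
SAMPLE = """
openssl_conf = default_conf

[default_conf]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
MinProtocol = TLSv1.2
CipherString = DEFAULT@SECLEVEL=2
"""

if __name__ == "__main__":
    path = "/etc/ssl/openssl.cnf"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            found = system_default_settings(handle.read())
    else:
        found = {}  # no config file at all, as in the :6.0-focal case
    print("system_default_sect settings:", found or "none")
```

An empty result corresponds to the configurations where we see no failures; a non-empty result corresponds to the ones where the failures can appear.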
Everything worked fine on .NET 5.0.
Could you please suggest something, or point me to a way to investigate this further?
Two example exception stack traces are attached.
modulus.txt
padding.txt