Replication breaks in chart 4.2.1 w/openldap 2.6.6 - Error, ldap_start_tls failed #148
Additionally, upon testing with the previous chart version, replication works. This confirms that the issue is with the 4.2.1 chart.
Here are the full overrides of the YAML files.
The only difference between the YAML files is the openldap version (2.6.3 vs 2.6.6).
Hi @zerowebcorp
No, replication is still not working.
Which |
@jp-gouin In case it's related, I'm not seeing the change from jp-gouin/containers@3222981 in jpgouin/openldap:2.6.6-fix, which 4.2.2 uses. I also don't see it in the bitnami image even though it was seemingly merged in.
@parak Indeed, it looks like it's not related, currently
That is why I reverted the image to
I run version 4.2.2 of the chart with image jpgouin/openldap:2.6.6-fix and it works fine. The TLS error code in the logs above evolves from:
Do you by any chance use (the default) in chart values:
and let the init container create the TLS certs? That configuration is only suitable for a single node; multiple nodes need the same CA to establish TLS trust. Meaning you should create a CA + TLS key + TLS cert and store those in a secret for all the nodes to use, for example as sketched below.
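A minimal sketch of that approach, assuming a release named `openldap` in the `default` namespace, a secret named `openldap-tls`, and that the chart reads `ca.crt` from the same secret (all of these are assumptions to adjust to your setup):

```bash
# 1. Self-signed CA shared by all nodes (name and validity are assumptions)
openssl req -x509 -newkey rsa:4096 -nodes -days 3650 \
  -subj "/CN=openldap-ca" -keyout ca.key -out ca.crt

# 2. Server key + CSR
openssl req -newkey rsa:4096 -nodes -keyout tls.key -out tls.csr -subj "/CN=openldap"

# 3. Sign with the CA, adding SANs that cover the headless-service FQDNs used by replication
openssl x509 -req -in tls.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 \
  -extfile <(printf "subjectAltName=DNS:*.openldap-headless.default.svc.cluster.local,DNS:openldap.default.svc.cluster.local") \
  -out tls.crt

# 4. One secret, referenced by initTLSSecret.secret and used by every node
kubectl create secret generic openldap-tls --type=kubernetes.io/tls \
  --from-file=tls.crt --from-file=tls.key --from-file=ca.crt
```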
I am seeing the same issue with replication regardless of the tls_enabled setting and regardless of the image used. It's a fresh first-time install and the certs were generated following https://www.openldap.org/faq/data/cache/185.html
Replication fails with the following config for me. If I search the respective replicas for members of a group, the second and third instances show none, but the first does. This is from a fresh install. Values are below.
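To make the per-replica check concrete, this is the kind of query I mean; host names, base DN, group name and credentials are placeholders, not my actual values:

```bash
# Compare group membership as seen by each replica (placeholders throughout)
for i in 0 1 2; do
  echo "--- openldap-$i ---"
  ldapsearch -x -H "ldap://openldap-$i.openldap-headless.default.svc.cluster.local:1389" \
    -D "cn=admin,dc=example,dc=org" -w adminpassword \
    -b "dc=example,dc=org" "(cn=mygroup)" member
done
```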
I'm also observing this (with chart version 4.2.2 and the default image), using:

    initTLSSecret:
      tls_enabled: true
      # The name of a kubernetes.io/tls type secret to use for TLS
      secret: "openldap-tls"

The configured certificate seems valid to me: the ldaps:// client connections are properly accepted. The certificate properly includes the FQDN used by the replication leveraging the headless service, and is properly validated by an
I see no improvement when modifying
Increasing log levels shows some additional errors
Looking at the ldap replication doc at https://www.zytrax.com/books/ldap/ch6/#syncrepl for other workarounds, I could only spot the possibility of specifying an explicit
Any other ideas for diagnostics or a fix/workaround?
Surprisingly:
This might just be polluting traces that should be ignored?! I tried to bump to chart [email protected] (which still uses image
Besides, there seem to be polluting traces in the output due to the tcp probes configured in the helm chart, which connect to the ldap daemon without sending any payload.
I guess this could be avoided by using a probe command (using an ldap client) instead of a tcp probe, see helm-openldap/templates/statefulset.yaml lines 211 to 213 in 17694f4 and the sketch below.
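For example, a sketch of what an ldap-client-based probe could look like in the pod spec (this is not the chart's actual template; the port and timings are assumptions):

```yaml
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Anonymous base search against the rootDSE: a full LDAP operation,
      # so no half-open TCP connections are left for slapd to log about
      - ldapsearch -x -H ldap://127.0.0.1:1389 -b "" -s base > /dev/null
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 5
```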
Details about observing established connections among pods: attaching a debug container to the pod to get the netstat command, I can properly see the established connections among the pods.
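For reference, the debug-container approach looks roughly like this (the debug image and pod/container names are assumptions):

```bash
# Ephemeral debug container in the pod's network namespace
# (--target additionally shares the openldap container's process namespace)
kubectl debug -it openldap-0 --image=nicolaka/netshoot --target=openldap -- \
  netstat -tnp
```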
Double-checking the current CI w.r.t. certs, where the CA cert is generated, at helm-openldap/.github/workflows/ci-ha.yml lines 18 to 20 in 850ca5b:
The difference with my setting is that the custom ca.cert is distinct from the server certificate; however, the following commands properly validate the TLS certs. I also mounted the ca.cert into /etc/ssl/certs/ using a custom volume.
This is in sync with the olcSyncrepl FQDN used
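For completeness, validation of that kind can be done with commands along these lines (file paths and the FQDN are placeholders standing in for my actual values):

```bash
# Check that the server cert chains to the custom CA
openssl verify -CAfile ca.crt tls.crt

# Inspect the SANs carried by the server cert
openssl x509 -in tls.crt -noout -ext subjectAltName

# Check the TLS handshake against the replication FQDN
openssl s_client -CAfile ca.crt -verify_return_error \
  -connect openldap-0.openldap-headless.default.svc.cluster.local:1636 < /dev/null
```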
@jp-gouin Are you aware of setups where a custom self-signed CA (distinct from the server cert) is used and no such errors are observed in the replication logs?
Thanks for the probe hint, I'll make sure to fix that in the upcoming update. Regarding your replication issue, to me as long as you have
But yes, I can also see some « pollution » in my logs which does not affect the replication. Maybe by properly handling the cert for all replicas using the proper SAN the pollution might disappear, but that might not be an easy task and would probably be painful for users that want to use their own certs.
Thanks @jp-gouin for your prompt response!
This is good to hear. Would you mind sharing some extracts to confirm they match what I was reporting?
Reading the documentation below, I'm concerned that setting
Did you ever consider supporting an option in the chart to allow use of an explicit ldaps:// protocol to the ldap-tls port 1636 in the replication URL, instead of relying on the ldap port 1389 with the tls_reqcert spec?
Details:
https://www.zytrax.com/books/ldap/ch6/#syncrepl
https://www.zytrax.com/books/ldap/ch6/ldap-conf.html#tls-reqcert
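To illustrate what I mean, a slapd.conf-style sketch of the kind of replication stanza that would target the ldaps port directly; every hostname, credential and path here is a placeholder, not a chart value:

```
# placeholders throughout; continuation lines start with whitespace
syncrepl rid=001
  provider=ldaps://openldap-0.openldap-headless.default.svc.cluster.local:1636
  bindmethod=simple
  binddn="cn=admin,dc=example,dc=org"
  credentials=adminpassword
  searchbase="dc=example,dc=org"
  type=refreshAndPersist
  retry="60 +"
  tls_cacert=/path/to/ca.crt
  tls_reqcert=demand
```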
In my setup, despite the SAN including all replicas as illustrated below, the pollution in the logs is still there. Can you think of missing SANs I should try to add?
Indeed, I agree with you that this is not man-in-the-middle proof, might be worth trying again... If you want to try and submit a PR, that would be greatly appreciated 😀
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Any news?
I'm also experiencing replication issues, but TLS
I noticed that when deploying a fresh cluster, the openldap-0 pod inits fine, no crashes, but all other pods, e.g. openldap-1, will crash with the following error during init:
It seems that this blocks the replica from properly initializing and any writes into this replica will not be sync'd into the other replicas. EDIT: Actually it seems that the issue is indeed related to TLS. The issue may be caused by the crash previously mentioned (not clear). It seems that openldap-0 (the first pod being initialized) has the path for CA configured:
if we have a look at any other pod, nothing:
So these pods have no idea where to fetch the CA -> errors. This setting is indeed set in the initialization script: https://github.com/bitnami/containers/blob/deb6cea75770638735e164915b4bfd6add27860e/bitnami/openldap/2.6/debian-12/rootfs/opt/bitnami/scripts/libopenldap.sh#L735
So I think this chart or the docker images used need some patching to avoid containers from crashing in the init scripts...
Mitigation in the chart: edit the command for the openldap container:
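Alternatively, the missing attribute can be set by hand on an affected replica with an ldapmodify along these lines, assuming the ldapi socket is enabled in the container; the CA path shown follows the bitnami layout and is an assumption:

```bash
# Run inside the affected pod; uses the local ldapi socket with EXTERNAL (root) auth
ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
dn: cn=config
changetype: modify
replace: olcTLSCACertificateFile
olcTLSCACertificateFile: /opt/bitnami/openldap/certs/ca.crt
EOF
```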
… try to init certain stuff - more detail: jp-gouin#148 (comment)
Hello,
I tried this chart a few weeks to a month ago on Azure AKS and didn't observe this issue, but trying it this week on a new bare-metal k8s cluster gives me this error. I noticed that the chart has been upgraded to new versions and a lot has changed.
Deploying a new openldap gives me this error:
Steps to replicate:
I used chart version 4.2.1, which uses the current openldap version 2.6.6.
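For reference, installing this chart version looks roughly like this; the repo alias, release name and values file are placeholders, and the repo URL and chart name should be double-checked against the chart README:

```bash
helm repo add helm-openldap https://jp-gouin.github.io/helm-openldap/
helm repo update
helm install openldap helm-openldap/openldap-stack-ha --version 4.2.1 -f values.yaml
```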
The following values are the user-supplied values.
This created 2 openldap pods. I logged into each pod and verified that the changes are not replicating. The log shows the error reported above.
Logs from pod-0