
krb5kdc-fixture intermittently fails to start  #40624

@alpar-t

Description


Ran into this with a PR run: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/11035/console

```
Waiting for TCP socket on 172.18.0.1:32769 of service 'hdfs_1' (Connection refused (Connection refused))
Waiting for TCP socket on 172.18.0.1:32769 of service 'hdfs_1' (Connection refused (Connection refused))
Starting process 'command 'docker''. Working directory: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/test/fixtures/krb5kdc-fixture Command: docker logs --follow=false 65e4f2dd95101e51f0ddba9a40d7448b17ce2ecb16659cccc7e88b6120c4265b
Successfully started process 'command 'docker''
Starting process 'command '/usr/local/bin/docker-compose''. Working directory: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/test/fixtures/krb5kdc-fixture Command: /usr/local/bin/docker-compose --no-ansi -f docker-compose.yml -p 78d469b805523ccd047df2ab0eb6ba41_krb5kdc-fixture_ stop --timeout 10
kadmin: GSS-API (or Kerberos) error while initializing kadmin interface
```

I found this on Server Fault: https://serverfault.com/questions/803662/kerberos-error-while-initializing-kadmin-interface-from-admin-server

Since we use ephemeral workers, the explanation there seems plausible for us: newly booted VMs have little entropy, which can lead to this failure.
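For reference, a quick way to check this hypothesis on a worker would be to read the kernel's entropy counter shortly after boot (a minimal sketch, assuming a Linux kernel that still exposes the legacy entropy accounting):

```sh
# Read the kernel's available entropy estimate (in bits).
# On older kernels a freshly booted VM often reports a low value
# (a few hundred bits or less), which can make /dev/random block.
cat /proc/sys/kernel/random/entropy_avail
```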

We should probably install rng-tools on the CI workers and see if it helps.
As one would expect, the problem is not reproducible locally, since developers' long-running kernels will have accumulated plenty of entropy.
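A minimal provisioning sketch, assuming Debian/Ubuntu-based workers (package and service names differ on other distros):

```sh
# Install rngd so the kernel entropy pool is fed from a hardware RNG
# (e.g. virtio-rng on VMs) instead of waiting for interrupt noise.
sudo apt-get update
sudo apt-get install -y rng-tools

# Verify the daemon is running and the pool fills up after boot.
sudo systemctl status rng-tools
cat /proc/sys/kernel/random/entropy_avail
```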
