Conversation
51ab414 to
52ed97f
Compare
dd32323 to
fc4b372
Compare
|
It seems that the Maybe we can remove it? |
|
DNS image is not used (as confirmed by @sopel39 sometime ago). |
|
@hashhar a gentle reminder that I'm waiting for your response in a few comments. |
|
Thanks for the reminder @nineinchnick. Replied and resolved some of them. |
ccb3406 to
58e3c78
Compare
| HEALTHCHECK --interval=10s --timeout=5s --start-period=10s \ | ||
| CMD curl -f http://localhost:10213/ |
There was a problem hiding this comment.
will this still show up as healthy when under heavy load on CI? e.g. longer timeout ?
There was a problem hiding this comment.
Same for other images in this commit " Add tests for spark images "
There was a problem hiding this comment.
The timeout is defined here as 5s. There's a retries parameter with a default value of 3: https://docs.docker.com/engine/reference/builder/#healthcheck. So the container will be marked as unhealthy after 3 failures (timeouts). But I'm not sure what you mean by "show up" - nothing except the tests here use these health checks.
There was a problem hiding this comment.
We use testcontainers for tests - does it remove containers when healthchecks fail? Or are they only tested during startup?
There was a problem hiding this comment.
Product tests use health checks for these images from this repo:
- hive3.1-hive
- hdp2.6-hive
- hdp3.1-hive
but they configure/override the healthcheck. Also, I'm not changing them in this PR. Other images, like spark3-delta, used in EnvSinglenodeDeltaLakeOss, don't wait for a health check, but for an open port: waitingFor(forSelectedPorts(SPARK_THRIFT_PORT)).
All other containers seem to have similar conditions.
% pwd
/Users/jwas/src/trino
% ag forHealthcheck --java --silent
plugin/trino-accumulo/src/test/java/io/trino/plugin/accumulo/TestingAccumuloServer.java
58: accumuloContainer.waitingFor(Wait.forHealthcheck().withStartupTimeout(Duration.ofMinutes(10)));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Hadoop.java
33:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
88: .waitingForAll(forSelectedPorts(10000), forHealthcheck()) // HiveServer2
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Standard.java
53:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
154: .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCompatibility.java
38:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
82: .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())and I checked all configs to see which hadoop base image they provide.
Then I checked:
% ag waitingFor testing/trino-product-tests-launcher
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/DockerContainer.java
323: public DockerContainer waitingForAll(WaitStrategy... strategies)
329: waitingFor(waitAllStrategy);
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/HydraIdentityProvider.java
60: .waitingFor(new SelectedPortWaitStrategy(5432));
72: .waitingFor(Wait.forHttp("/healthz").forPort(3000).forStatusCode(200));
93: .waitingFor(new WaitAllStrategy()
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Kerberos.java
58: .waitingFor(DEFAULT_WAIT_STRATEGY);
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Hadoop.java
88: .waitingForAll(forSelectedPorts(10000), forHealthcheck()) // HiveServer2
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Standard.java
134: .waitingFor(new WaitAllStrategy()) // don't wait
154: .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Minio.java
80: .waitingFor(forSelectedPorts(MINIO_PORT))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Kafka.java
79: .waitingFor(forSelectedPorts(2181))
97: .waitingForAll(forSelectedPorts(9092), forLogMessage(".*started \\(kafka.server.KafkaServer\\).*", 1))
113: .waitingFor(forSelectedPorts(SCHEMA_REGISTRY_PORT))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/AbstractEnvSinglenodeLdap.java
80: .waitingFor(forSelectedPorts(LDAP_PORT))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkHive.java
98: .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeClickhouse.java
100: .waitingFor(forSelectedPorts(2181))
121: .waitingFor(forSelectedPorts(httpPort, nativePort))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCassandra.java
70: .waitingFor(forSelectedPorts(CASSANDRA_PORT))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSqlserver.java
64: .waitingFor(forSelectedPorts(SQLSERVER_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkIceberg.java
98: .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCompatibility.java
82: .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodePostgresql.java
70: .waitingFor(forSelectedPorts(POSTGRESQL_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodePhoenix5.java
61: .waitingFor(forSelectedPorts(ZOOKEEPER_PORT))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeKerberosKudu.java
73: .waitingForAll(
103: .waitingFor(forSelectedPorts(KUDU_MASTER_PORT));
123: .waitingFor(forSelectedPorts(initialKuduTserverPort))
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeMariadb.java
68: .waitingFor(forSelectedPorts(MARIADB_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeDeltaLakeOss.java
134: .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeMysql.java
67: .waitingFor(forSelectedPorts(MYSQL_PORT));That DEFAULT_WAIT_STRATEGY for Kerberos is also a port check:
public static final WaitStrategy DEFAULT_WAIT_STRATEGY = new WaitAllStrategy()
.withStrategy(forSelectedPorts(KERBEROS_PORT, KERBEROS_ADMIN_PORT))
.withStrategy(forLogMessage(".*krb5kdc entered RUNNING state.*", 1));There was a problem hiding this comment.
Indeed seems like Testcontainers only cares about the health/wait strategy when waiting for container to start - they aren't considered at all once containers start running.
Thanks for doing the deep investigation.
There was a problem hiding this comment.
Turns out product tests look at healthchecks after all - see failures in https://github.com/trinodb/trino/actions/runs/3342925565/jobs/5536149519 which fail to start environment because Spark healthcheck appears to fail.
There was a problem hiding this comment.
I thought that waitingFor() would override the health check:
DockerContainer container = new DockerContainer("ghcr.io/trinodb/testing/spark3-delta:" + hadoopImagesVersion, SPARK_CONTAINER_NAME)
.withCopyFileToContainer(forHostPath(configDir.getPath("spark-defaults.conf")), "/spark/conf/spark-defaults.conf")
.waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));
There was a problem hiding this comment.
Ok it does not remove the default health check but the failure is unexpected, I'm trying to reproduce this locally.
There was a problem hiding this comment.
The healthcheck is a bit weird.
It responds with:
{
"Start": "2022-10-28T09:09:23.693071205Z",
"End": "2022-10-28T09:09:23.75675015Z",
"ExitCode": 0,
"Output": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 44 0 44 0 0 8598 0 --:--:-- --:--:-- --:--:-- 8800\n\u0004\u0000\u0000\u0000\u0011Invalid status 71"
},
But locally the exit code is always 0, I'll retry on CI.
This is required to use the `--format` switch in `docker compose ps` and read the status of the health check in tests.
Currently used versions of Spark and Phoenix5 are not compatible with JDK 17 and should be using JDK 11.
Extracted from #143
Partially reverts #140 - restore using JDK 11 for Spark and Phoenix images. Tests were failing without this. This was also raised in trinodb/trino#13733