Skip to content

Add tests for all images#147

Merged
hashhar merged 10 commits intotrinodb:masterfrom
nineinchnick:tests
Oct 12, 2022
Merged

Add tests for all images#147
hashhar merged 10 commits intotrinodb:masterfrom
nineinchnick:tests

Conversation

@nineinchnick
Copy link
Copy Markdown
Member

Extracted from #143

Partially reverts #140 - restore using JDK 11 for Spark and Phoenix images. Tests were failing without this. This was also raised in trinodb/trino#13733

@nineinchnick
Copy link
Copy Markdown
Member Author

nineinchnick commented Sep 9, 2022

It seems that the testing/dns image didn't work at all until I upgraded the base image. It has very few pulls reported on the package page: https://github.com/trinodb/docker-images/pkgs/container/testing%2Fdns and is not referenced in https://github.com/trinodb/trino

Maybe we can remove it?

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Sep 10, 2022

DNS image is not used (as confirmed by @sopel39 sometime ago).

Copy link
Copy Markdown
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

@nineinchnick
Copy link
Copy Markdown
Member Author

@hashhar a gentle reminder that I'm waiting for your response in a few comments.

@hashhar
Copy link
Copy Markdown
Member

hashhar commented Oct 10, 2022

Thanks for the reminder @nineinchnick. Replied and resolved some of them.

@nineinchnick nineinchnick force-pushed the tests branch 2 times, most recently from ccb3406 to 58e3c78 Compare October 11, 2022 10:07
Comment on lines +47 to +48
HEALTHCHECK --interval=10s --timeout=5s --start-period=10s \
CMD curl -f http://localhost:10213/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this still show up as healthy when under heavy load on CI? e.g. longer timeout ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for other images in this commit " Add tests for spark images "

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout is defined here as 5s. There's a retries parameter with a default value of 3: https://docs.docker.com/engine/reference/builder/#healthcheck. So the container will be marked as unhealthy after 3 failures (timeouts). But I'm not sure what you mean by "show up" - nothing except the tests here use these health checks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use testcontainers for tests - does it remove containers when healthchecks fail? Or are they only tested during startup?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Product tests use health checks for these images from this repo:

  • hive3.1-hive
  • hdp2.6-hive
  • hdp3.1-hive

but they configure/override the healthcheck. Also, I'm not changing them in this PR. Other images, like spark3-delta, used in EnvSinglenodeDeltaLakeOss, don't wait for a health check, but for an open port: waitingFor(forSelectedPorts(SPARK_THRIFT_PORT)).
All other containers seem to have similar conditions.

% pwd
/Users/jwas/src/trino
% ag forHealthcheck --java --silent
plugin/trino-accumulo/src/test/java/io/trino/plugin/accumulo/TestingAccumuloServer.java
58:        accumuloContainer.waitingFor(Wait.forHealthcheck().withStartupTimeout(Duration.ofMinutes(10)));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Hadoop.java
33:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
88:                .waitingForAll(forSelectedPorts(10000), forHealthcheck()) // HiveServer2

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Standard.java
53:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
154:                .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCompatibility.java
38:import static org.testcontainers.containers.wait.strategy.Wait.forHealthcheck;
82:                .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())

and I checked all configs to see which hadoop base image they provide.

Then I checked:

% ag waitingFor testing/trino-product-tests-launcher 
testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/DockerContainer.java
323:    public DockerContainer waitingForAll(WaitStrategy... strategies)
329:        waitingFor(waitAllStrategy);

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/HydraIdentityProvider.java
60:                .waitingFor(new SelectedPortWaitStrategy(5432));
72:                .waitingFor(Wait.forHttp("/healthz").forPort(3000).forStatusCode(200));
93:                .waitingFor(new WaitAllStrategy()

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Kerberos.java
58:                .waitingFor(DEFAULT_WAIT_STRATEGY);

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Hadoop.java
88:                .waitingForAll(forSelectedPorts(10000), forHealthcheck()) // HiveServer2

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Standard.java
134:                .waitingFor(new WaitAllStrategy()) // don't wait
154:                .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Minio.java
80:                .waitingFor(forSelectedPorts(MINIO_PORT))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/common/Kafka.java
79:                .waitingFor(forSelectedPorts(2181))
97:                .waitingForAll(forSelectedPorts(9092), forLogMessage(".*started \\(kafka.server.KafkaServer\\).*", 1))
113:                .waitingFor(forSelectedPorts(SCHEMA_REGISTRY_PORT))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/AbstractEnvSinglenodeLdap.java
80:                .waitingFor(forSelectedPorts(LDAP_PORT))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkHive.java
98:                .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeClickhouse.java
100:                .waitingFor(forSelectedPorts(2181))
121:                .waitingFor(forSelectedPorts(httpPort, nativePort))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCassandra.java
70:                .waitingFor(forSelectedPorts(CASSANDRA_PORT))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSqlserver.java
64:                .waitingFor(forSelectedPorts(SQLSERVER_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkIceberg.java
98:                .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeCompatibility.java
82:                .waitingForAll(forLogMessage(".*======== SERVER STARTED ========.*", 1), forHealthcheck())

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodePostgresql.java
70:                .waitingFor(forSelectedPorts(POSTGRESQL_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodePhoenix5.java
61:                .waitingFor(forSelectedPorts(ZOOKEEPER_PORT))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeKerberosKudu.java
73:                .waitingForAll(
103:                .waitingFor(forSelectedPorts(KUDU_MASTER_PORT));
123:                    .waitingFor(forSelectedPorts(initialKuduTserverPort))

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvMultinodeMariadb.java
68:                .waitingFor(forSelectedPorts(MARIADB_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeDeltaLakeOss.java
134:                .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));

testing/trino-product-tests-launcher/src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeMysql.java
67:                .waitingFor(forSelectedPorts(MYSQL_PORT));

That DEFAULT_WAIT_STRATEGY for Kerberos is also a port check:

    public static final WaitStrategy DEFAULT_WAIT_STRATEGY = new WaitAllStrategy()
            .withStrategy(forSelectedPorts(KERBEROS_PORT, KERBEROS_ADMIN_PORT))
            .withStrategy(forLogMessage(".*krb5kdc entered RUNNING state.*", 1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed seems like Testcontainers only cares about the health/wait strategy when waiting for container to start - they aren't considered at all once containers start running.

Thanks for doing the deep investigation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out product tests look at healthchecks after all - see failures in https://github.com/trinodb/trino/actions/runs/3342925565/jobs/5536149519 which fail to start environment because Spark healthcheck appears to fail.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that waitingFor() would override the health check:

        DockerContainer container = new DockerContainer("ghcr.io/trinodb/testing/spark3-delta:" + hadoopImagesVersion, SPARK_CONTAINER_NAME)
                .withCopyFileToContainer(forHostPath(configDir.getPath("spark-defaults.conf")), "/spark/conf/spark-defaults.conf")
                .waitingFor(forSelectedPorts(SPARK_THRIFT_PORT));

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it does not remove the default health check but the failure is unexpected, I'm trying to reproduce this locally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The healthcheck is a bit weird.

It responds with:

{
        "Start": "2022-10-28T09:09:23.693071205Z",
        "End": "2022-10-28T09:09:23.75675015Z",
        "ExitCode": 0,
        "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100    44    0    44    0     0   8598      0 --:--:-- --:--:-- --:--:--  8800\n\u0004\u0000\u0000\u0000\u0011Invalid status 71"
      },

But locally the exit code is always 0, I'll retry on CI.

Jan Waś added 8 commits October 11, 2022 20:55
This is required to use the `--format` switch in `docker compose ps`
and read the status of the health check in tests.
Currently used versions of Spark and Phoenix5 are not compatible with
JDK 17 and should be using JDK 11.
@hashhar hashhar merged commit fbe9270 into trinodb:master Oct 12, 2022
@nineinchnick nineinchnick deleted the tests branch October 27, 2022 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants