cloudtest is a test framework that allows testing Materialize inside Kubernetes.
Using a Kubernetes environment for testing has the advantage of exercising the same code paths that are used in production to orchestrate cloud resources (e.g., clusters and secrets). Kubernetes will also be responsible for restarting any containers that have exited.
Notable deviations from production include:
- Using MinIO instead of S3 for persist blob storage.
- Using a single-node CockroachDB installation instead of Cockroach Cloud.
- No access to AWS resources like VPC endpoints.
The framework is based on pytest and kind and uses, for the most part, the official kubernetes Python library to control the Kubernetes cluster.
- Install kubectl, the official Kubernetes command-line tool.

  On macOS, use Homebrew to install it:

  ```
  brew install kubectl
  ```

  On Linux, use:

  ```
  curl -fL https://dl.k8s.io/release/v1.26.6/bin/linux/amd64/kubectl > kubectl
  chmod +x kubectl
  sudo mv kubectl /usr/local/bin
  ```

  See the official kubectl installation instructions for additional installation options.
- Install kind, which manages local Kubernetes clusters.

  On macOS, use:

  ```
  brew install kind
  ```

  On Linux, use:

  ```
  curl -fL https://kind.sigs.k8s.io/dl/v0.14.0/kind-linux-amd64 > kind
  chmod +x kind
  sudo mv kind /usr/local/bin
  ```

  See the official kind installation instructions for additional installation options.
- Create and configure a dedicated kind cluster for cloudtest:

  ```
  cd test/cloudtest
  ./setup
  ```
- On macOS, configure Docker to use "gRPC FUSE" as the file sharing implementation for containers (Docker settings, "General" tab). This speeds up the execution of cloudtest tests.
To run all short tests:

```
./pytest
```
To run a single test:

```
./pytest -k test_name_goes_here
```
To use unoptimized dev builds of Materialize, pass the `--dev` flag:

```
./pytest --dev [-k TEST]
```
To run the long tests, pass the `-m=long` flag:

```
./pytest -m=long
```
To check the cluster status:

```
kubectl --context=kind-mzcloud get all
```
Consider also using the k9s terminal user interface:

```
k9s --context=kind-mzcloud
```
To remove all resources from the Kubernetes cluster, so that a test can be rerun without needing to reset the cluster:

```
./reset
```
To remove the Kubernetes cluster entirely:

```
./teardown
```
cloudtest is also the recommended tool for deploying a local build of Materialize to Kubernetes, where you can connect to the cluster and interactively run tests by hand.
Use the `test_wait` workflow, which does nothing but wait for the `default` cluster to become ready:

```
./pytest --dev -k test_wait
```
See the examples in `test/cloudtest/test_smoke.py`.
The tests follow pytest conventions:

```python
from materialize.cloudtest.app.materialize_application import MaterializeApplication


def test_something(mz: MaterializeApplication) -> None:
    assert ...
```
The `MaterializeApplication` object is what creates the Kubernetes cluster. It is instantiated once per pytest invocation.
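For example, a minimal complete test (a sketch that uses the `sql_query` helper described further below) could look like this:

```python
from materialize.cloudtest.app.materialize_application import MaterializeApplication


def test_select_one(mz: MaterializeApplication) -> None:
    # sql_query returns the result set as rows of columns, so [0][0] is the
    # first column of the first row.
    assert mz.environmentd.sql_query("SELECT 1")[0][0] == 1
```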
To wait for a resource to reach a particular state, use the `wait` helper:

```python
from materialize.cloudtest.util.wait import wait

wait(condition="condition=Ready", resource="pod/compute-cluster-u1-replica-u1-0")
```
`wait` uses `kubectl wait` behind the scenes. Here is what the `kubectl wait` documentation has to say about the possible conditions:
```
# Wait for the pod "busybox1" to contain the status condition of type "Ready"
kubectl wait --for=condition=Ready pod/busybox1

# The default value of status condition is true; you can wait for other targets after an equal delimiter (compared
# after Unicode simple case folding, which is a more general form of case-insensitivity):
kubectl wait --for=condition=Ready=false pod/busybox1

# Wait for the pod "busybox1" to contain the status phase to be "Running".
kubectl wait --for=jsonpath='{.status.phase}'=Running pod/busybox1

# Wait for the pod "busybox1" to be deleted, with a timeout of 60s, after having issued the "delete" command
kubectl delete pod/busybox1
kubectl wait --for=delete pod/busybox1 --timeout=60s
```
In particular, to wait until a resource has been deleted:

```python
wait(condition="delete", resource="secret/some_secret")
```
To execute a testdrive fragment within a test:

```python
from textwrap import dedent

mz.testdrive.run(
    input=dedent(
        """
        > SELECT 1;
        1
        """
    )
)
```
Note that each invocation of `testdrive` will drop the current database and recreate it. If you want to run multiple `testdrive` fragments within the same test, use `no_reset=True` to prevent cleanup and `seed=N` to make sure they all share the same random seed:

```python
mz.testdrive.run(..., no_reset=True, seed=N)
```
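For example, a sketch of two fragments that share state (the seed value `123` is arbitrary; any value works as long as both invocations use the same one):

```python
from textwrap import dedent

# The first fragment creates a table; no_reset=True prevents the second
# invocation from dropping and recreating the database.
mz.testdrive.run(
    input=dedent(
        """
        > CREATE TABLE t1 (f1 INTEGER);
        > INSERT INTO t1 VALUES (1);
        """
    ),
    no_reset=True,
    seed=123,
)

# The second fragment can see the table created by the first one.
mz.testdrive.run(
    input=dedent(
        """
        > SELECT f1 FROM t1;
        1
        """
    ),
    no_reset=True,
    seed=123,
)
```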
To execute SQL against the Materialize instance when no result set is expected:

```python
mz.environmentd.sql("DROP TABLE t1;")
```
To fetch a result set:

```python
id = mz.environmentd.sql_query("SELECT id FROM mz_secrets WHERE name = 'username'")[0][0]
```
You can call `kubectl` and collect its output as follows:

```python
secret_description = mz.kubectl("describe", "secret", "some_secret")
```
The following methods:

```python
mz.environmentd.api()
mz.environmentd.apps_api()
mz.environmentd.rbac_api()
```

return API handles that can then be used with the official kubernetes Python module.
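For example, a sketch that lists the pods in the cluster, assuming `api()` returns a `kubernetes.client.CoreV1Api` handle and that the Materialize pods live in the `default` namespace:

```python
# Assumption: mz.environmentd.api() returns a kubernetes.client.CoreV1Api
# instance; list the names of all pods in the default namespace.
core_v1 = mz.environmentd.api()
for pod in core_v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name)
```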
If pods are failing with what seem like DNS issues (they cannot resolve `redpanda`, or cannot connect to `postgres`), have a look at the relevant Kubernetes documentation. In particular, the list of known issues may be relevant if your Linux distribution is running `systemd-resolved`.

In at least one case, a VPN (Mullvad) was interfering with DNS resolution. Try deactivating your VPN, then tear down and recreate your testing cluster to see if that helps.
```
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied
```
If tests are failing almost immediately while trying to upload a file to S3, it may be a bug in our debuginfo upload logic. You can unset all your AWS credentials to work around this.
If `./setup` fails during the "Joining worker nodes" step and spams 404 error messages, the kubelet has likely died on at least one node. You can troubleshoot this by adding `--retain` to the `kind create cluster` command in `setup`, and then running `docker exec -it "$node" bash` to access the node. From there you can access the kubelet logs with `journalctl -xeu kubelet`.
Some common issues are listed at https://kind.sigs.k8s.io/docs/user/known-issues. Since we launch many nodes, the most likely culprit is the inotify limits.