This repository was archived by the owner on Jan 28, 2022. It is now read-only.

Description
TL;DR: Continuing the load testing after #141, we encountered port exhaustion.
When running the load test, we saw the metrics for calls to the Databricks API switch abruptly from 100% success to 100% failure.

The test was running against a mock of the Databricks API that also emits metrics. Those metrics showed that the mock stopped receiving any calls at the point where the failures began occurring in the operator.

The operator logs show that these errors started appearing at the time the failures occurred:
2019-12-19T18:52:21.336Z ERROR controller-runtime.controller Reconciler error {"controller": "run", "request": "default/run-8d3a46a0-ce07-42a3-a567-13596ef1f21b", "error": "error when handling finalizer: Get http://databricks-mock-api.databricks-mock-api:8080/api/2.0/jobs/runs/get?run_id=196: dial tcp 10.0.173.245:8080: connect: cannot assign requested address"}
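Running netstat from within the operator pod shows nearly 30,000 ESTABLISHED connections to the mock API. That count is close to the size of the default Linux ephemeral port range, and `cannot assign requested address` is the error connect returns when no local port is available, which led to the port-exhaustion conclusion.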