-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-8962. Ensure docker env is stopped #5011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
smengcl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adoroszlai for the improvement! Change mostly lgtm. I have one question inline.
| docker-compose --ansi never down | ||
| if ! { docker-compose --ansi never up -d --scale datanode="${datanode_count}" \ | ||
| && wait_for_safemode_exit \ | ||
| && wait_for_om_leader ; }; then | ||
| [[ -n "$OUTPUT_NAME" ]] || OUTPUT_NAME="$COMPOSE_ENV_NAME" | ||
| stop_docker_env | ||
| return 1 | ||
| fi | ||
|
|
||
| trap stop_docker_env EXIT HUP INT TERM | ||
|
|
||
| docker-compose --ansi never up -d --scale datanode="${datanode_count}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, with the changes in this PR:
copy_daemon_logsandstop_docker_envcalls are removed fromexecute_robot_test()copy_daemon_logsis now called bystop_docker_env()stop_docker_envis now called fromstart_docker_env()
Thus whoever calls start_docker_env last would have to call stop_docker_env in order to have the logs collected, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @smengcl for the review.
whoever calls
start_docker_envlast would have to callstop_docker_envin order to have the logs collected
It's like a ShutdownHook in Java. The shell calls stop_docker_env when the test script exits. start_docker_env just sets it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah makes sense now. That is exactly what trap is for. Thanks @adoroszlai .
|
Thanks @smengcl for the review. |
* master: (36 commits) HDDS-8990. Intermittent timeout waiting on datanode4 9856 to become available (apache#5039) Revert "HDDS-7750. Incorrect WRITE ACL check. (apache#4992)" HDDS-7750. Incorrect WRITE ACL check. (apache#4992) HDDS-8985. Intermittent timeout exiting safe mode in HA secure tests (apache#5033) HDDS-8593. Add RootCARotationPoller to CertClient (apache#5030) HDDS-7645. Kubernetes check should fail fast if cluster cannot start (apache#5028) HDDS-8981. TestRootedOzoneFileSystem runs out of disk space (apache#5029) HDDS-8592. Fetch and save all root certificates during service's certificate rotation. (apache#5025) HDDS-8981. Disable TestRootedOzoneFileSystem#testSafeMode HDDS-8591. Create scheduler to check for new root ca certificates (apache#4961) HDDS-8979. error validating kustomization.yaml (apache#5024) HDDS-8973. Ozone SCM HA should not allocates duplicate IDs when transferring leadership (apache#5018) HDDS-8970. Snapshot Diff should return path relative to bucket root (apache#5015) HDDS-8975. Clarify SCM HA auto-bootstrap doc (apache#5021) HDDS-8689. Rotate Root CA and Sub CA in SCM. (apache#4943) HDDS-8436. Support setSafeMode(), isFileClosed() FileSystem API (apache#4825) HDDS-8880. Intermittent fork timeout in TestOMRatisSnapshots (apache#5022) HDDS-8962. Ensure docker env is stopped (apache#5011) HDDS-7794. [snapshot] SnapshotDiff should throw better error messages for exception handling (apache#5007) HDDS-7922. [FSO] S3G folder support fso layout filestatus s3A compatibility (apache#4448) ...
What changes were proposed in this pull request?
Currently, if acceptance test script exits abruptly due to failed command (e.g. timeout in
wait_for_port), the docker environment is left running, logs are not saved, subsequent test scripts may fail due to port conflict, etc. This is happening recently in HA-secure:This change adds a
trapto ensurestop_docker_envis executed.https://issues.apache.org/jira/browse/HDDS-8962
How was this patch tested?
Verified that logs are saved for
failures1, too:Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5424893835