HDDS-8581. Avoid random ports in integration tests #4699
sodonnel
left a comment
This change LGTM. I have one concern, but we can monitor and see if this works well.
As we increment through the ports, there is a chance something else on the host is already using the port we try to use, e.g. some client bound to localhost writing into the cluster. That would then cause the service to fail. I don't think we can completely prevent that, as something could slip in after you have allocated, but it might be possible to check that the port is free as we allocate it, and if it is not, increment the port number and try again. I think I saw some code in the past that did something like this. Perhaps it tries to bind the port and then releases it before returning the port number.
I am happy to commit this change and then we can see if something like this occurs before adding such a check.
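The bind-then-release check described above could look roughly like the following. This is only a sketch, not code from this PR: the class name PortProbe and its methods are hypothetical, and only java.net.ServerSocket from the JDK is used. As noted in the comment, a port reported as free can still be taken by another process before the caller binds it.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of the Ozone code base.
final class PortProbe {
  private PortProbe() { }

  /** Returns true if a server socket could be bound to the port just now. */
  static boolean isFree(int port) {
    // Binding succeeds only if nothing else is listening on the port.
    // try-with-resources releases the port again before returning.
    try (ServerSocket ignored = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /**
   * Walks upward from start and returns the first port that passes the
   * probe. The result is only a hint: the port may be taken again by
   * another process between this call and the real bind.
   */
  static int nextFreePort(int start, int maxExclusive) {
    for (int port = start; port < maxExclusive; port++) {
      if (isFree(port)) {
        return port;
      }
    }
    throw new IllegalStateException(
        "No free port in [" + start + ", " + maxExclusive + ")");
  }
}
```

A caller would use PortProbe.nextFreePort(15000, 32000) in place of a plain increment. The upper bound of 32000 here is an assumption for illustration.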
Limiting the port range to under 32k should work ok. We would need a lot of clusters to go from 15k to the max limit! Change LGTM, so please commit when CI is green.
* master: (78 commits)
  HDDS-8575. Intermittent failure in TestCloseContainerEventHandler.testCloseContainerWithDelayByLeaseManager (apache#4688)
  HDDS-7241. EC: Reconstruction could fail with orphan blocks. (apache#4718)
  HDDS-8577. [Snapshot] Disable compaction log when loading metadata for snapshot (apache#4697)
  HDDS-7080. EC: Offline reconstruction needs better logging (apache#4719)
  HDDS-8626. Config thread pool in ReplicationServer (apache#4715)
  HDDS-8616. Underreplication not fixed if all replicas start decommissioning (apache#4711)
  HDDS-8254. Close containers when volume reaches utilisation threshold (apache#4583)
  HDDS-8254. Close containers when volume reaches utilisation threshold (apache#4583)
  HDDS-8615. Explicitly show EC block type in 'ozone debug chunkinfo' command output (apache#4706)
  HDDS-8623. Delete duplicate getBucketInfo in OMKeyCommitRequest (apache#4712)
  HDDS-8339. Recon Show the number of keys marked for Deletion in Recon UI. (apache#4519)
  HDDS-8572. Support CodecBuffer for protobuf v3 codecs. (apache#4693)
  HDDS-8010. Improve DN warning message when getBlock does not find the block. (apache#4698)
  HDDS-8621. IOException is never thrown in SCMRatisServer.getRatisRoles(). (apache#4710)
  HDDS-8463. S3 key uniqueness in deletedTable (apache#4660)
  HDDS-8584. Hadoop client write slowly when stream enabled (apache#4703)
  HDDS-7732. EC: Verify block deletion from missing EC containers (apache#4705)
  HDDS-8581. Avoid random ports in integration tests (apache#4699)
  HDDS-8504. ReplicationManager: Pass used and excluded node separately for Under and Mis-Replication (apache#4694)
  HDDS-8576. Close RocksDB instance in RDBStore if RDBStore's initialization fails after RocksDB instance creation (apache#4692)
  ...
What changes were proposed in this pull request?
TestDecommissionAndMaintenance uses MiniOzoneClusterProvider to provision clusters in the background. Tests intermittently fail due to port conflicts. The problem is that, while a datanode is stopped, its ports may be reused by some component in a new cluster being provisioned in the background. The original owner of the port then fails to start, and the cluster never becomes ready again.
This PR replaces random ports with a simple incremental allocation starting at 15000. It applies to all MiniOzoneCluster-based tests.
https://issues.apache.org/jira/browse/HDDS-8581
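For illustration, incremental allocation of this kind can be sketched as a shared counter. This is a minimal sketch and not the code in this PR: the class name IncrementalPorts, the method next(), and the hard upper bound of 32000 are all assumptions. Only the starting value of 15000 comes from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical allocator, not the class added by this PR.
final class IncrementalPorts {
  /** Starting port named in the PR description. */
  static final int FIRST_PORT = 15000;
  /** Assumed exclusive upper bound, chosen for this sketch only. */
  static final int MAX_PORT = 32000;

  // One counter shared by every cluster created in the same JVM, so a
  // port handed out once is never handed out again, even to a cluster
  // that is being provisioned in the background.
  private static final AtomicInteger NEXT = new AtomicInteger(FIRST_PORT);

  private IncrementalPorts() { }

  /** Returns the next unused port number; safe to call from any thread. */
  static int next() {
    int port = NEXT.getAndIncrement();
    if (port >= MAX_PORT) {
      throw new IllegalStateException("Port range exhausted at " + port);
    }
    return port;
  }
}
```

Because the counter only moves forward, a stopped datanode keeps exclusive use of its port numbers within the JVM, which is the property the random allocation lacked.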
How was this patch tested?
CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4945442087
100x run of TestDecommissionAndMaintenance:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4944968792