[forge] delete orphaned test runners #15557

Merged (5 commits; merged Dec 12, 2024)

rustielin (Contributor) commented on Dec 10, 2024

Description

Delete orphaned test runners. Orphaned test runners are those that:

  • have no corresponding forge namespace with a running testnet, and
  • are older than a threshold age (300s)
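The two criteria above can be combined into a single predicate. The following is a minimal sketch, not the code from this PR: it assumes runner pods are named `<namespace>-<unix-timestamp>-<hash>` (as in the `Deleting orphaned pod forge-e2e-pr-15115-1733865496-...` log line) and that the caller already has the set of live forge namespace names. `namespace_of`, `is_orphaned`, and `ORPHAN_AGE_THRESHOLD` are hypothetical names. Recovering the namespace and looking it up exactly avoids substring matching altogether.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Age threshold from the PR description (300 s); hypothetical constant name.
const ORPHAN_AGE_THRESHOLD: Duration = Duration::from_secs(300);

/// Recovers `<namespace>` from a pod named `<namespace>-<unix-timestamp>-<hash>`.
/// Returns None if the name does not fit that (assumed) pattern.
fn namespace_of(pod_name: &str) -> Option<&str> {
    // Split from the right at most twice: hash, timestamp, then the namespace.
    let mut parts = pod_name.rsplitn(3, '-');
    let _hash = parts.next()?;
    let timestamp = parts.next()?;
    let namespace = parts.next()?;
    // Require a numeric timestamp segment so unrelated pod names are rejected.
    timestamp.parse::<u64>().ok().map(|_| namespace)
}

/// A runner is orphaned if its namespace is not live and it is older than the threshold.
/// Pods whose names cannot be parsed are conservatively left alone.
fn is_orphaned(pod_name: &str, age: Duration, live_namespaces: &HashSet<&str>) -> bool {
    match namespace_of(pod_name) {
        Some(ns) => !live_namespaces.contains(ns) && age > ORPHAN_AGE_THRESHOLD,
        None => false,
    }
}

fn main() {
    let live: HashSet<&str> = ["forge-compat-pr-15115", "forge-e2e-pr-15348"]
        .iter()
        .copied()
        .collect();
    let pod = "forge-e2e-pr-15115-1733865496-d30a81f07c74c2f9d55d24dff0e4e390e";
    // No live namespace is named forge-e2e-pr-15115, and 939 s > 300 s.
    println!("orphaned: {}", is_orphaned(pod, Duration::from_secs(939), &live));
}
```

Names that do not fit the assumed pattern are never treated as orphaned, so an unexpected naming scheme errs toward keeping pods rather than deleting them.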

How Has This Been Tested?

$ cargo run -p aptos-forge-cli -- operator clean-up --dry-run
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on build directory
   Compiling aptos-forge v0.0.0 (/Users/rustielin/Code/aptos-core/testsuite/forge)
   Compiling aptos-testcases v0.0.0 (/Users/rustielin/Code/aptos-core/testsuite/testcases)
   Compiling aptos-forge-cli v0.0.0 (/Users/rustielin/Code/aptos-core/testsuite/forge-cli)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 28.63s
     Running `target/debug/forge operator clean-up --dry-run`
2024-12-10T21:33:56.004559Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:983 Dry run mode, skipping actual cleanup
2024-12-10T21:33:57.422285Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-compat-pr-15115 with data: {"cleanup": "1733867012", "keep": "false"}
2024-12-10T21:33:57.422342Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-compat-pr-15115 has remaining 576 seconds before cleanup
2024-12-10T21:33:57.422422Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-compat-pr-15348 with data: {"cleanup": "1733867532", "keep": "false"}
2024-12-10T21:33:57.422432Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-compat-pr-15348 has remaining 1096 seconds before cleanup
2024-12-10T21:33:57.422518Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-compat-pr-15411 with data: {"cleanup": "1733866472", "keep": "false"}
2024-12-10T21:33:57.422526Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-compat-pr-15411 has remaining 36 seconds before cleanup
2024-12-10T21:33:57.422598Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-compat-pr-15554 with data: {"cleanup": "1733866626", "keep": "false"}
2024-12-10T21:33:57.422604Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-compat-pr-15554 has remaining 190 seconds before cleanup
2024-12-10T21:33:57.422687Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-compat-pr-15556 with data: {"cleanup": "1733867880", "keep": "false"}
2024-12-10T21:33:57.422696Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-compat-pr-15556 has remaining 1444 seconds before cleanup
2024-12-10T21:33:57.422797Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15348 with data: {"cleanup": "1733867717", "keep": "false"}
2024-12-10T21:33:57.422805Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15348 has remaining 1281 seconds before cleanup
2024-12-10T21:33:57.423168Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15493 with data: {"cleanup": "1733867014", "keep": "false"}
2024-12-10T21:33:57.423191Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15493 has remaining 578 seconds before cleanup
2024-12-10T21:33:57.423312Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15498 with data: {"cleanup": "1733867137", "keep": "false"}
2024-12-10T21:33:57.423321Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15498 has remaining 701 seconds before cleanup
2024-12-10T21:33:57.423585Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15535 with data: {"cleanup": "1733866944", "keep": "false"}
2024-12-10T21:33:57.423598Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15535 has remaining 508 seconds before cleanup
2024-12-10T21:33:57.423717Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15554 with data: {"cleanup": "1733866806", "keep": "false"}
2024-12-10T21:33:57.423726Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15554 has remaining 370 seconds before cleanup
2024-12-10T21:33:57.423834Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-e2e-pr-15556 with data: {"cleanup": "1733868059", "keep": "false"}
2024-12-10T21:33:57.423843Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-e2e-pr-15556 has remaining 1623 seconds before cleanup
2024-12-10T21:33:57.423936Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-framework-upgrade-pr-15348 with data: {"cleanup": "1733870832", "keep": "false"}
2024-12-10T21:33:57.423946Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-framework-upgrade-pr-15348 has remaining 4396 seconds before cleanup
2024-12-10T21:33:57.424023Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1009 Got configmap forge-management-forge-framework-upgrade-pr-15556 with data: {"cleanup": "1733871183", "keep": "false"}
2024-12-10T21:33:57.424030Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1107 Namespace forge-framework-upgrade-pr-15556 has remaining 4747 seconds before cleanup
2024-12-10T21:33:57.551436Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-compat-pr-15115
2024-12-10T21:33:57.551507Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-compat-pr-15348
2024-12-10T21:33:57.551524Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-compat-pr-15411
2024-12-10T21:33:57.551541Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-compat-pr-15554
2024-12-10T21:33:57.551557Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-compat-pr-15556
2024-12-10T21:33:57.551576Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15348
2024-12-10T21:33:57.551592Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15493
2024-12-10T21:33:57.551608Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15498
2024-12-10T21:33:57.551624Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15535
2024-12-10T21:33:57.551640Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15554
2024-12-10T21:33:57.551656Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-e2e-pr-15556
2024-12-10T21:33:57.551674Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-framework-upgrade-pr-15348
2024-12-10T21:33:57.551692Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-framework-upgrade-pr-15556
2024-12-10T21:33:57.551810Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1039 Found forge namespace: forge-pfn-const-tps-with-realistic-env-e7eff822644a401b5ae78095
2024-12-10T21:33:57.682589Z [main] INFO testsuite/forge/src/backend/k8s/cluster_helper.rs:1063 Deleting orphaned pod forge-e2e-pr-15115-1733865496-d30a81f07c74c2f9d55d24dff0e4e390e with age 939 without any corresponding namespace
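The `has remaining N seconds before cleanup` values in the log are consistent with subtracting one fixed reference time from each configmap's `cleanup` Unix timestamp. Taking 1733866436 (2024-12-10T21:33:56Z, the time of the first log line) as that reference gives, for forge-compat-pr-15115, 1733867012 - 1733866436 = 576 seconds, matching the log; the same holds for every other namespace listed. The sketch below reproduces that arithmetic. `remaining_secs` and `cleanup_due` are hypothetical, and the rule that `"keep": "true"` blocks cleanup is an assumption inferred from the key name, not taken from the PR.

```rust
use std::collections::HashMap;

/// Seconds left before a namespace's cleanup deadline; negative once it has passed.
/// `cleanup_ts` is the Unix timestamp stored under the configmap's "cleanup" key.
fn remaining_secs(cleanup_ts: i64, now_ts: i64) -> i64 {
    cleanup_ts - now_ts
}

/// Hypothetical eligibility rule: cleanup is due once the deadline has passed,
/// unless "keep" is "true". A missing or malformed "cleanup" value is treated
/// as not due, to stay on the safe side.
fn cleanup_due(data: &HashMap<&str, &str>, now_ts: i64) -> bool {
    let keep = data.get("keep").map_or(false, |v| *v == "true");
    match data.get("cleanup").and_then(|v| v.parse::<i64>().ok()) {
        Some(ts) => !keep && remaining_secs(ts, now_ts) <= 0,
        None => false,
    }
}

fn main() {
    // Reference time taken from the first log line: 2024-12-10T21:33:56Z.
    let now: i64 = 1_733_866_436;
    // Data of configmap forge-management-forge-compat-pr-15115, as logged.
    let data = HashMap::from([("cleanup", "1733867012"), ("keep", "false")]);
    let ts: i64 = data["cleanup"].parse().unwrap();
    println!("remaining: {}", remaining_secs(ts, now)); // prints "remaining: 576"
    println!("due: {}", cleanup_due(&data, now)); // prints "due: false"
}
```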

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Dec 10, 2024

⏱️ 2h 46m total CI duration on this PR
Slowest 15 jobs | Cumulative duration | Recent runs
cli-e2e-tests / run-cli-tests | 15m | 🟩
rust-cargo-deny | 14m | 🟩🟩🟩🟩🟩 (+3 more)
rust-move-tests | 13m | 🟩
rust-move-tests | 13m | 🟩
rust-move-tests | 12m | 🟩
rust-move-tests | 12m | 🟩
check-dynamic-deps | 11m | 🟩🟩🟩🟩🟩 (+3 more)
execution-performance / test-target-determinator | 10m | 🟩🟩
test-target-determinator | 9m | 🟩🟩
rust-move-tests | 9m |
check | 7m | 🟩🟩
rust-move-tests | 7m |
rust-doc-tests | 6m | 🟩
rust-doc-tests | 5m | 🟩
rust-move-tests | 5m |

🚨 1 job on the last run was significantly faster/slower than expected

Job | Duration | 7-day avg | Delta
execution-performance / single-node-performance | 10s | 20m | -99%


rustielin force-pushed the rustielin/forge-force-delete branch from 3fc16e5 to c879811 on December 10, 2024 21:34
Comment on lines 1056 to 1058
!forge_namespaces
.iter()
.any(|namespace| pod_name.contains(namespace.metadata.name.as_ref().unwrap()))

The current string matching with contains() is susceptible to false matches when namespace names overlap. For example, with two namespaces forge-test1 and forge-test10, a pod named forge-test10-xyz would match both, because "forge-test10-xyz" also contains "forge-test1". Consider a more precise match that enforces namespace boundaries in the pod name, such as starts_with() on the namespace name followed by the "-" separator, or an equivalent regex; a bare starts_with() would still match forge-test1 against forge-test10-xyz.

Spotted by Graphite Reviewer
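To make the review comment concrete, the sketch below compares three ways of testing whether a pod name belongs to a namespace. The function names are hypothetical and this is not necessarily the fix that was merged; it only shows why a bare `contains()` or `starts_with()` accepts `forge-test10-xyz` for namespace `forge-test1`, while requiring a `-` immediately after the namespace name does not.

```rust
/// Substring match, as in the reviewed excerpt: the namespace name may
/// appear anywhere in the pod name.
fn matches_contains(pod_name: &str, namespace: &str) -> bool {
    pod_name.contains(namespace)
}

/// Bare prefix match: still ambiguous, since "forge-test10-..." starts with "forge-test1".
fn matches_prefix(pod_name: &str, namespace: &str) -> bool {
    pod_name.starts_with(namespace)
}

/// Prefix match that also requires '-' right after the namespace name,
/// so the namespace must end on a segment boundary.
fn matches_bounded(pod_name: &str, namespace: &str) -> bool {
    pod_name
        .strip_prefix(namespace)
        .map_or(false, |rest| rest.starts_with('-'))
}

fn main() {
    let pod = "forge-test10-xyz";
    for ns in ["forge-test1", "forge-test10"] {
        println!(
            "{ns}: contains={} prefix={} bounded={}",
            matches_contains(pod, ns),
            matches_prefix(pod, ns),
            matches_bounded(pod, ns)
        );
    }
}
```

Running it prints `forge-test1: contains=true prefix=true bounded=false` and `forge-test10: contains=true prefix=true bounded=true`. The bounded check still accepts any hyphen-delimited prefix (a namespace named `forge-e2e` would claim a pod named `forge-e2e-pr-1-...`), so if namespace names can nest that way, parsing the namespace out of the pod name and comparing it exactly is stricter.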


rustielin marked this pull request as ready for review on December 10, 2024 23:32
rustielin requested review from a team and sionescu on December 10, 2024 23:33
rustielin force-pushed the rustielin/forge-force-delete branch from 30fb09d to 91b76e9 on December 12, 2024 19:18
rustielin requested review from a team on December 12, 2024 19:44
rustielin enabled auto-merge (squash) on December 12, 2024 20:00

This comment has been minimized.

Contributor

✅ Forge suite compat success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9 (PR)
1. Check liveness of validators at old version: 3c6e693a27339e73520f41030dce8fc9cd504967
compatibility::simple-validator-upgrade::liveness-check : committed: 17013.02 txn/s, latency: 1964.57 ms, (p50: 2100 ms, p70: 2100, p90: 2200 ms, p99: 2300 ms), latency samples: 556640
2. Upgrading first Validator to new version: 79c1cfb6af25db6a997b11b102ef460bf5ff69a9
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7344.59 txn/s, latency: 3903.30 ms, (p50: 4300 ms, p70: 4600, p90: 4700 ms, p99: 4800 ms), latency samples: 138220
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7535.01 txn/s, latency: 4340.46 ms, (p50: 4700 ms, p70: 4700, p90: 4800 ms, p99: 5000 ms), latency samples: 252900
3. Upgrading rest of first batch to new version: 79c1cfb6af25db6a997b11b102ef460bf5ff69a9
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6835.08 txn/s, latency: 4184.20 ms, (p50: 4800 ms, p70: 5100, p90: 5300 ms, p99: 5400 ms), latency samples: 124720
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7255.22 txn/s, latency: 4524.25 ms, (p50: 4900 ms, p70: 5000, p90: 5100 ms, p99: 5400 ms), latency samples: 240520
4. upgrading second batch to new version: 79c1cfb6af25db6a997b11b102ef460bf5ff69a9
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10921.07 txn/s, latency: 2567.75 ms, (p50: 2600 ms, p70: 3100, p90: 3600 ms, p99: 3900 ms), latency samples: 188760
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11238.23 txn/s, latency: 2854.18 ms, (p50: 2700 ms, p70: 3400, p90: 3600 ms, p99: 4200 ms), latency samples: 368880
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9 passed
Test Ok

Contributor

✅ Forge suite realistic_env_max_load success on 79c1cfb6af25db6a997b11b102ef460bf5ff69a9

two traffics test: inner traffic : committed: 14861.92 txn/s, latency: 2670.65 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3200 ms), latency samples: 5651060
two traffics test : committed: 100.08 txn/s, latency: 1450.51 ms, (p50: 1400 ms, p70: 1500, p90: 1500 ms, p99: 1700 ms), latency samples: 1740
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.536, avg: 1.449", "ConsensusProposalToOrdered: max: 0.313, avg: 0.289", "ConsensusOrderedToCommit: max: 0.381, avg: 0.371", "ConsensusProposalToCommit: max: 0.667, avg: 0.660"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.98s no progress at version 16916 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.66s no progress at version 1869675 (avg 0.58s) [limit 16].
Test Ok

Contributor

✅ Forge suite framework_upgrade success on 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9

Compatibility test results for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9 (PR)
Upgrade the nodes to version: 79c1cfb6af25db6a997b11b102ef460bf5ff69a9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1041.77 txn/s, submitted: 1044.30 txn/s, failed submission: 2.53 txn/s, expired: 2.53 txn/s, latency: 3002.46 ms, (p50: 2300 ms, p70: 2700, p90: 6000 ms, p99: 8400 ms), latency samples: 90620
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1426.22 txn/s, submitted: 1430.19 txn/s, failed submission: 3.97 txn/s, expired: 3.97 txn/s, latency: 2089.16 ms, (p50: 2100 ms, p70: 2400, p90: 2700 ms, p99: 3900 ms), latency samples: 129200
5. check swarm health
Compatibility test for 3c6e693a27339e73520f41030dce8fc9cd504967 ==> 79c1cfb6af25db6a997b11b102ef460bf5ff69a9 passed
Upgrade the remaining nodes to version: 79c1cfb6af25db6a997b11b102ef460bf5ff69a9
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1371.31 txn/s, submitted: 1374.07 txn/s, failed submission: 2.76 txn/s, expired: 2.76 txn/s, latency: 2246.55 ms, (p50: 2100 ms, p70: 2400, p90: 3200 ms, p99: 4500 ms), latency samples: 119360
Test Ok

rustielin merged commit 49d6406 into main on Dec 12, 2024
43 of 46 checks passed
rustielin deleted the rustielin/forge-force-delete branch on December 12, 2024 21:23
georgemitenkov pushed a commit that referenced this pull request Jan 6, 2025
* [forge] delete orphaned test runners

* [forge] cleanup add dry run mode

* [forge] cleanup names by substr robust

* [forge] cleanup names by substr robust 2
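The `[forge] cleanup add dry run mode` commit corresponds to the `--dry-run` flag exercised in the testing log (`Dry run mode, skipping actual cleanup`). Below is a minimal sketch of that kind of gating, assuming deletion is injected as a closure so the dry-run path demonstrably never calls it. `clean_up` and its report strings are hypothetical and are not the Forge API.

```rust
/// Hypothetical cleanup step. In dry-run mode it only reports what would be
/// deleted and never calls `delete`; otherwise it deletes each pod and records
/// the outcome, continuing past individual failures.
fn clean_up<F>(orphans: &[&str], dry_run: bool, mut delete: F) -> Vec<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = Vec::new();
    for &pod in orphans {
        if dry_run {
            report.push(format!("would delete {pod}"));
            continue;
        }
        match delete(pod) {
            Ok(()) => report.push(format!("deleted {pod}")),
            Err(e) => report.push(format!("failed to delete {pod}: {e}")),
        }
    }
    report
}

fn main() {
    let orphans = ["forge-e2e-pr-15115-1733865496-d30a81f0"];
    let mut delete_calls = 0;
    let report = clean_up(&orphans, true, |_pod| {
        delete_calls += 1; // a real implementation would call the Kubernetes API here
        Ok(())
    });
    println!("{report:?} (delete calls: {delete_calls})");
}
```

Injecting the deleter keeps the dry-run guarantee testable without a cluster: the test only needs to check that the closure was never invoked.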
Labels
None yet
Projects
None yet

4 participants