Add more logging and fix the table info tar file creation. #14857


larry-aptos
Contributor

@larry-aptos larry-aptos commented Oct 3, 2024

Description

  • Adds more logging for debuggability
  • Fixes the tar file creation issue:
    • Before: compression streamed into a temporary tar file placed in the same folder that was being archived.
    • After: compression is done in memory, and the tar file is created on disk in one shot.
    • This fixes the problem where, if the temporary tar file already existed, the process became circular: it tried to compress the tar file into itself.
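The fix above can be illustrated with a minimal Python sketch. It is not the PR's actual code, since the diff is not shown on this page. It only demonstrates the general pattern: build the compressed archive entirely in memory, then create the output file with a single write, so the directory walk never sees a partially written archive. The function names `build_tar_gz_bytes` and `write_tar_gz` and the `table_info_snapshot` directory are hypothetical. The check that rejects an output path inside the source directory is an extra safeguard added for illustration.

```python
# Hypothetical sketch of "compress in memory, write once"; not the PR's code.
import io
import os
import tarfile
import tempfile  # used only by the usage example at the bottom


def build_tar_gz_bytes(src_dir: str) -> bytes:
    """Compress every entry under src_dir into an in-memory .tar.gz."""
    buf = io.BytesIO()
    # mode="w:gz" wraps buf in gzip; closing the TarFile flushes the gzip
    # trailer into buf but leaves buf itself open.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            # tar.add recurses into subdirectories by default.
            tar.add(os.path.join(src_dir, name), arcname=name)
    return buf.getvalue()


def write_tar_gz(src_dir: str, out_path: str) -> None:
    """Archive src_dir in memory, then create out_path with a single write."""
    src_abs = os.path.abspath(src_dir)
    out_abs = os.path.abspath(out_path)
    # Extra safeguard (not from the PR): refuse to place the archive inside
    # the directory being archived, so a leftover file from an earlier run
    # can never be swept into the next archive.
    if os.path.commonpath([src_abs, out_abs]) == src_abs:
        raise ValueError("out_path must be outside src_dir")
    data = build_tar_gz_bytes(src_dir)  # nothing is written to disk until here
    with open(out_path, "wb") as f:
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical snapshot folder with one nested file.
        src = os.path.join(root, "table_info_snapshot")
        os.makedirs(os.path.join(src, "sub"))
        for rel, text in [("a.txt", "hello"), ("sub/b.txt", "world")]:
            with open(os.path.join(src, rel), "w") as f:
                f.write(text)
        out = os.path.join(root, "table_info_snapshot.tar.gz")
        write_tar_gz(src, out)
        with tarfile.open(out, "r:gz") as tar:
            print(sorted(tar.getnames()))  # ['a.txt', 'sub', 'sub/b.txt']
```

The trade-off of this pattern is that the whole compressed archive is held in memory before it is written, so it suits snapshots that fit comfortably in RAM. A streaming alternative that also avoids the self-inclusion problem is to write the temporary file outside the directory being archived.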

How Has This Been Tested?

Deployed, and it looks good.


Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation


trunk-io bot commented Oct 3, 2024

⏱️ 1h total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
execution-performance / single-node-performance 23m 🟩
rust-cargo-deny 5m 🟩🟩🟩
test-target-determinator 5m 🟩
rust-doc-tests 5m 🟩
execution-performance / test-target-determinator 5m 🟩
check 4m 🟩
check-dynamic-deps 3m 🟩🟩🟩
rust-move-tests 2m 🟩
fetch-last-released-docker-image-tag 2m 🟩
rust-move-tests 2m 🟩
general-lints 2m 🟩🟩🟩
rust-move-tests 2m 🟩
semgrep/ci 1m 🟩🟩🟩
file_change_determinator 31s 🟩🟩🟩
file_change_determinator 17s 🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
execution-performance / single-node-performance 23m 16m +42%


Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.


@larry-aptos larry-aptos marked this pull request as ready for review October 3, 2024 05:37
@larry-aptos larry-aptos force-pushed the 10-02-add_more_logging_and_fix_the_table_info_tar_file_creation branch from a5128e0 to 918e487 Compare October 3, 2024 05:44
@larry-aptos larry-aptos requested a review from jillxuu October 3, 2024 22:19
@larry-aptos larry-aptos enabled auto-merge (squash) October 9, 2024 19:03

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite compat success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23 (PR)
1. Check liveness of validators at old version: 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775
compatibility::simple-validator-upgrade::liveness-check : committed: 12632.36 txn/s, latency: 2507.01 ms, (p50: 1900 ms, p70: 2200, p90: 4600 ms, p99: 17400 ms), latency samples: 445260
2. Upgrading first Validator to new version: 918e4873a3b767bee5054b512671eeaab6b27f23
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6904.43 txn/s, latency: 4091.23 ms, (p50: 4800 ms, p70: 5000, p90: 5100 ms, p99: 5200 ms), latency samples: 126500
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6310.09 txn/s, latency: 5000.45 ms, (p50: 5100 ms, p70: 5200, p90: 7100 ms, p99: 7400 ms), latency samples: 240540
3. Upgrading rest of first batch to new version: 918e4873a3b767bee5054b512671eeaab6b27f23
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 5801.36 txn/s, latency: 4928.48 ms, (p50: 5600 ms, p70: 6000, p90: 6100 ms, p99: 6200 ms), latency samples: 111140
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 5322.48 txn/s, latency: 6148.81 ms, (p50: 6500 ms, p70: 6700, p90: 7600 ms, p99: 7800 ms), latency samples: 189740
4. upgrading second batch to new version: 918e4873a3b767bee5054b512671eeaab6b27f23
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11497.05 txn/s, latency: 2361.57 ms, (p50: 2600 ms, p70: 2700, p90: 2900 ms, p99: 2900 ms), latency samples: 203220
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 11596.41 txn/s, latency: 2689.44 ms, (p50: 2700 ms, p70: 2800, p90: 3000 ms, p99: 3800 ms), latency samples: 373700
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23 passed
Test Ok

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite realistic_env_max_load success on 918e4873a3b767bee5054b512671eeaab6b27f23

two traffics test: inner traffic : committed: 13658.00 txn/s, latency: 2905.90 ms, (p50: 2700 ms, p70: 3000, p90: 3100 ms, p99: 4400 ms), latency samples: 5193120
two traffics test : committed: 100.05 txn/s, latency: 2641.68 ms, (p50: 2500 ms, p70: 2600, p90: 2900 ms, p99: 10800 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.237, avg: 0.219", "QsPosToProposal: max: 0.318, avg: 0.252", "ConsensusProposalToOrdered: max: 0.327, avg: 0.298", "ConsensusOrderedToCommit: max: 0.506, avg: 0.473", "ConsensusProposalToCommit: max: 0.801, avg: 0.771"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 1.30s no progress at version 29198 (avg 0.21s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.83s no progress at version 1859510 (avg 7.75s) [limit 15].
Test Ok

Contributor

github-actions bot commented Oct 9, 2024

✅ Forge suite framework_upgrade success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23 (PR)
Upgrade the nodes to version: 918e4873a3b767bee5054b512671eeaab6b27f23
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1125.34 txn/s, submitted: 1128.46 txn/s, failed submission: 3.12 txn/s, expired: 3.12 txn/s, latency: 2589.79 ms, (p50: 2300 ms, p70: 2700, p90: 4500 ms, p99: 7100 ms), latency samples: 101100
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1001.55 txn/s, submitted: 1004.25 txn/s, failed submission: 2.70 txn/s, expired: 2.70 txn/s, latency: 2947.29 ms, (p50: 2400 ms, p70: 3000, p90: 5400 ms, p99: 6800 ms), latency samples: 89100
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 918e4873a3b767bee5054b512671eeaab6b27f23 passed
Upgrade the remaining nodes to version: 918e4873a3b767bee5054b512671eeaab6b27f23
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1299.73 txn/s, submitted: 1303.39 txn/s, failed submission: 3.67 txn/s, expired: 3.67 txn/s, latency: 2343.34 ms, (p50: 2100 ms, p70: 2400, p90: 3600 ms, p99: 5700 ms), latency samples: 113480
Test Ok

@larry-aptos larry-aptos merged commit 78e9572 into main Oct 9, 2024
89 checks passed
@larry-aptos larry-aptos deleted the 10-02-add_more_logging_and_fix_the_table_info_tar_file_creation branch October 9, 2024 19:32
Labels
None yet
Projects
None yet

3 participants