Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Add support for lazyfs #14691

Merged
merged 1 commit into from
Jul 27, 2023
Merged

tests: Add support for lazyfs #14691

merged 1 commit into from
Jul 27, 2023

Conversation

serathius
Copy link
Member

@ptabor
Copy link
Contributor

ptabor commented Nov 17, 2022

@serathius: Is it still intentionally draft ?

@serathius
Copy link
Member Author

Yes, I only implemented building lazyfs. I still need to configure it and implement cache clear on crash.

@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2022

Codecov Report

Merging #14691 (08cb83a) into main (08cb83a) will not change coverage.
The diff coverage is n/a.

❗ Current head 08cb83a differs from pull request most recent head 3ecf291. Consider uploading reports for the commit 3ecf291 to get more accurate results

@@           Coverage Diff           @@
##             main   #14691   +/-   ##
=======================================
  Coverage   75.50%   75.50%           
=======================================
  Files         457      457           
  Lines       37423    37423           
=======================================
  Hits        28257    28257           
  Misses       7387     7387           
  Partials     1779     1779           
Flag Coverage Δ
all 75.50% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@stale
Copy link

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 18, 2023
@serathius
Copy link
Member Author

Still want to revisit this. Problem is integrating lazyfs in a user approachable way. We need to run separate lazyfs process for each etcd instance. It might require non-trivial rewrite.

@stale stale bot removed the stale label Mar 18, 2023
@stale
Copy link

stale bot commented Jun 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@serathius
Copy link
Member Author

PR is ready to review. There is one concern on my side. LazyFS uses a lot of CPU, in environment with limited resources like CI it might be too much. Will need to validates if it doesn't cause flakes.

cc @ahrtr @ptabor @jmhbnz @fuweid @chaochn47

@serathius serathius force-pushed the lazyfs branch 2 times, most recently from 8444e97 to db751e4 Compare July 18, 2023 12:11
@fuweid
Copy link
Member

fuweid commented Jul 20, 2023

LazyFS uses a lot of CPU, in environment with limited resources like CI it might be too much.

I run this branch in my local with 3 cores. The failure is about traffic.go:91: Requiring minimal 200.000000 qps for test results to be reliable, got 147.115802 qps. I did see the lazyfs uses more CPU, which might cause the flake

@serathius
Copy link
Member Author

Need to think how to integrate the lazyfs into the test scenarios. Should we lower the qps requirement, run separate scenarios with lower qps, or only enable on LowTraffic. Let me know if you have any suggestions.

@fuweid
Copy link
Member

fuweid commented Jul 20, 2023

run separate scenarios with lower qps, or only enable on LowTraffic

I am trying to enable grpc-only endpoint for the test. update it later.

@fuweid
Copy link
Member

fuweid commented Jul 25, 2023

@serathius My devbox is kind of similar to github action vm. I run the branch with 3 CPU cores. I think the HighTraffic.minQPS should be changed to 120 if enableLazyfs is true. I was trying to use different block size but it doesn't help. The fsync operation might require more CPU based on my testing.

image

BTW, compactAfterCommitBatch becomes flaky after enable lazyfs. After trigger compact, the total number of keys isn't more enough to trigger batch commit. Hope that information is help.

@fuweid
Copy link
Member

fuweid commented Jul 25, 2023

I didn't see any qps issue in LowTraffic or Kubernetes profiles. For the change, it looks good to me.

@serathius serathius force-pushed the lazyfs branch 2 times, most recently from c83cd17 to dafbaf1 Compare July 25, 2023 09:08
@fuweid
Copy link
Member

fuweid commented Jul 25, 2023

2023-07-25T09:21:06.6384677Z     logger.go:130: 2023-07-25T09:21:04.266Z	INFO	Triggering failpoint	{"failpoint": "compactBeforeCommitBatch"}
2023-07-25T09:21:06.6385599Z     logger.go:130: 2023-07-25T09:21:04.266Z	INFO	Setting up gofailpoint	{"failpoint": "compactBeforeCommitBatch"}
2023-07-25T09:21:06.6386514Z     logger.go:130: 2023-07-25T09:21:04.272Z	INFO	Triggering gofailpoint	{"failpoint": "compactBeforeCommitBatch"}
2023-07-25T09:21:06.6387728Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesLowTrafficClusterOfSize1LazyFS-test-0) (11989): {"level":"info","ts":"2023-07-25T09:21:04.390478Z","caller":"mvcc/index.go:194","msg":"compact tree index","revision":73}
2023-07-25T09:21:06.6389417Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesLowTrafficClusterOfSize1LazyFS-test-0) (11989): {"level":"info","ts":"2023-07-25T09:21:04.390729Z","caller":"mvcc/kvstore_compaction.go:69","msg":"finished scheduled compaction","compact-revision":73,"took":"177.702µs","hash":2606726685}
2023-07-25T09:21:06.6391083Z /home/runner/work/etcd/etcd/bin/etcd (TestRobustnessKubernetesLowTrafficClusterOfSize1LazyFS-test-0) (11989): {"level":"info","ts":"2023-07-25T09:21:04.390754Z","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2606726685,"revision":73,"compact-revision":-1}
2023-07-25T09:21:06.6392842Z     logger.go:130: 2023-07-25T09:21:04.391Z	INFO	Waiting for member to exit	{"member": "TestRobustnessKubernetesLowTrafficClusterOfSize1LazyFS-test-0"}

That is flaky case I mentioned. The compact doesn't commit after enable failpoint.

REF: https://github.com/etcd-io/etcd/actions/runs/5654876623/job/15318829070?pr=14691

@fuweid
Copy link
Member

fuweid commented Jul 27, 2023

stderr: # go.etcd.io/etcd/tests/v3/framework/e2e
Error: stderr: framework/e2e/cluster.go:415:34: too many arguments in call to NewEtcdProcess
stderr: have (testing.TB, *EtcdServerProcessConfig)
stderr: want (*EtcdServerProcessConfig)
Error: stderr: framework/e2e/cluster.go:835:34: too many arguments in call to NewEtcdProcess
stderr: have (testing.TB, *EtcdServerProcessConfig)
stderr: want (*EtcdServerProcessConfig)
Error: stderr: framework/e2e/cluster.go:864:34: too many arguments in call to NewEtcdProcess
stderr: have (testing.TB, *EtcdServerProcessConfig)
stderr: want (*EtcdServerProcessConfig)
Error: stderr: framework/e2e/cluster_proxy.go:43:9: cannot use NewProxyEtcdProcess(cfg) (value of type *proxyEtcdProcess) as type EtcdProcess in return statement:
stderr: *proxyEtcdProcess does not implement EtcdProcess (missing LazyFS method)
Error: stderr: framework/e2e/cluster_proxy.go:47:34: not enough arguments in call to NewEtcdServerProcess
stderr: have (*EtcdServerProcessConfig)
stderr: want (testing.TB, *EtcdServerProcessConfig)

need to support proxy part.

@serathius
Copy link
Member Author

serathius commented Jul 27, 2023

need to support proxy part.

I know :(. I always neglect cluster_proxy as it's inconvenient to run locally.

@serathius
Copy link
Member Author

serathius commented Jul 27, 2023

PR passed tests. We have a working automatic disk failure injection testing! Same as Jepsen

Please take a look @fuweid @chaochn47 @jmhbnz @ahrtr

Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work getting this going team!

Some feedback below, nothing too major.

Makefile Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
tests/framework/e2e/etcd_process.go Outdated Show resolved Hide resolved
tests/framework/e2e/lazyfs.go Outdated Show resolved Hide resolved
tests/framework/e2e/lazyfs.go Outdated Show resolved Hide resolved
tests/framework/e2e/lazyfs.go Show resolved Hide resolved
Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@serathius serathius force-pushed the lazyfs branch 2 times, most recently from 8003036 to 8dd863c Compare July 27, 2023 09:57
Copy link
Member

@jmhbnz jmhbnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Marek Siarkowicz <[email protected]>
@serathius serathius merged commit 9f72c64 into etcd-io:main Jul 27, 2023
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants