Skip to content

Conversation

@eribeiro
Copy link
Contributor

Add Byteman (http://byteman.jboss.org/) as a dependency to ZooKeeper project to enable the dynamic injection of failures.

@eribeiro eribeiro changed the title Add orthogonal fault injection mechanism/framework ZOOKEEPER-1364: Add orthogonal fault injection mechanism/framework Dec 13, 2016
@eribeiro
Copy link
Contributor Author

@yufeldman I really like your ZOOKEEPER-2549 patch, but I think it would be an awesome opportunity of introducing a much needed failure injection framework into ZooKeeper. So, I wrote this patch with your changes as example to see what is your opinion and if we could backport those changes to your PR or at least discuss the introduction of Byteman. IMHO, it would save a lot of changes due by ZK-2549, but I am really open to hear your opinion and concerns. Cheers! 😃

@yufeldman
Copy link

@eribeiro I am not sure that critical bug is the place to introduce injection framework to Zookeeper. Can't it be done as a separate JIRA?

@eribeiro
Copy link
Contributor Author

@yufeldman Sure, totally agree about this being a separate issue! 👍

My only concern is that IF we decide to introduce this injection framework into ZK after merging ZOOKEEPER-2549 then some of the mocking and reflection changes introduced by ZOOKEEPER-2549 for the purposes of testing become redundant, so basically we would have to undo parts of it (but not the bug fix, of course). See this PR for a concrete idea of what I am trying to say.

But you raised a very important point: ZOOKEEPER-2549 is critical, and delaying its merging is gonna be bad for the project release, IMHO. So, I am all for you keep going your work (congrats, btw) and run this as a future issue. Or whatever the committers find more suitable. /cc @fpj @rgs1 @hanm @phunt

@hanm
Copy link
Contributor

hanm commented Dec 15, 2016

Good initiative, @eribeiro! This will be a valuable addition to the project. Please keep driving this forward.

Regarding dependency between ZOOKEEPER-2549 and ZOOKEEPER-1364, the mock stuff ZOOKEEPER-2549 introduced is localized so it shouldn't take a lot effort to replace that with what to be done in ZOOKEEPER-1364. Since ZOOKEEPER-2549 is in relative good shape to be merged in my opinion here is to get ZOOKEEPER-2549 in first then start iterating on ZOOKEEPER-1364.

@anmolnar
Copy link
Contributor

@eribeiro This is still a very important initiative to improve ZooKeeper testing. Are u still interested in working on this?

@maoling
Copy link
Member

maoling commented Dec 27, 2019

ping @eribeiro @yufeldman
Are you free to refresh this patch? If you don't reply me within ten days, I will pick it up. Of course, the new patch's commit message will also be signed by your name.

@yufeldman
Copy link

yufeldman commented Dec 27, 2019 via email

@asf-ci
Copy link

asf-ci commented Dec 27, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1764/

@eribeiro
Copy link
Contributor Author

@maoling @yufeldman Sure, no problem you picking up this issue, maoling. :) This is a pretty nice opportunity to see this merged to ZK codebase. Please, count on me to any help/discussion about it, btw. Cheers!

@maoling
Copy link
Member

maoling commented Jan 14, 2020

@eribeiro @yufeldman I take a close look at this issue today, I guess this issue had been fixed

  • org.apache.zookeeper.server.ServerCnxn#serializeRecord it now throws an IOException
  • To recheck this issue, I inject a fault into that code:
    RULE trace zk_nio_exception_injects_faults
    CLASS org.apache.zookeeper.server.ServerCnxn
    METHOD serializeRecord
    IF true
    DO
      traceln("*** zk nio exception injects faults***");
      throw new IOException()

don't have a hang phenomenon, the client got the following:

Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 20002ms for session id 0x1000222974b0001
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
KeeperErrorCode = ConnectionLoss for /01-14-7
WATCHER::

@anmolnar
Copy link
Contributor

@maoling Are u working on this then?
Can I close this PR?

asfgit pushed a commit that referenced this pull request May 29, 2021
…ZooKeeper

- **Byteman** is a powerful tool to inject faults during runtime, especially for distributed system
- In the future, we can also introduce it into our unit test infra, just as hadoop, cassandra did. [PR-123](#123) is a good starting.
- more details in the [ZOOKEEPER-3601](https://issues.apache.org/jira/browse/ZOOKEEPER-3601)

Author: maoling <[email protected]>

Reviewers: Andor Molnar <[email protected]>

Closes #1135 from maoling/ZOOKEEPER-3601
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants