Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent re-creation instances when only Configuration metadata or status changes #373

Merged
merged 4 commits into from
Sep 13, 2021

Conversation

ammmze
Copy link
Contributor

@ammmze ammmze commented Sep 4, 2021

What this PR does / why we need it:
Adds a check when a Modified event comes in for a Configuration resource so that we only perform the instance recreation IF the resource actually changed. This allows things like the metadata and status to be updated, but not trigger recreation. There are tools out there like Flux which will update the metadata every time it runs reconciliation with a new git commit (regardless of whether the resource spec has actually changed). This in turn previously would cause the instance(s) to be recreated every time Flux reconciliation occurs with a new git commit.

What this PR does NOT address:
It was also determined in #372 that there is likely a race condition that occurs when multiple instances are being recreated at nearly the same time. This in turn causes some of the instances to get lost. While this still needs to be investigated, the changes made in this PR, should make that race condition less likely to present itself...at least in my scenario.

In addition, while I was testing this in my cluster, I had found some oddness with instances and/or broker pods simply not getting created until I restart the controller, agent, and udev discovery pods. I'm not sure what actually got it to create it. Sometimes restarting the controller would get it to go, other times i had to restart all of them. I'm not sure the cause of this yet, but it does not appear to be related to these changes. I swapped the agent back to 0.6.13 and saw the same behavior. Prior to any of these changes I was running 0.6.5, which I don't recall seeing this behavior. I had to upgrade everything to 0.6.13 to be able to test my agent changes.

Special notes for your reviewer:
I'll have to spend some time getting a ubuntu vm setup to be able to run tests and docs. When I try to run these on my local machine, they error out because the libudev package is only available for Linux.

If applicable:

  • this PR has an associated PR with documentation in akri-docs
  • this PR contains unit tests
  • added code adheres to standard Rust formatting (cargo fmt)
  • code builds properly (cargo build)
  • code is free of common mistakes (cargo clippy)
  • all Akri tests succeed (cargo test)
  • inline documentation builds (cargo doc)
  • version has been updated appropriately (./version.sh)

@ghost
Copy link

ghost commented Sep 4, 2021

CLA assistant check
All CLA requirements met.

@ammmze
Copy link
Contributor Author

ammmze commented Sep 4, 2021

@kate-goldenring Here's my initial go at this! It looks like I'll need to get a VM setup to be able to run the tests and stuff. Because of that I haven't even attempted to write any tests yet. But feel free to look at what i've got thus far and give some pointers to things that should be done differently.

I have successfully tested it in my cluster though...

logs after flux does reconciling, and updates metadata

akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:41:07Z TRACE agent::util::config_action] handle_config - something happened to a configuration
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:41:07Z INFO  agent::util::config_action] handle_config - modified Configuration Some("aeotec-gen-5")
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:41:07Z TRACE agent::util::config_action] should_recreate_config - checking if config aeotec-gen-5 needs to be recreated
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:41:07Z TRACE agent::util::config_action] handle_config - config Some("aeotec-gen-5") has not changed. ignoring config modified event.

logs after making changes to the Configuration spec

akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:43:09Z TRACE agent::util::config_action] handle_config - something happened to a configuration
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:43:09Z INFO  agent::util::config_action] handle_config - modified Configuration Some("aeotec-gen-5")
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:43:09Z TRACE agent::util::config_action] should_recreate_config - checking if config aeotec-gen-5 needs to be recreated
akri-agent-daemonset-7r65w akri-agent [2021-09-04T05:43:09Z TRACE agent::util::config_action] handle_config_delete - for config aeotec-gen-5 telling do_periodic_discovery to end

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Sep 7, 2021

I'll have to spend some time getting a ubuntu vm setup to be able to run tests and docs. When I try to run these on my local machine, they error out because the libudev package is only available for Linux.

libudev needs to be installed for the udev discovery handler. It's dependency can be installed with

apt update
apt install -y libudev-dev

You may also need libssl-dev pkg-config libv4l-dev
We have a script build/setup.sh for installing these and Rust.

@romoh romoh linked an issue Sep 7, 2021 that may be closed by this pull request
@ammmze
Copy link
Contributor Author

ammmze commented Sep 7, 2021

I'll have to spend some time getting a ubuntu vm setup to be able to run tests and docs. When I try to run these on my local machine, they error out because the libudev package is only available for Linux.

libudev needs to be installed for the udev discovery handler. It's dependency can be installed with

apt update
apt install -y libudev-dev

You may also need libssl-dev pkg-config libv4l-dev
We have a script build/setup.sh for installing these and Rust.

Good to know! I'll use these when I go to setup my VM. Actually, I wanna try doing it in the vscode devcontainer thing first and see what happens. Not sure if we've attempted the vscode dev containers before.

…r status changes

chore: updates from PR comments

build: fix clippy warning

test: added tests for should_recreate_config method
.gitignore Show resolved Hide resolved
@ammmze ammmze requested a review from bfjelds September 10, 2021 18:10
Copy link
Contributor

@kate-goldenring kate-goldenring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these changes and some great tests!

@kate-goldenring kate-goldenring merged commit 6b477da into project-akri:main Sep 13, 2021
@ammmze ammmze deleted the config-generation branch September 13, 2021 16:07
vincepnguyen pushed a commit that referenced this pull request Nov 23, 2021
…r status changes (#373)

* chore: bump version to 0.6.14

* fix: prevent re-creation instances when only Configuration metadata or status changes

chore: updates from PR comments

build: fix clippy warning

test: added tests for should_recreate_config method

* test: dry up new tests

* fix: only accept newer generations when checking modified configuration
Signed-off-by: vincepnguyen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instance + Broker Pod disappears
4 participants