@joegallo (Contributor) commented Mar 24, 2022

Follow up to #84588, which was itself a follow up to #82316

I wanted this assert in there out of an abundance of caution, but indeed the other change in #84588 to PolicyStepsRegistryTests was a fair test of the intended behavior change on that PR.

This PR moves the stress test to actual test code. I've run it with -Dtests.iters=10000 and it's fine. With the assert still in place, it fails the sanity check (returning null rather than the expected step) roughly 20% of the time, as measured with -Dtests.iters=100. The test takes about a tenth of a second to run on a warmed-up JVM on my box.

Closes #84979
Closes #85036
Closes #85075
Closes #85175

It doesn't work as written because it expects the cachedSteps map to
be the same between the call to the put and the call to getCachedStep
-- but of course another thread could have put a new policy, which
results in PolicyStepsRegistry#update being called and the cache being
cleared (in which case the second call returns null).
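
To make the failure mode above concrete, here is a minimal sketch of the put-then-read-back pattern. It is not the real PolicyStepsRegistry code: the class `StepCacheSketch`, its methods, and the `betweenCalls` hook are all hypothetical, and the hook only simulates another thread's update landing between the two calls so the outcome is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a step cache; NOT the real PolicyStepsRegistry.
final class StepCacheSketch {
    private final Map<String, String> cachedSteps = new ConcurrentHashMap<>();

    // Stands in for an update triggered by another thread putting a new policy:
    // the whole cache is cleared.
    void update() {
        cachedSteps.clear();
    }

    // Racy pattern: put, then read back from the shared map. The hook lets the
    // demo inject "another thread's" update between the two calls.
    String putThenReadBack(String index, String step, Runnable betweenCalls) {
        cachedSteps.put(index, step);
        betweenCalls.run();
        return cachedSteps.get(index); // null if the cache was cleared in between
    }

    // Safer pattern: return the value we already hold instead of re-reading it.
    String putAndReturn(String index, String step, Runnable betweenCalls) {
        cachedSteps.put(index, step);
        betweenCalls.run();
        return step;
    }

    public static void main(String[] args) {
        StepCacheSketch cache = new StepCacheSketch();
        // Simulate the cache being cleared between the put and the read.
        System.out.println(cache.putThenReadBack("idx", "rollover", cache::update));
        System.out.println(cache.putAndReturn("idx", "rollover", cache::update));
    }
}
```

With the simulated update in between, the first call returns null and the second returns the step, which is why an assertion on the read-back value can fail intermittently under real concurrency.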
@joegallo added the >test, :Data Management/ILM+SLM, v8.2.0, v8.0.2, v7.17.2, v8.1.2 labels Mar 24, 2022
@elasticmachine added the Team:Data Management label Mar 24, 2022
@elasticmachine (Collaborator) commented

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo requested review from andreidan and dakrone March 24, 2022 20:02
@joegallo removed the v8.0.2 label Mar 24, 2022
@dakrone (Member) left a comment

LGTM, thanks for fixing this Joe

}
} finally {
// tell the other thread we're finished
done.set(true);
Member

Super minor nit, but in the event this is the very last test run in a suite (or it's just run by itself), it's possible our thread leak detector will complain, since we aren't calling thread.join() on the thread we spawn to wait for it to finish completely. Do you think we should add that?
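
As a rough illustration of the suggestion above, the following sketch signals a background thread through an AtomicBoolean and then joins it in the same finally block. The class `JoinBeforeReturnSketch`, the thread name, and the loop bodies are hypothetical placeholders, not the test code from this PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: signal a helper thread to stop, then wait for it to exit.
final class JoinBeforeReturnSketch {

    // Returns true if the background thread has fully terminated on return.
    static boolean runAndJoin() {
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicLong iterations = new AtomicLong();

        // Placeholder for the thread that keeps mutating shared state.
        Thread background = new Thread(() -> {
            while (!done.get()) {
                iterations.incrementAndGet();
                Thread.yield();
            }
        }, "hypothetical-updater");
        background.start();

        try {
            // Placeholder for the foreground work done by the test itself.
            for (int i = 0; i < 1000; i++) {
                iterations.incrementAndGet();
            }
        } finally {
            // tell the other thread we're finished
            done.set(true);
            try {
                // wait until it has actually exited, so no thread outlives the test
                background.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
        return !background.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("background thread terminated: " + runAndJoin());
    }
}
```

Setting the flag alone only asks the thread to stop; the join is what guarantees it has exited before the method returns.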

Contributor Author

Sure thing, I'll get that in.

@joegallo force-pushed the remove-failing-sanity-check branch from 8a0f7ad to 0b4e18c March 24, 2022 20:54
@andreidan (Contributor) left a comment

Thanks for fixing this Joe

LifecycleAction action = randomValueOtherThan(MigrateAction.DISABLED, () -> randomFrom(phase.getActions().values()));
Step step = randomFrom(action.toSteps(client, phaseName, MOCK_STEP_KEY, null));
Step actualStep = registry.getStep(indexMetadata, step.getKey());
assertThat(actualStep.getKey(), equalTo(step.getKey()));
Contributor

I'm a bit confused as to what this tests. A new step will be added, bypassing the cache, on every iteration. I'm especially confused because we don't have any steps defined/registered beforehand, yet we do set up some metadata (both IndexMetadata and IndexLifecycleMetadata). Apologies if I'm missing something very obvious, but could we document the intent here? (I'm guessing we want to populate the cache?)

Contributor Author

Fair enough -- I spent a bit of time stressing over this exact point. It's a little bit tricky, and I don't think you're missing something obvious. I'll add some comments and let's see where that gets us.

@arteam added v7.17.3 and removed v7.17.2 labels Mar 28, 2022
@arteam added v8.1.3 and removed v8.1.2 labels Mar 28, 2022
Labels

:Data Management/ILM+SLM, Team:Data Management, >test, v7.17.3, v8.1.2, v8.2.0

5 participants