Skip to content
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -47,13 +47,17 @@ public ClusterState performAction(Index index, ClusterState clusterState) {

IndexMetaData.Builder indexMetadataBuilder = IndexMetaData.builder(indexMetaData);
if (shouldParseIndexName(indexMetaData.getSettings())) {
long parsedOriginationDate = parseIndexNameAndExtractDate(index.getName());
indexMetadataBuilder.settingsVersion(indexMetaData.getSettingsVersion() + 1)
.settings(Settings.builder()
.put(indexMetaData.getSettings())
.put(LifecycleSettings.LIFECYCLE_ORIGINATION_DATE, parsedOriginationDate)
.build()
);
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to move the try to surround more of the function (for example, the fromIndexMetadata(...) call

long parsedOriginationDate = parseIndexNameAndExtractDate(index.getName());
indexMetadataBuilder.settingsVersion(indexMetaData.getSettingsVersion() + 1)
.settings(Settings.builder()
.put(indexMetaData.getSettings())
.put(LifecycleSettings.LIFECYCLE_ORIGINATION_DATE, parsedOriginationDate)
.build()
);
} catch (Exception e) {
throw new InitializePolicyException(e.getMessage(), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for wrapping this in an ElasticsearchException type exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely happy with using an exception for flow control but since we don't have any lifecycle state yet (as we're just initialising the policy) I used this exception to signal the current failing step is the "init policy step" when moving the policy into the ERROR step.

Another option would be to pass the current step key to IndexLifecycleTransition.moveClusterStateToErrorStep but this seemed more intrusive.

}
}

ClusterState.Builder newClusterStateBuilder = ClusterState.builder(clusterState);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/
package org.elasticsearch.xpack.core.ilm;

import org.elasticsearch.ElasticsearchException;

/**
* Exception thrown when a problem is encountered while initialising an ILM policy for an index.
*/
public class InitializePolicyException extends ElasticsearchException {

public InitializePolicyException(String msg, Throwable cause, Object... args) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include the policy name in the exception somewhere? I think that might be helpful if the setting were to change but the error was still there

super(msg, cause, args);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -1120,26 +1120,26 @@ public void testRolloverStepRetriesUntilRolledOverIndexIsDeleted() throws Except
// {@link org.elasticsearch.xpack.core.ilm.ErrorStep} in order to retry the failing step. As {@link #assertBusy}
// increases the wait time between calls exponentially, we might miss the window where the policy is on
// {@link WaitForRolloverReadyStep} and the move to `attempt-rollover` request will not be successful.
waitUntil(() -> {
assertThat(waitUntil(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider using assertTrue, we have long body here, it's hard to tell what we asserting here. The same applies to all instances below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be changed to just assertBusy(() -> {...}, 30, TimeUnit.SECONDS) also and use assertions to signal success/failure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the assertThat(waitUntil(), is(true)) to assertTrue(waitUntil()) but the reason to use waitUntil as opposed to assertBusy is to avoid having this test flake. We are asserting that the failed step and the retry count are bound to particular values. The fact that we want to check the failed step is what we expect it to be means we have to catch ILM in the ERROR step, but with retryable steps ILM keeps moving back and forth between the failing step (on retry) and the ERROR step (when the step execution fails).

assertBusy increases the wait time exponentially once it gets to 1 second, while waitUntil keeps it at 1 second. This means we're bound to catch ILM in the ERROR step using waitUntil (but with assertBusy we probe the ILM state only at seconds 1, 2, 4, 8 and 16)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay, I see the difference between them now, in that case I agree that assertTrue would be better than assertThat in this case

try {
return client().performRequest(moveToStepRequest).getStatusLine().getStatusCode() == 200;
} catch (IOException e) {
return false;
}
}, 30, TimeUnit.SECONDS);
}, 30, TimeUnit.SECONDS), is(true));

// Similar to above, using {@link #waitUntil} as we want to make sure the `attempt-rollover` step started failing and is being
// retried (which means ILM moves back and forth between the `attempt-rollover` step and the `error` step)
waitUntil(() -> {
assertThat("ILM did not start retrying the attempt-rollover step", waitUntil(() -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assertThat -> waitUntil -> is(true) is a little hard to follow, I think just a single assertBusy() would be better, since you can add the assertions in the body itself rather than returning a boolean?

try {
Map<String, Object> explainIndexResponse = explainIndex(index);
String step = (String) explainIndexResponse.get("step");
String failedStep = (String) explainIndexResponse.get("failed_step");
Integer retryCount = (Integer) explainIndexResponse.get(FAILED_STEP_RETRY_COUNT_FIELD);
return step != null && step.equals("attempt-rollover") && retryCount != null && retryCount >= 1;
return failedStep != null && failedStep.equals("attempt-rollover") && retryCount != null && retryCount >= 1;
} catch (IOException e) {
return false;
}
}, 30, TimeUnit.SECONDS);
}, 30, TimeUnit.SECONDS), is(true));

deleteIndex(rolledIndex);

Expand Down Expand Up @@ -1181,16 +1181,17 @@ public void testUpdateRolloverLifecycleDateStepRetriesWhenRolloverInfoIsMissing(
"}");
client().performRequest(moveToStepRequest);

waitUntil(() -> {
assertThat("ILM did not start retrying the update-rollover-lifecycle-date step", waitUntil(() -> {
try {
Map<String, Object> explainIndexResponse = explainIndex(index);
String step = (String) explainIndexResponse.get("step");
String failedStep = (String) explainIndexResponse.get("failed_step");
Integer retryCount = (Integer) explainIndexResponse.get(FAILED_STEP_RETRY_COUNT_FIELD);
return step != null && step.equals(UpdateRolloverLifecycleDateStep.NAME) && retryCount != null && retryCount >= 1;
return failedStep != null && failedStep.equals(UpdateRolloverLifecycleDateStep.NAME) && retryCount != null
&& retryCount >= 1;
} catch (IOException e) {
return false;
}
});
}, 30, TimeUnit.SECONDS), is(true));

index(client(), index, "1", "foo", "bar");
Request refreshIndex = new Request("POST", "/" + index + "/_refresh");
Expand Down Expand Up @@ -1376,16 +1377,17 @@ public void testRetryableInitializationStep() throws Exception {
assertOK(client().performRequest(startReq));

// Wait until an error has occurred.
waitUntil(() -> {
assertThat("ILM did not start retrying the init step", waitUntil(() -> {
try {
Map<String, Object> explainIndexResponse = explainIndex(index);
String step = (String) explainIndexResponse.get("step");
String failedStep = (String) explainIndexResponse.get("failed_step");
Integer retryCount = (Integer) explainIndexResponse.get(FAILED_STEP_RETRY_COUNT_FIELD);
return step != null && step.equals(InitializePolicyContextStep.KEY.getAction()) && retryCount != null && retryCount >= 1;
return failedStep != null && failedStep.equals(InitializePolicyContextStep.KEY.getAction()) && retryCount != null
&& retryCount >= 1;
} catch (IOException e) {
return false;
}
}, 30, TimeUnit.SECONDS);
}, 30, TimeUnit.SECONDS), is(true));

// Turn origination date parsing back off
updateIndexSettings(index, Settings.builder()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import org.elasticsearch.xpack.core.ilm.ErrorStep;
import org.elasticsearch.xpack.core.ilm.IndexLifecycleMetadata;
import org.elasticsearch.xpack.core.ilm.InitializePolicyContextStep;
import org.elasticsearch.xpack.core.ilm.InitializePolicyException;
import org.elasticsearch.xpack.core.ilm.LifecycleExecutionState;
import org.elasticsearch.xpack.core.ilm.LifecyclePolicyMetadata;
import org.elasticsearch.xpack.core.ilm.LifecycleSettings;
Expand Down Expand Up @@ -133,8 +134,13 @@ static ClusterState moveClusterStateToErrorStep(Index index, ClusterState cluste
ElasticsearchException.generateThrowableXContent(causeXContentBuilder, STACKTRACE_PARAMS, cause);
causeXContentBuilder.endObject();
LifecycleExecutionState currentState = LifecycleExecutionState.fromIndexMetadata(idxMeta);
Step.StepKey currentStep = Objects.requireNonNull(LifecycleExecutionState.getCurrentStepKey(currentState),
"unable to move to an error step where there is no current step, state: " + currentState);
Step.StepKey currentStep;
if (cause instanceof InitializePolicyException) {
currentStep = InitializePolicyContextStep.KEY;
} else {
currentStep = Objects.requireNonNull(LifecycleExecutionState.getCurrentStepKey(currentState),
"unable to move to an error step where there is no current step, state: " + currentState);
}
LifecycleExecutionState nextStepState = updateExecutionStateToStep(policyMetadata, currentState,
new Step.StepKey(currentStep.getPhase(), currentStep.getAction(), ErrorStep.NAME), nowSupplier, false);

Expand Down