Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DecommissionService and helper to execute awareness attribute decommissioning #4084

Merged
merged 116 commits into from
Sep 25, 2022
Merged
Show file tree
Hide file tree
Changes from 115 commits
Commits
Show all changes
116 commits
Select commit Hold shift + click to select a range
346f194
Add Executor to decommission node attribute
imRishN Aug 2, 2022
4ad7ad1
Add DecommissionHelper
imRishN Aug 8, 2022
223aceb
Decommission service implementation with metadata
imRishN Aug 17, 2022
65b2c0c
Fixes
imRishN Aug 17, 2022
5cc0a08
Master abdication
imRishN Aug 17, 2022
2ce51e7
Fixes
imRishN Aug 17, 2022
9c7c318
Update join validator to validate decommissioned node join request
imRishN Aug 17, 2022
356061c
Clear voting config after decommissioning
imRishN Aug 18, 2022
a9aaabf
Resolving comments
imRishN Aug 23, 2022
2da5c3f
Fixes
imRishN Aug 23, 2022
36cb1fd
Fixes
imRishN Aug 23, 2022
315a510
Some refactpring
imRishN Aug 23, 2022
3cd8116
Updates
imRishN Aug 23, 2022
84e4fc8
Fix to abdication
imRishN Aug 23, 2022
376da0d
Remove cluster state variable from service
imRishN Aug 23, 2022
333eb86
Log node string
imRishN Aug 23, 2022
6040ef0
Fix conflict
imRishN Aug 24, 2022
409ab6f
Changes in Service
imRishN Aug 24, 2022
e99058a
Fix spotless check
imRishN Aug 24, 2022
4053e2c
Update the join validator for decommissioned attribute
imRishN Aug 24, 2022
6476581
Add UTs for metadata
imRishN Aug 24, 2022
a21b238
Add UTs for JoinTaskExecutor changes
imRishN Aug 24, 2022
8d6e485
Fix
imRishN Aug 24, 2022
09d6417
Test files
imRishN Aug 25, 2022
01679f8
Move observer logic to helper
imRishN Aug 25, 2022
465c15e
fix msg
imRishN Aug 25, 2022
9107861
Move predicate to helper
imRishN Aug 26, 2022
26df730
test
imRishN Aug 29, 2022
a86d138
Add UT
imRishN Aug 29, 2022
27564d9
Add UT for DecommissionController
imRishN Aug 29, 2022
1cc01ae
Improvements and UTs
imRishN Aug 30, 2022
9558bfd
Add UT
imRishN Aug 30, 2022
870190d
Fix decommission initiation
imRishN Aug 30, 2022
2ada94c
Changes
imRishN Aug 30, 2022
6537dc6
Move DecommissionAttributeMetadata to decommission package
imRishN Aug 30, 2022
1fb0b0a
Update exception name
imRishN Aug 30, 2022
9f39b5c
Fix spotless and precommit checks
imRishN Aug 30, 2022
f9e602b
Update enum
imRishN Aug 30, 2022
4eb5a9c
Fix spotless and precommit checks
imRishN Aug 30, 2022
7a9e843
Add package-info and Changelog
imRishN Aug 30, 2022
3b985cf
Add checks for quorum
imRishN Sep 1, 2022
f88cccb
Bug fix
imRishN Sep 1, 2022
fe717dd
Resolving PR comments
imRishN Sep 2, 2022
67cf701
Update awareness attribute decommission status check
imRishN Sep 6, 2022
68a1c64
Update quorum loss check logic
imRishN Sep 6, 2022
157fd54
Update status assertion and clear voting config for failed init
imRishN Sep 6, 2022
c7c5efe
Refactoring
imRishN Sep 6, 2022
1d67fcc
Fix spotless check
imRishN Sep 6, 2022
0a35445
Resolve comments
imRishN Sep 6, 2022
54c21d3
Fix spotless check
imRishN Sep 6, 2022
0621be7
Updating states and flow
imRishN Sep 7, 2022
cef61b8
Trigger exclusion after init
imRishN Sep 7, 2022
e12a73d
Updates
imRishN Sep 7, 2022
f1866c7
Resolving comments
imRishN Sep 7, 2022
49af3e4
Fixes
imRishN Sep 7, 2022
a5a3b8d
Fix spotless check
imRishN Sep 7, 2022
d2fe876
Resolve comments
imRishN Sep 8, 2022
12247c0
Precheck for retry
imRishN Sep 8, 2022
48101a1
Add logging
imRishN Sep 8, 2022
fb508c1
Fix spotless check
imRishN Sep 8, 2022
d5f6ac0
Fix controller tests
imRishN Sep 9, 2022
ee407de
Fix Decommission Service test
imRishN Sep 9, 2022
ccde356
Fix spotless check
imRishN Sep 9, 2022
cba7670
Empty-Commit
imRishN Sep 9, 2022
96ee70d
Add getHistoryOperationsFromTranslog method to fetch the history snap…
ankitkala Sep 13, 2022
860d148
Add package-info and Changelog
imRishN Aug 30, 2022
2451078
Empty-Commit
imRishN Sep 9, 2022
65414c1
Address Comments
imRishN Sep 13, 2022
add1786
Fix tests
imRishN Sep 13, 2022
b15b9ff
Fix spotless check
imRishN Sep 13, 2022
f9ba08d
Update logic for exclusion response
imRishN Sep 16, 2022
251d81f
Update Changelog
imRishN Sep 16, 2022
aa5d41b
Addressing minor comments
imRishN Sep 16, 2022
c271693
Update request eligibility check
imRishN Sep 16, 2022
b0a35da
Update metadata usage
imRishN Sep 16, 2022
0f9cdba
Remove fromStage method and update withUpdatedStatus method in metadata
imRishN Sep 16, 2022
a738ab2
Fix spotless check
imRishN Sep 16, 2022
081f52a
Add observer to ensure abdication
imRishN Sep 16, 2022
fa8dafc
Refactor node removal observer
imRishN Sep 16, 2022
30c9415
Fix spotless check
imRishN Sep 16, 2022
6000243
Update state transistions
imRishN Sep 19, 2022
68943b1
Small fixes
imRishN Sep 19, 2022
ee528e7
Fixes
imRishN Sep 19, 2022
72a20f0
Fix spotless check
imRishN Sep 19, 2022
87ebc18
Refactor setStatus
imRishN Sep 20, 2022
c25aab1
Minor changes
imRishN Sep 20, 2022
93afa98
Refactor leader decommission
imRishN Sep 21, 2022
6aacc5c
Add Executor to decommission node attribute
imRishN Aug 2, 2022
5842746
Add DecommissionHelper
imRishN Aug 8, 2022
bf7e1a6
Decommission service implementation with metadata
imRishN Aug 17, 2022
6711c95
Fixes
imRishN Aug 17, 2022
a39a52e
Resolving comments
imRishN Aug 23, 2022
d1392f1
Fixes
imRishN Aug 23, 2022
881664b
Fixes
imRishN Aug 23, 2022
e7701f7
Add UTs for metadata
imRishN Aug 24, 2022
40af043
Test files
imRishN Aug 25, 2022
0c123e2
Move observer logic to helper
imRishN Aug 25, 2022
b28c96a
Move predicate to helper
imRishN Aug 26, 2022
8c62bb0
test
imRishN Aug 29, 2022
7e6f5eb
Improvements and UTs
imRishN Aug 30, 2022
f77496a
Move DecommissionAttributeMetadata to decommission package
imRishN Aug 30, 2022
bf7c2ec
Update exception name
imRishN Aug 30, 2022
377c6be
Fix spotless check
imRishN Sep 6, 2022
7b03b3e
Resolving comments
imRishN Sep 7, 2022
9a2b11c
Empty-Commit
imRishN Sep 9, 2022
c144db6
Add getHistoryOperationsFromTranslog method to fetch the history snap…
ankitkala Sep 13, 2022
ca17fed
Add package-info and Changelog
imRishN Aug 30, 2022
c3b512b
Empty-Commit
imRishN Sep 9, 2022
e3b1eb9
Update Changelog
imRishN Sep 16, 2022
7479b40
Fixes
imRishN Sep 19, 2022
dc58265
Refactor setStatus
imRishN Sep 20, 2022
890eef0
Minor changes
imRishN Sep 20, 2022
5ef1348
Fix spotless check
imRishN Sep 21, 2022
4921a79
Fix spotless check
imRishN Sep 21, 2022
b4de2c7
Minor fix
imRishN Sep 21, 2022
c2b6e38
Merge branch 'main' into decommission/pr
imRishN Sep 25, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- [Segment Replication] Update replicas to commit SegmentInfos instead of relying on SIS files from primary shards. ([#4402](https://github.com/opensearch-project/OpenSearch/pull/4402))
- [CCR] Add getHistoryOperationsFromTranslog method to fetch the history snapshot from translogs ([#3948](https://github.com/opensearch-project/OpenSearch/pull/3948))
- [Remote Store] Change behaviour in replica recovery for remote translog enabled indices ([#4318](https://github.com/opensearch-project/OpenSearch/pull/4318))
- Add DecommissionService and helper to execute awareness attribute decommissioning ([#4084](https://github.com/opensearch-project/OpenSearch/pull/4084))

### Deprecated

Expand Down Expand Up @@ -102,4 +103,4 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)


[Unreleased]: https://github.com/opensearch-project/OpenSearch/compare/2.2.0...HEAD
[2.x]: https://github.com/opensearch-project/OpenSearch/compare/2.2.0...2.x
[2.x]: https://github.com/opensearch-project/OpenSearch/compare/2.2.0...2.x
12 changes: 12 additions & 0 deletions server/src/main/java/org/opensearch/OpenSearchException.java
Original file line number Diff line number Diff line change
Expand Up @@ -1608,6 +1608,18 @@ private enum OpenSearchExceptionHandle {
org.opensearch.index.shard.PrimaryShardClosedException::new,
162,
V_3_0_0
),
DECOMMISSIONING_FAILED_EXCEPTION(
org.opensearch.cluster.decommission.DecommissioningFailedException.class,
org.opensearch.cluster.decommission.DecommissioningFailedException::new,
163,
V_3_0_0
Comment on lines +1612 to +1616
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DECOMMISSION_FAILED_EXCEPTION

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it DECOMMISSION_FAILED_EXCEPTION but @shwetathareja suggested to change it to DECOMMISSIONING_FAILED_EXCEPTION

),
NODE_DECOMMISSIONED_EXCEPTION(
org.opensearch.cluster.decommission.NodeDecommissionedException.class,
org.opensearch.cluster.decommission.NodeDecommissionedException::new,
164,
V_3_0_0
imRishN marked this conversation as resolved.
Show resolved Hide resolved
);

final Class<? extends OpenSearchException> exceptionClass;
Expand Down
14 changes: 14 additions & 0 deletions server/src/main/java/org/opensearch/cluster/ClusterModule.java
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
import org.opensearch.cluster.action.index.MappingUpdatedAction;
import org.opensearch.cluster.action.index.NodeMappingRefreshAction;
import org.opensearch.cluster.action.shard.ShardStateAction;
import org.opensearch.cluster.decommission.DecommissionAttributeMetadata;
import org.opensearch.cluster.metadata.ComponentTemplateMetadata;
import org.opensearch.cluster.metadata.ComposableIndexTemplateMetadata;
import org.opensearch.cluster.metadata.DataStreamMetadata;
Expand Down Expand Up @@ -193,6 +194,12 @@ public static List<Entry> getNamedWriteables() {
);
registerMetadataCustom(entries, DataStreamMetadata.TYPE, DataStreamMetadata::new, DataStreamMetadata::readDiffFrom);
registerMetadataCustom(entries, WeightedRoutingMetadata.TYPE, WeightedRoutingMetadata::new, WeightedRoutingMetadata::readDiffFrom);
registerMetadataCustom(
entries,
DecommissionAttributeMetadata.TYPE,
DecommissionAttributeMetadata::new,
DecommissionAttributeMetadata::readDiffFrom
);
// Task Status (not Diffable)
entries.add(new Entry(Task.Status.class, PersistentTasksNodeService.Status.NAME, PersistentTasksNodeService.Status::new));
return entries;
Expand Down Expand Up @@ -283,6 +290,13 @@ public static List<NamedXContentRegistry.Entry> getNamedXWriteables() {
WeightedRoutingMetadata::fromXContent
)
);
entries.add(
new NamedXContentRegistry.Entry(
Metadata.Custom.class,
new ParseField(DecommissionAttributeMetadata.TYPE),
DecommissionAttributeMetadata::fromXContent
)
);
return entries;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,10 @@
import org.opensearch.cluster.ClusterStateTaskExecutor;
import org.opensearch.cluster.NotClusterManagerException;
import org.opensearch.cluster.block.ClusterBlocks;
import org.opensearch.cluster.decommission.DecommissionAttribute;
import org.opensearch.cluster.decommission.DecommissionAttributeMetadata;
import org.opensearch.cluster.decommission.DecommissionStatus;
import org.opensearch.cluster.decommission.NodeDecommissionedException;
import org.opensearch.cluster.metadata.IndexMetadata;
import org.opensearch.cluster.metadata.Metadata;
import org.opensearch.cluster.node.DiscoveryNode;
Expand Down Expand Up @@ -358,6 +362,7 @@ public boolean runOnlyOnClusterManager() {

/**
* a task indicates that the current node should become master
*
* @deprecated As of 2.0, because supporting inclusive language, replaced by {@link #newBecomeClusterManagerTask()}
*/
@Deprecated
Expand All @@ -384,8 +389,9 @@ public static Task newFinishElectionTask() {
* Ensures that all indices are compatible with the given node version. This will ensure that all indices in the given metadata
* will not be created with a newer version of opensearch as well as that all indices are newer or equal to the minimum index
* compatibility version.
* @see Version#minimumIndexCompatibilityVersion()
*
* @throws IllegalStateException if any index is incompatible with the given version
* @see Version#minimumIndexCompatibilityVersion()
*/
public static void ensureIndexCompatibility(final Version nodeVersion, Metadata metadata) {
Version supportedIndexVersion = nodeVersion.minimumIndexCompatibilityVersion();
Expand Down Expand Up @@ -415,14 +421,18 @@ public static void ensureIndexCompatibility(final Version nodeVersion, Metadata
}
}

/** ensures that the joining node has a version that's compatible with all current nodes*/
/**
* ensures that the joining node has a version that's compatible with all current nodes
*/
public static void ensureNodesCompatibility(final Version joiningNodeVersion, DiscoveryNodes currentNodes) {
final Version minNodeVersion = currentNodes.getMinNodeVersion();
final Version maxNodeVersion = currentNodes.getMaxNodeVersion();
ensureNodesCompatibility(joiningNodeVersion, minNodeVersion, maxNodeVersion);
}

/** ensures that the joining node has a version that's compatible with a given version range */
/**
* ensures that the joining node has a version that's compatible with a given version range
*/
public static void ensureNodesCompatibility(Version joiningNodeVersion, Version minClusterNodeVersion, Version maxClusterNodeVersion) {
assert minClusterNodeVersion.onOrBefore(maxClusterNodeVersion) : minClusterNodeVersion + " > " + maxClusterNodeVersion;
if (joiningNodeVersion.isCompatible(maxClusterNodeVersion) == false) {
Expand Down Expand Up @@ -466,13 +476,34 @@ public static void ensureMajorVersionBarrier(Version joiningNodeVersion, Version
}
}

public static void ensureNodeCommissioned(DiscoveryNode node, Metadata metadata) {
DecommissionAttributeMetadata decommissionAttributeMetadata = metadata.decommissionAttributeMetadata();
if (decommissionAttributeMetadata != null) {
DecommissionAttribute decommissionAttribute = decommissionAttributeMetadata.decommissionAttribute();
DecommissionStatus status = decommissionAttributeMetadata.status();
if (decommissionAttribute != null && status != null) {
// We will let the node join the cluster if the current status is in FAILED state
if (node.getAttributes().get(decommissionAttribute.attributeName()).equals(decommissionAttribute.attributeValue())
&& status.equals(DecommissionStatus.FAILED) == false) {
throw new NodeDecommissionedException(
imRishN marked this conversation as resolved.
Show resolved Hide resolved
"node [{}] has decommissioned attribute [{}] with current status of decommissioning [{}]",
node.toString(),
decommissionAttribute.toString(),
status.status()
);
}
}
}
}

public static Collection<BiConsumer<DiscoveryNode, ClusterState>> addBuiltInJoinValidators(
Collection<BiConsumer<DiscoveryNode, ClusterState>> onJoinValidators
) {
final Collection<BiConsumer<DiscoveryNode, ClusterState>> validators = new ArrayList<>();
validators.add((node, state) -> {
ensureNodesCompatibility(node.getVersion(), state.getNodes());
ensureIndexCompatibility(node.getVersion(), state.getMetadata());
ensureNodeCommissioned(node, state.getMetadata());
});
validators.addAll(onJoinValidators);
return Collections.unmodifiableCollection(validators);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.cluster.decommission;

import org.opensearch.common.io.stream.StreamInput;
import org.opensearch.common.io.stream.StreamOutput;
import org.opensearch.common.io.stream.Writeable;

import java.io.IOException;
import java.util.Objects;

/**
* {@link DecommissionAttribute} encapsulates information about decommissioned node attribute like attribute name, attribute value.
*
* @opensearch.internal
*/
public final class DecommissionAttribute implements Writeable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be a generic Attribute class, nothing specific to Decommissioning. Also in the code base today node level attributes are referenced using simple Map<String, String> would that suffice here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this attribute class for simplicity and clarity. This is helping the code become more readable. Also, in future if the user just wants to decommission a set of nodes with a particular attribute, we would be easily able to modify this class. Similarly, there could be multiple such cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel you don't need a new class for it, just need to store key/ value pair.

private final String attributeName;
private final String attributeValue;

/**
* Constructs new decommission attribute name value pair
*
* @param attributeName attribute name
* @param attributeValue attribute value
*/
public DecommissionAttribute(String attributeName, String attributeValue) {
this.attributeName = attributeName;
this.attributeValue = attributeValue;
}

/**
* Returns attribute name
*
* @return attributeName
*/
public String attributeName() {
return this.attributeName;
}

/**
* Returns attribute value
*
* @return attributeValue
*/
public String attributeValue() {
return this.attributeValue;
}

public DecommissionAttribute(StreamInput in) throws IOException {
attributeName = in.readString();
attributeValue = in.readString();
}

/**
* Writes decommission attribute name value to stream output
*
* @param out stream output
*/
@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeString(attributeName);
out.writeString(attributeValue);
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;

DecommissionAttribute that = (DecommissionAttribute) o;

if (!attributeName.equals(that.attributeName)) return false;
return attributeValue.equals(that.attributeValue);
}

@Override
public int hashCode() {
return Objects.hash(attributeName, attributeValue);
}

@Override
public String toString() {
return "DecommissionAttribute{" + "attributeName='" + attributeName + '\'' + ", attributeValue='" + attributeValue + '\'' + '}';
}
}
Loading