Merged
1 change: 1 addition & 0 deletions docs/features/index.md
@@ -153,6 +153,7 @@
- [Segment Warmer](opensearch/segment-warmer.md)
- [Semantic Version Field Type](opensearch/semantic-version-field-type.md)
- [Settings Management](opensearch/settings-management.md)
- [Shard Allocation](opensearch/shard-allocation.md)
- [Skip List](opensearch/skip-list.md)
- [Snapshot Restore Enhancements](opensearch/snapshot-restore-enhancements.md)
- [Star Tree Index](opensearch/star-tree-index.md)
118 changes: 118 additions & 0 deletions docs/features/opensearch/shard-allocation.md
@@ -0,0 +1,118 @@
# Shard Allocation

## Summary

Shard allocation in OpenSearch determines how shards are distributed across cluster nodes. The `BalancedShardsAllocator` uses a `WeightFunction` to compute a weight for each node from configurable balance factors. Additional settings control primary shard balancing, which spreads primary shards evenly across nodes for better performance and fault tolerance.

## Details

### Architecture

```mermaid
graph TB
    subgraph "Cluster Manager"
        CS[Cluster Settings] --> BSA[BalancedShardsAllocator]
        BSA --> WF[WeightFunction]
        WF --> AC[AllocationConstraints]
        WF --> RC[RebalanceConstraints]
    end

    subgraph "Constraint Types"
        AC --> ISPN[INDEX_SHARD_PER_NODE_BREACH]
        AC --> IPSB[INDEX_PRIMARY_SHARD_BALANCE]
        AC --> CPSB[CLUSTER_PRIMARY_SHARD_BALANCE]
        RC --> IPSB2[INDEX_PRIMARY_SHARD_BALANCE]
        RC --> CPSR[CLUSTER_PRIMARY_SHARD_REBALANCE]
    end

    subgraph "Allocation Decision"
        WF -->|weight calculation| AD[Allocation Decision]
        AD --> N1[Node 1]
        AD --> N2[Node 2]
        AD --> N3[Node N]
    end
```

### Weight Calculation

The `WeightFunction` calculates node weights using the formula:

```
weight(node, index) = θ₀ × (node.numShards - avgShardsPerNode)
                    + θ₁ × (node.numShards(index) - avgShardsPerNode(index))
```

Where:
- `θ₀ = shardBalance / (indexBalance + shardBalance)`
- `θ₁ = indexBalance / (indexBalance + shardBalance)`
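
As a worked illustration of the formula above, the following Python sketch computes θ₀, θ₁, and the resulting weight for a single node. It is a simplified model, not the Java implementation: the function names and the sample shard counts are assumptions made for this example, and constraint penalties are left out.

```python
def balance_thetas(shard_balance: float, index_balance: float) -> tuple[float, float]:
    """Normalize the two balance factors so that theta0 + theta1 == 1."""
    total = shard_balance + index_balance
    if total <= 0:
        # Assumption: a non-positive sum is treated as invalid in this sketch.
        raise ValueError("balance factors must sum to a positive value")
    return shard_balance / total, index_balance / total


def node_weight(
    theta0: float,
    theta1: float,
    node_shards: int,
    avg_shards_per_node: float,
    node_index_shards: int,
    avg_index_shards_per_node: float,
) -> float:
    """Weight of one node for one index, following the formula above."""
    return (
        theta0 * (node_shards - avg_shards_per_node)
        + theta1 * (node_index_shards - avg_index_shards_per_node)
    )


# Default factors: balance.shard = 0.45, balance.index = 0.55
theta0, theta1 = balance_thetas(0.45, 0.55)

# Hypothetical node: 12 shards in total (cluster average 10),
# 3 shards of the index in question (average 2 per node).
weight = node_weight(theta0, theta1, 12, 10.0, 3, 2.0)
print(round(weight, 2))
```

A node that holds more shards than average gets a positive weight and a node below average gets a negative one, so a lower weight marks a more attractive allocation target in this model.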

### Components

| Component | Description |
|-----------|-------------|
| `BalancedShardsAllocator` | Main allocator that orchestrates shard distribution |
| `WeightFunction` | Calculates node weights for allocation decisions |
| `AllocationConstraints` | Constraints applied during initial shard allocation |
| `RebalanceConstraints` | Constraints applied during shard rebalancing |
| `LocalShardsBalancer` | Performs actual allocation and rebalancing operations |

### Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| `cluster.routing.allocation.balance.shard` | Weight factor for total shards per node | `0.45` |
| `cluster.routing.allocation.balance.index` | Weight factor for shards per index per node | `0.55` |
| `cluster.routing.allocation.balance.threshold` | Minimum weight difference that must be exceeded before a rebalancing operation is performed; higher values make rebalancing less aggressive | `1.0` |
| `cluster.routing.allocation.balance.prefer_primary` | Enable primary shard balancing | `false` |
| `cluster.routing.allocation.rebalance.primary.enable` | Enable primary shard rebalancing | `false` |
| `cluster.routing.allocation.rebalance.primary.buffer` | Allowed buffer of primary shards between nodes when `cluster.routing.allocation.rebalance.primary.enable` is `true` | `0.10` |
| `cluster.routing.allocation.primary_constraint.threshold` | Threshold for primary constraint | `10` |
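
The role of `cluster.routing.allocation.balance.threshold` can be illustrated with a small Python sketch. It assumes, as a simplification, that a rebalancing move is considered only when the weight difference between the heaviest and the lightest node exceeds the threshold. The function name and the sample weights are hypothetical, and the real allocator applies further checks.

```python
def should_rebalance(node_weights: dict[str, float], threshold: float = 1.0) -> bool:
    """Return True when the spread of node weights exceeds the threshold."""
    delta = max(node_weights.values()) - min(node_weights.values())
    return delta > threshold


# Hypothetical weights for one index on three nodes.
unbalanced = {"node-1": 1.45, "node-2": -0.90, "node-3": -0.55}
nearly_even = {"node-1": 0.30, "node-2": -0.30, "node-3": 0.00}

print(should_rebalance(unbalanced))
print(should_rebalance(nearly_even))
```

Raising the threshold in this model makes the second case, and eventually the first, stay put, which matches the intent of a less aggressive rebalancing policy.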

### Usage Example

Enable primary shard balancing for segment replication workloads:

```json
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.prefer_primary": true,
    "cluster.routing.allocation.rebalance.primary.enable": true,
    "cluster.routing.allocation.rebalance.primary.buffer": 0.10
  }
}
```

Adjust balance factors for specific workloads:

```json
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.50,
    "cluster.routing.allocation.balance.index": 0.50
  }
}
```
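
For scripted changes, the same request can be assembled with the Python standard library. This is a minimal sketch: the helper name `build_cluster_settings_request` is hypothetical, and the base URL `http://localhost:9200` is an assumption about where the cluster is reachable (no authentication or TLS is handled here).

```python
import json
import urllib.request


def build_cluster_settings_request(
    settings: dict,
    base_url: str = "http://localhost:9200",  # assumed endpoint, adjust as needed
) -> urllib.request.Request:
    """Build (but do not send) a PUT /_cluster/settings request with persistent settings."""
    body = json.dumps({"persistent": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_cluster_settings_request(
    {
        "cluster.routing.allocation.balance.shard": 0.50,
        "cluster.routing.allocation.balance.index": 0.50,
    }
)
# Sending requires a reachable cluster:
# urllib.request.urlopen(request)
```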

## Limitations

- Primary shard balancing is best-effort and may not achieve perfect distribution in all scenarios
- Enabling primary shard balance does not guarantee equal primary shards on each node, especially during failover
- Setting `cluster.routing.allocation.balance.prefer_primary` back to `false` after it was enabled does not trigger redistribution

## Related PRs

| Version | PR | Description |
|---------|-----|-------------|
| v3.4.0 | [#19012](https://github.com/opensearch-project/OpenSearch/pull/19012) | Fix WeightFunction constraint reset bug |

## References

- [Issue #13429](https://github.com/opensearch-project/OpenSearch/issues/13429): Bug report for constraint reset issue
- [Cluster Settings Documentation](https://docs.opensearch.org/3.0/install-and-configure/configuring-opensearch/cluster-settings/): Official cluster routing allocation settings
- [Segment Replication Documentation](https://docs.opensearch.org/3.0/tuning-your-cluster/availability-and-recovery/segment-replication/index/): Recommended settings for segment replication

## Change History

- **v3.4.0** (2025-10-10): Fixed a bug where allocation and rebalance constraints were incorrectly reset when balance factors were updated
105 changes: 105 additions & 0 deletions docs/releases/v3.4.0/features/opensearch/shard-allocation.md
@@ -0,0 +1,105 @@
# Shard Allocation

## Summary

This release fixes a bug where the `WeightFunction` allocation and rebalance constraints for primary shard balancing were incorrectly reset to default values when updating certain cluster settings. The fix ensures that primary shard balance settings (`cluster.routing.allocation.balance.prefer_primary` and `cluster.routing.allocation.rebalance.primary.enable`) remain effective even when other balance-related settings are modified.

## Details

### What's New in v3.4.0

The `BalancedShardsAllocator` uses a `WeightFunction` to calculate node weights for shard allocation decisions. This function includes constraints that control primary shard balancing behavior. Prior to this fix, updating a setting backed by `indexBalanceFactor`, `shardBalanceFactor`, or `preferPrimaryShardRebalanceBuffer` created a new `WeightFunction` instance that lost the previously configured primary shard balance constraints.

### Technical Changes

#### Root Cause

The bug occurred because:

1. Settings like `PREFER_PRIMARY_SHARD_BALANCE` and `PREFER_PRIMARY_SHARD_REBALANCE` updated constraints on the existing `WeightFunction` instance
2. Settings like `INDEX_BALANCE_FACTOR_SETTING`, `SHARD_BALANCE_FACTOR_SETTING`, and `PRIMARY_SHARD_REBALANCE_BUFFER` triggered `updateWeightFunction()`, which created a new `WeightFunction`
3. The new `WeightFunction` was constructed without the current primary balance constraint states

```mermaid
graph TB
    subgraph "Before Fix"
        A[Set prefer_primary=true] --> B[Update weightFunction constraints]
        C[Update shard_balance_factor] --> D[Create NEW weightFunction]
        D --> E[Constraints reset to defaults]
    end

    subgraph "After Fix"
        F[Set prefer_primary=true] --> G[Store in instance variable]
        H[Update shard_balance_factor] --> I[Create NEW weightFunction]
        I --> J[Pass stored constraint values]
        J --> K[Constraints preserved]
    end
```

#### Code Changes

The fix modifies the `WeightFunction` constructor to accept the current constraint states:

| Component | Change |
|-----------|--------|
| `WeightFunction` constructor | Added `preferPrimaryShardBalance` and `preferPrimaryShardRebalance` parameters |
| `updateWeightFunction()` | Now passes current constraint values to new `WeightFunction` |
| `WeightFunction` initialization | Applies constraint settings during construction |
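
The difference between the old and the new behavior can be modeled in a few lines of Python. This is a toy model for illustration only, not the Java code from the PR: the class names `WeightFunctionModel` and `AllocatorModel`, the `preserve_constraints` switch, and the `replay` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeightFunctionModel:
    """Toy stand-in for WeightFunction: balance factors plus two constraint flags."""
    index_balance: float
    shard_balance: float
    prefer_primary_balance: bool = False
    prefer_primary_rebalance: bool = False


class AllocatorModel:
    """Toy stand-in for the allocator that owns the weight function."""

    def __init__(self, preserve_constraints: bool):
        self.preserve_constraints = preserve_constraints  # True models the fixed behavior
        self.index_balance = 0.55
        self.shard_balance = 0.45
        self.prefer_primary_balance = False
        self.prefer_primary_rebalance = False
        self.weight_function = WeightFunctionModel(self.index_balance, self.shard_balance)

    def set_prefer_primary_balance(self, enabled: bool) -> None:
        # Stored on the allocator and applied to the existing weight function.
        self.prefer_primary_balance = enabled
        self.weight_function.prefer_primary_balance = enabled

    def set_shard_balance(self, value: float) -> None:
        self.shard_balance = value
        self._update_weight_function()

    def _update_weight_function(self) -> None:
        if self.preserve_constraints:
            # After the fix: pass the stored constraint states to the new instance.
            self.weight_function = WeightFunctionModel(
                self.index_balance,
                self.shard_balance,
                self.prefer_primary_balance,
                self.prefer_primary_rebalance,
            )
        else:
            # Before the fix: the new instance starts with default constraints.
            self.weight_function = WeightFunctionModel(self.index_balance, self.shard_balance)


def replay(preserve_constraints: bool) -> bool:
    """Enable prefer_primary, then change the shard balance factor."""
    allocator = AllocatorModel(preserve_constraints)
    allocator.set_prefer_primary_balance(True)
    allocator.set_shard_balance(0.5)
    return allocator.weight_function.prefer_primary_balance


print(replay(preserve_constraints=False))
print(replay(preserve_constraints=True))
```

In this model the first call reports the constraint as lost after the balance factor update, while the second call keeps it, mirroring the "Before Fix" and "After Fix" paths described above.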

#### Modified Files

| File | Description |
|------|-------------|
| `BalancedShardsAllocator.java` | Extended `WeightFunction` constructor and `updateWeightFunction()` |
| `SegmentReplicationAllocationIT.java` | Added integration test to verify fix |
| `BalanceConfigurationTests.java` | Added unit test for settings update scenario |
| `OpenSearchAllocationTestCase.java` | Added test helper method |

### Usage Example

The bug manifested when settings were updated in a specific order:

```json
// Step 1: Enable primary shard balance
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.prefer_primary": true
  }
}

// Step 2: Update shard balance factor (this previously reset the prefer_primary constraint to its default)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.5
  }
}

// After fix: prefer_primary constraint remains active
```

### Migration Notes

No migration required. This is a bug fix that ensures existing settings work as documented.

## Limitations

- The fix only addresses the constraint reset issue; it does not change the fundamental behavior of primary shard balancing
- Primary shard balance is still a best-effort optimization and may not achieve perfect balance in all scenarios

## Related PRs

| PR | Description |
|----|-------------|
| [#19012](https://github.com/opensearch-project/OpenSearch/pull/19012) | Fix Allocation and Rebalance Constraints of WeightFunction are incorrectly reset |

## References

- [Issue #13429](https://github.com/opensearch-project/OpenSearch/issues/13429): Original bug report
- [Cluster Settings Documentation](https://docs.opensearch.org/3.0/install-and-configure/configuring-opensearch/cluster-settings/): Official cluster routing allocation settings
- [Segment Replication Documentation](https://docs.opensearch.org/3.0/tuning-your-cluster/availability-and-recovery/segment-replication/index/): Recommended settings for segment replication

## Related Feature Report

- [Full feature documentation](../../../../features/opensearch/shard-allocation.md)
1 change: 1 addition & 0 deletions docs/releases/v3.4.0/index.md
@@ -38,6 +38,7 @@
- [Pull-based Ingestion Bugfixes](features/opensearch/pull-based-ingestion-bugfixes.md) - Fix out-of-bounds offset handling and remove persisted pointers for at-least-once guarantees
- [Query Bugfixes](features/opensearch/query-bugfixes.md) - Fix crashes in wildcard queries, aggregations, highlighters, and script score queries
- [Reactor Netty Transport](features/opensearch/reactor-netty-transport.md) - Fix HTTP channel tracking and release during node shutdown
- [Shard Allocation](features/opensearch/shard-allocation.md) - Fix WeightFunction constraint reset when updating balance factors
- [Shard & Segment Bugfixes](features/opensearch/shard-segment-bugfixes.md) - Fix merged segment warmer exceptions, ClusterService state assertion, and EngineConfig builder
- [Snapshot & Restore Bugfixes](features/opensearch/snapshot-restore-bugfixes.md) - Fix NullPointerException when restoring remote snapshot with missing shard size information
