Add logic in master service to optimize performance and retain detailed logging for critical cluster operations. #16421
❕ Gradle check result for 348d542: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16421      +/-   ##
============================================
+ Coverage     72.03%   72.04%   +0.01%
- Complexity    65003    65026      +23
============================================
  Files          5313     5313
  Lines        303375   303397      +22
  Branches      43902    43902
============================================
+ Hits         218544   218593      +49
+ Misses        66915    66857      -58
- Partials      17916    17947      +31
Thanks. Can we add a test for the node-left logs, please?
server/src/main/java/org/opensearch/cluster/service/MasterService.java
Node Join
[2024-10-25T12:51:37,680][INFO ][o.o.c.s.MasterService ] [data1] Tasks batched with key: org.opensearch.cluster.coordination.JoinHelper, count:1 and sample tasks: node-join[{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true} join existing leader], term: 2, version: 7, delta: added {{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true}}
Node Left
[2024-10-25T12:52:05,089][INFO ][o.o.c.s.MasterService ] [data1] Tasks batched with key: org.opensearch.cluster.coordination.NodeRemovalClusterStateTaskExecutor@78a30062, count:1 and sample tasks: node-left[{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true} reason: disconnected], term: 2, version: 8, delta: removed {{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true}}
…ed logging for critical cluster operations. Signed-off-by: Sumit Bansal <[email protected]>
Ignore previous comment, updated description.
Signed-off-by: shwetathareja <[email protected]>
❕ Gradle check result for cbaac5d: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Merged 6f1b59e into opensearch-project:main
The backport to
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-16421-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6f1b59e54bec41d40772f8571c7b65d4b523f8b1
# Push it to GitHub
git push --set-upstream origin backport/backport-16421-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x
Then, create a pull request where the
…ed logging for critical cluster operations. (opensearch-project#16421) Signed-off-by: Sumit Bansal <[email protected]>
Raised backport PR #16493
…ed logging for critical cluster operations. (#16421) (#16493) Signed-off-by: Sumit Bansal <[email protected]>
…ed logging for critical cluster operations. (opensearch-project#16421) Signed-off-by: Sumit Bansal <[email protected]> Signed-off-by: shwetathareja <[email protected]> Co-authored-by: shwetathareja <[email protected]>
Description
Add logic in master service to optimize performance and retain detailed logging for critical cluster operations.
Related Issues
Resolves #14795 (review)
Testing
To test the logging, changed the check that triggers the short summary from 1000 to 1 in a local environment. Sample output:
[2024-10-22T13:44:32,280][DEBUG][o.o.c.s.MasterService ] [runTask-0] took [0s] to notify listeners on successful publication of cluster state (version: 6, uuid: _4hvlyf0RA2ulV2jtjIsKg) for [Tasks batched with key: org.opensearch.cluster.action.shard.ShardStateAction, count:2 and sample tasks: shard-started StartedShardEntry{shardId [[sample-index2][1]], allocationId [wZdnPxXmTQ-87FSRVNA8CQ], primary term [1], message [master {runTask-0}{d12XZzhcSiG1aGyfi-eUpw}{s36FgDsCS6Git2YUlAXRzw}{127.0.0.1}{127.0.0.1:9300}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}[StartedShardEntry{shardId [[sample-index2][1]], allocationId [wZdnPxXmTQ-87FSRVNA8CQ], primary term [1], message [master {runTask-0}{d12XZzhcSiG1aGyfi-eUpw}{s36FgDsCS6Git2YUlAXRzw}{127.0.0.1}{127.0.0.1:9300}{dimr}{testattr=test, shard_indexing_pressure_enabled=true} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]}], shard-started StartedShardEntry{shardId [[sample-index2][1]], allocationId [wZdnPxXmTQ-87FSRVNA8CQ], primary term [1], message [after new shard recovery]}[StartedShardEntry{shardId [[sample-index2][1]], allocationId [wZdnPxXmTQ-87FSRVNA8CQ], primary term [1], message [after new shard recovery]}]]
[2024-10-22T13:44:32,057][DEBUG][o.o.c.s.MasterService ] [runTask-0] took [0s] to notify listeners on successful publication of cluster state (version: 4, uuid: FeOWukKcR5qoPMRZMIlqKA) for [Tasks batched with key: org.opensearch.cluster.metadata.MetadataCreateIndexService, count:1 and sample tasks: create-index [sample-index2], cause [api]]
[2024-10-22T13:44:32,282][DEBUG][o.o.c.s.MasterService ] [runTask-0] took [2ms] to compute cluster state update for [Tasks batched with key: org.opensearch.cluster.routing.BatchedRerouteService, count:1 and sample tasks: cluster_reroute(reroute after starting shards)]
Node Join
[2024-10-25T12:51:37,680][INFO ][o.o.c.s.MasterService ] [data1] Tasks batched with key: org.opensearch.cluster.coordination.JoinHelper, count:1 and sample tasks: node-join[{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true} join existing leader], term: 2, version: 7, delta: added {{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true}}
Node Left
[2024-10-25T12:52:05,089][INFO ][o.o.c.s.MasterService ] [data1] Tasks batched with key: org.opensearch.cluster.coordination.NodeRemovalClusterStateTaskExecutor@78a30062, count:1 and sample tasks: node-left[{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true} reason: disconnected], term: 2, version: 8, delta: removed {{data1}{bUKIJ8aDR6yJftPHtNrMVg}{-chi7PdPRjS2TwqxUnTkLQ}{127.0.0.1}{127.0.0.1:9301}{dmr}{shard_indexing_pressure_enabled=true}}
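For reviewers skimming the logs above: every line shares the shape "Tasks batched with key: <key>, count:<n> and sample tasks: <descriptions>". The following is a minimal sketch of how a summary in that shape could be assembled. It is not the code in MasterService; the class name TaskSummarySketch, the method shortSummary, and the sampleLimit parameter are all hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is NOT the MasterService implementation.
public class TaskSummarySketch {

    /**
     * Builds a short summary for a batch of tasks.
     *
     * @param batchingKey      name of the executor the tasks were batched under
     * @param taskDescriptions one human-readable description per task in the batch
     * @param sampleLimit      assumed cap (must be >= 0) on how many descriptions are spelled out
     */
    static String shortSummary(String batchingKey, List<String> taskDescriptions, int sampleLimit) {
        // Only the first sampleLimit descriptions are joined, so the cost of
        // building the message does not grow with the size of the batch.
        String samples = taskDescriptions.stream()
            .limit(sampleLimit)
            .collect(Collectors.joining(", "));
        return "Tasks batched with key: " + batchingKey
            + ", count:" + taskDescriptions.size()
            + " and sample tasks: " + samples;
    }

    public static void main(String[] args) {
        // Inputs taken from the reroute log line in the Testing section.
        String summary = shortSummary(
            "org.opensearch.cluster.routing.BatchedRerouteService",
            List.of("cluster_reroute(reroute after starting shards)"),
            1
        );
        System.out.println(summary);
    }
}
```

The count always reflects the full batch, while the sample list is capped, which is the trade-off the description refers to: cheaper summaries for large batches without losing the batching key or the batch size.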
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.