feat(regionserver): add graceful shutdown configuration #570

Open
wants to merge 52 commits into
base: main
Changes from 15 commits
Commits
4ad793f
feat(regionserver): add graceful shutdown configuration
razvan Oct 2, 2024
cb232df
Make UnifiedRoleConfiguration a sub-trait of Send
razvan Oct 2, 2024
dea179d
Replace trait with enum.
razvan Oct 2, 2024
eecaf23
implement region mover command
razvan Oct 2, 2024
0b14f92
fix: crd field names
razvan Oct 14, 2024
71793ea
unit tests and shell escaping
razvan Oct 14, 2024
1644aff
update docs
razvan Oct 14, 2024
1903f36
spelling
razvan Oct 14, 2024
5e8201f
cargo update
razvan Oct 14, 2024
e76166a
added shutdown test & hbase-entrypoint.sh
razvan Oct 16, 2024
3c63da1
cleanup and set region mover opts env var
razvan Oct 17, 2024
69a6f49
main merge
razvan Oct 17, 2024
8dbde9b
first successful integration test
razvan Oct 17, 2024
68756ab
main merge
razvan Oct 17, 2024
43abf6d
fix image pull policy for the kerberos tests
razvan Oct 17, 2024
2b6e89b
add RUN_REGION_MOVER env var
razvan Oct 17, 2024
c53497a
Update docs/modules/hbase/pages/usage-guide/operations/graceful-shutd…
razvan Oct 17, 2024
4e31a3c
remove trailing whitespace in docs
razvan Oct 17, 2024
a10caa0
rust : remove unused dep
razvan Oct 17, 2024
f42ab05
fix shellcheck lint
razvan Oct 17, 2024
0e9e37e
update shutdown test and run it successfuly
razvan Oct 18, 2024
c2c92c5
update docs
razvan Oct 18, 2024
8d7265e
Update rust/crd/src/lib.rs
razvan Oct 18, 2024
28a1395
fix const arithmetic
razvan Oct 18, 2024
f059e7f
switch to LazyLock
razvan Oct 18, 2024
67f3f1b
configure gracefulShutdownTimeout in (almost) all tests
razvan Oct 18, 2024
7e118ab
region mover args
razvan Oct 21, 2024
34a5ddb
Merge branch 'main' into feat/region-mover
razvan Oct 23, 2024
f9a769b
Update CHANGELOG.md
razvan Oct 23, 2024
420ba36
Update rust/crd/src/lib.rs
razvan Oct 24, 2024
2b0d63b
Update rust/crd/src/lib.rs
razvan Oct 24, 2024
5d5d5e9
Update rust/crd/src/lib.rs
razvan Oct 24, 2024
228ad4f
Update docs/modules/hbase/pages/usage-guide/operations/graceful-shutd…
razvan Oct 24, 2024
039c22a
Update docs/modules/hbase/pages/usage-guide/operations/graceful-shutd…
razvan Oct 24, 2024
60b9dc8
Update docs/modules/hbase/pages/usage-guide/operations/graceful-shutd…
razvan Oct 24, 2024
fd8331e
Update docs/modules/hbase/pages/usage-guide/operations/graceful-shutd…
razvan Oct 24, 2024
5378f11
Update rust/crd/src/lib.rs
razvan Oct 24, 2024
7b08a26
main merge
razvan Oct 25, 2024
6f087db
note on constant paths and the entrypoint script
razvan Oct 25, 2024
0f32e59
remove unnecessary configOverrides
razvan Oct 25, 2024
109e877
wip: use Fragment for the RegionMover
razvan Oct 25, 2024
05f4303
fix crd generation
razvan Oct 25, 2024
19fed55
test: fail if the regionmover fails (only with 2.6)
razvan Oct 28, 2024
8a8d26a
refactor to reduce (some) duplication
razvan Oct 28, 2024
e0aaa27
tests: use dev images
razvan Oct 28, 2024
eb52267
feat: remove hard-coded cluster.local from the domain name
razvan Oct 29, 2024
c051fb5
main merge
razvan Oct 29, 2024
40ae497
Merge branch 'main' into feat/region-mover
razvan Oct 29, 2024
d6d5fe4
fix: RegionMover fields should not be Optional
razvan Oct 30, 2024
fa239e5
main merge
razvan Jan 15, 2025
cb76f4e
add STACKABLE_LOG_DIR env var
razvan Jan 15, 2025
e86b446
ref introduce const CONTAINERDEBUG_LOG_DIRECTORY
razvan Jan 17, 2025
2 changes: 2 additions & 0 deletions CHANGELOG.md
Expand Up @@ -7,6 +7,7 @@
- Reduce CRD size from `1.4MB` to `96KB` by accepting arbitrary YAML input instead of the underlying schema for the following fields ([#548]):
  - `podOverrides`
  - `affinity`
- Optionally move regions to other pods before shutting down a region server ([#570]).

### Fixed

Expand All @@ -21,6 +22,7 @@
[#550]: https://github.com/stackabletech/hbase-operator/pull/550
[#556]: https://github.com/stackabletech/hbase-operator/pull/556
[#558]: https://github.com/stackabletech/hbase-operator/pull/558
[#570]: https://github.com/stackabletech/hbase-operator/pull/570

## [24.7.0] - 2024-07-24

25 changes: 16 additions & 9 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

44 changes: 30 additions & 14 deletions Cargo.nix

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Expand Up @@ -20,6 +20,7 @@ rstest = "0.22"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
shell-escape = "0.1"
snafu = "0.8"
stackable-operator = { git = "https://github.com/stackabletech/operator-rs.git", tag = "stackable-operator-0.76.0" }
product-config = { git = "https://github.com/stackabletech/product-config.git", tag = "0.7.0" }
52 changes: 52 additions & 0 deletions deploy/helm/hbase-operator/crds/crds.yaml
Expand Up @@ -724,6 +724,32 @@ spec:
nullable: true
type: boolean
type: object
regionMover:
description: Before terminating a region server pod, the RegionMover tool can be invoked to transfer local regions to other servers. This may cause a lot of network traffic in the Kubernetes cluster if the entire HBase stacklet is being restarted. The operator will compute a timeout period for the region move that will not exceed the graceful shutdown timeout.
nullable: true
properties:
ack:
description: If enabled (default), the region mover will confirm that regions are available on the source as well as the target pods before and after the move.
type: boolean
extraOpts:
default: []
description: Additional options to pass to the region mover.
items:
type: string
type: array
maxThreads:
description: Maximum number of threads to use for moving regions.
format: uint16
minimum: 0.0
type: integer
runBeforeShutdown:
description: Move local regions to other servers before terminating a region server's pod.
type: boolean
required:
- ack
- maxThreads
- runBeforeShutdown
type: object
resources:
default:
cpu:
Expand Down Expand Up @@ -947,6 +973,32 @@ spec:
nullable: true
type: boolean
type: object
regionMover:
description: Before terminating a region server pod, the RegionMover tool can be invoked to transfer local regions to other servers. This may cause a lot of network traffic in the Kubernetes cluster if the entire HBase stacklet is being restarted. The operator will compute a timeout period for the region move that will not exceed the graceful shutdown timeout.
nullable: true
properties:
ack:
description: If enabled (default), the region mover will confirm that regions are available on the source as well as the target pods before and after the move.
type: boolean
extraOpts:
default: []
description: Additional options to pass to the region mover.
items:
type: string
type: array
maxThreads:
description: Maximum number of threads to use for moving regions.
format: uint16
minimum: 0.0
type: integer
runBeforeShutdown:
description: Move local regions to other servers before terminating a region server's pod.
type: boolean
required:
- ack
- maxThreads
- runBeforeShutdown
type: object
resources:
default:
cpu:
docs/modules/hbase/pages/usage-guide/operations/graceful-shutdown.adoc
@@ -1,6 +1,6 @@
= Graceful shutdown

You can configure the graceful shutdown as described in xref:concepts:operations/graceful_shutdown.adoc[].
You can configure the graceful shutdown grace period as described in xref:concepts:operations/graceful_shutdown.adoc[].

== Masters

Expand All @@ -15,7 +15,7 @@

== RegionServers

As a default, RegionServers have `60 minutes` to shut down gracefully.
By default, RegionServers have `60 minutes` to shut down gracefully.

They use the same mechanism described above.
In contrast to the Master servers, they will, however, acknowledge the graceful shutdown with a message in the logs:
Expand All @@ -26,6 +26,30 @@
2023-10-11 12:38:05,060 INFO [shutdown-hook-0] regionserver.HRegionServer: ***** STOPPING region server 'test-hbase-regionserver-default-0.test-hbase-regionserver-default.kuttl-test-topical-parakeet.svc.cluster.local,16020,1697027870348' *****
----

The operator allows for finer control over the shutdown process of region servers.
For each region server pod, the region mover tool may be invoked before terminating the region server's pod.
The affected regions are transferred to other pods, thus ensuring that the data is not lost.

Here is an example:


[source,yaml]
----
spec:
  regionServers:
    config:
      regionMover:
        runBeforeShutdown: true <1>
        maxThreads: 5 <2>
        ack: false <3>
        extraOpts: ["--designatedFile", "/path/to/designatedFile"] <4>
----
<1> Run the region mover tool before shutting down the region server. Default is `false`.
<2> Maximum number of threads to use for moving regions. Default is `1`.
<3> Enable or disable region confirmation on the source and target servers. Default is `true`.
<4> Extra options to pass to the region mover tool.
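
To illustrate how these four settings could translate into a region mover invocation, here is a minimal Rust sketch. It is not the operator's actual code: the `RegionMoverConfig` struct and the `region_mover_args` function are hypothetical, and the flag names `--maxthreads` and `--noack` are assumed from the HBase RegionMover tool, so check them against the HBase version in use.

```rust
/// Hypothetical mirror of the `regionMover` settings described above.
/// The real operator type may look different.
struct RegionMoverConfig {
    run_before_shutdown: bool,
    max_threads: u16,
    ack: bool,
    extra_opts: Vec<String>,
}

/// Build the argument list for the region mover tool.
/// Returns `None` when the tool should not run before shutdown.
fn region_mover_args(cfg: &RegionMoverConfig) -> Option<Vec<String>> {
    if !cfg.run_before_shutdown {
        return None;
    }
    // Assumed flag name: `--maxthreads`.
    let mut args = vec!["--maxthreads".to_string(), cfg.max_threads.to_string()];
    // Assumed flag name: `--noack`, only added when confirmation is disabled.
    if !cfg.ack {
        args.push("--noack".to_string());
    }
    // Extra options are appended unchanged, in the order given.
    args.extend(cfg.extra_opts.iter().cloned());
    Some(args)
}
```

With the values from the example above, this sketch would yield the maximum thread count, the no-acknowledge flag, and then the two `extraOpts` entries. Arguments that end up in a shell command line should additionally be escaped.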

NOTE: There is no need to explicitly specify a timeout for the region movement. The operator will compute an appropriate timeout that cannot exceed the `gracefulShutdownTimeout` for region servers.
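
One way such a bounded timeout could be derived is sketched below in Rust. This is an assumption for illustration only, not the operator's formula: the `SHUTDOWN_MARGIN` constant and the `region_mover_timeout` function are hypothetical. The sketch reserves a fixed margin for the region server itself to stop and gives the remainder to the region mover.

```rust
use std::time::Duration;

/// Hypothetical margin reserved for stopping the region server
/// after the regions have been moved.
const SHUTDOWN_MARGIN: Duration = Duration::from_secs(30);

/// Derive a region mover timeout that never exceeds the graceful
/// shutdown timeout. If the graceful shutdown timeout is shorter than
/// the margin, the result saturates at zero instead of underflowing.
fn region_mover_timeout(graceful_shutdown_timeout: Duration) -> Duration {
    graceful_shutdown_timeout.saturating_sub(SHUTDOWN_MARGIN)
}
```

Saturating subtraction keeps the result well defined for very short graceful shutdown timeouts, where a plain subtraction on `Duration` would panic.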

== RestServers

As a default, RestServers have `5 minutes` to shut down gracefully.
1 change: 1 addition & 0 deletions rust/crd/Cargo.toml
Expand Up @@ -12,6 +12,7 @@ publish = false
product-config.workspace = true
serde.workspace = true
serde_json.workspace = true
shell-escape.workspace = true
snafu.workspace = true
stackable-operator.workspace = true
strum.workspace = true
8 changes: 5 additions & 3 deletions rust/crd/src/affinity.rs
Expand Up @@ -123,13 +123,15 @@ mod tests {
replicas: 1
"#;
let hbase: HbaseCluster = serde_yaml::from_str(input).expect("illegal test input");
let merged_config = hbase
let affinity = hbase
.merged_config(
&role,
"default",
&hbase.spec.cluster_config.hdfs_config_map_name,
)
.unwrap();
.unwrap()
.affinity()
.clone();

let mut expected_affinities = vec![WeightedPodAffinityTerm {
pod_affinity_term: PodAffinityTerm {
Expand Down Expand Up @@ -184,7 +186,7 @@ mod tests {
};

assert_eq!(
merged_config.affinity,
affinity,
StackableAffinity {
pod_affinity: Some(PodAffinity {
preferred_during_scheduling_ignored_during_execution: Some(expected_affinities),