
[FEATURE] Selective V2 Data Engine Activation #7015

Closed
derekbit opened this issue Nov 1, 2023 · 5 comments
Assignees
Labels
area/v2-data-engine: v2 data engine (SPDK)
highlight: Important feature/issue to highlight
kind/feature: Feature request, new feature
priority/0: Must be implemented or fixed in this release (managed by PO)
require/auto-e2e-test: Require adding/updating auto e2e test cases if they can be automated
require/backport: Require backport. Only used when the specific versions to backport have not been defined.
require/doc: Require updating the longhorn.io documentation
Milestone

Comments

@derekbit
Member

derekbit commented Nov 1, 2023

Is your improvement request related to a feature? Please describe (👍 if you like this request)

In a large cluster, both powerful nodes and low-spec nodes exist, and many kinds of applications run inside it. Currently, the v2-data-engine setting enables the instance-manager pod for v2 volumes on all Longhorn nodes regardless of the machines' specs. To address this, the v2 data engine can be activated selectively through the global setting v2-data-engine combined with per-node labels, annotations, or spec fields.

This ticket can be extended to the instance-manager pod for v1 volumes in the future.
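As a rough operational sketch of what this proposal describes (the setting name v2-data-engine and the node label node.longhorn.io/disable-v2-data-engine come from this issue; the node name "worker-3" and the patch invocation are assumptions):

```shell
# Illustrative only; "worker-3" and the exact setting-update command
# are assumptions for this sketch.

# Enable the v2 data engine cluster-wide through the Longhorn setting:
kubectl -n longhorn-system patch settings.longhorn.io v2-data-engine \
  --type merge -p '{"value":"true"}'

# Opt a low-spec node out of the v2 data engine with the per-node label:
kubectl label node worker-3 node.longhorn.io/disable-v2-data-engine=true

# Opt the node back in later by removing the label:
kubectl label node worker-3 node.longhorn.io/disable-v2-data-engine-
```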

Describe the solution you'd like

Describe alternatives you've considered

Additional context

cc @shuo-wu @innobead

@derekbit derekbit added priority/0 Must be implemented or fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation kind/improvement Request for improvement of existing function require/backport Require backport. Only used when the specific versions to backport have not been defined. labels Nov 1, 2023
@derekbit derekbit added this to the v1.6.0 milestone Nov 1, 2023
@derekbit derekbit added the area/v2-data-engine v2 data engine (SPDK) label Nov 1, 2023
@derekbit derekbit self-assigned this Nov 1, 2023
@longhorn-io-github-bot
Collaborator

longhorn-io-github-bot commented Nov 13, 2023

Pre Ready-For-Testing Checklist

  • Where are the reproduce/test steps documented?
    The reproduce/test steps are at:
  1. Create a 3-node Longhorn cluster and enable both v1-data-engine and v2-data-engine
  2. Create v1 and v2 volumes with 2 replicas each
  3. Add the label node.longhorn.io/disable-v2-data-engine: "true" to two of the Kubernetes nodes
  4. IMs and their pods for v1 volumes should not be impacted.
  5. For IMs and their pods for v2 volumes, if an IM on a labeled node holds no replicas or engines, it should be deleted.
  6. Detach all v2 volumes. After the v2 volumes are detached, the IMs and pods on the labeled nodes should be deleted.
  7. Remove the label; the deleted IMs and pods should be recreated.
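The label-based steps above could be driven from the command line roughly as follows (the node names are placeholders, and the instance-manager pod label selector is an assumption, not taken from this issue):

```shell
# Placeholder node names; run against a test cluster only.
for n in node-2 node-3; do
  kubectl label node "$n" node.longhorn.io/disable-v2-data-engine=true
done

# Watch instance-manager pods: v2 IMs on the labeled nodes should be
# deleted once they no longer hold any engine or replica (for example,
# after all v2 volumes are detached). The selector below is an assumption.
kubectl -n longhorn-system get pods \
  -l longhorn.io/component=instance-manager -o wide

# Removing the label should cause the deleted IMs and pods to be recreated:
kubectl label node node-2 node.longhorn.io/disable-v2-data-engine-
kubectl label node node-3 node.longhorn.io/disable-v2-data-engine-
```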
  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at

longhorn/longhorn-manager#2292

  • Which areas/issues might this PR have potential impacts on?
    Area: v2 volume, instance manager
    Issues

@innobead innobead added the highlight Important feature/issue to highlight label Jan 3, 2024
@innobead innobead changed the title [IMPROVEMENT] Support instance-manager pod for v2 volumes on selected nodes [FEATURE] Support instance-manager pod for v2 volumes on selected nodes Jan 3, 2024
@innobead innobead added kind/feature Feature request, new feature and removed kind/improvement Request for improvement of existing function labels Jan 3, 2024
@chriscchien chriscchien self-assigned this Jan 4, 2024
@chriscchien
Contributor

Verified passing on Longhorn master (longhorn-manager 970ba4).

Following the test steps, I did not encounter any problems. Closing this ticket, thank you.

@khushboo-rancher
Contributor

@chriscchien Some priority 1/2 scenarios that can be further tested:

Prerequisite: have a v1 volume with three replicas and its data checksum computed.

  1. Disable the v2 data engine on all nodes, then create a v2 volume. The volume should show as unschedulable. Delete the v2 disable label and verify the volume becomes schedulable.
  2. Create a v2 volume, then add the v2 disable label to one of the replica nodes. Crash the IM on that replica node and check the replica behavior.
  3. Create a v2 volume, then add the v2 disable label to the node the volume is attached to. Crash the IM on the attached node and check the replica behavior.
  4. Create a v2 volume with 2 replicas and trigger replica rebuilding. While rebuilding is in progress, add the v2 disable label to the rebuilding replica node. Verify that the rebuild finishes successfully.

The v1 volume should not be impacted in any of the above scenarios.
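Scenario 1 above could be exercised roughly like this (a sketch for a test cluster; the per-node label comes from this issue, while the exact resource names shown by the get command are assumptions):

```shell
# Disable the v2 data engine on every node (test cluster only):
for n in $(kubectl get nodes -o name); do
  kubectl label "$n" node.longhorn.io/disable-v2-data-engine=true
done

# Create a v2 volume (via the Longhorn UI or a Volume manifest) and
# confirm it is reported as unschedulable:
kubectl -n longhorn-system get volumes.longhorn.io

# Remove the labels and verify the volume becomes schedulable again:
for n in $(kubectl get nodes -o name); do
  kubectl label "$n" node.longhorn.io/disable-v2-data-engine-
done
```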

@chriscchien
Contributor


Hi @khushboo-rancher,

  1. The v2 volume becomes schedulable after removing the label or setting it to false.
  2. The replica on the crashed IM is gone and the volume becomes degraded. After removing the label (or setting it to false) and performing offline rebuilding, all replicas are ready and the data is intact.
  3. The replica on the crashed IM is gone; the volume tries to reattach to the same node but gets stuck attaching. After removing the label (or setting it to false), the volume attaches and the data is intact.
  4. Following the steps, the replica rebuild succeeds.

After all of the above tests completed, the v1 volume remained healthy and its data was intact.

@innobead
Member

@chriscchien let's automate these cases if they haven't been implemented. Create a ticket for it.

@derekbit derekbit changed the title [FEATURE] Support instance-manager pod for v2 volumes on selected nodes [FEATURE] Selective V2 Data Engine Activation Jan 14, 2024
@derekbit derekbit moved this to Closed in Longhorn Sprint Aug 3, 2024