create: fails directly when a StorageObject with the same NAME already exists #243
Conversation
The 2 issues are: if you create 2 blocks with the same name "block0" in 2 different gluster volumes, "dht" and "dht1", there will be 2 Targets with only one StorageObject:
And in /etc/target/saveconfig.json we can see that the Target's gbid is not the same as the StorageObject's:
@lxbsz splitting create into two phases will give some flexibility, I agree! But this affects the overall time taken to create the block volume, especially at scale, hence I'm a bit reluctant about this approach. Why can't we solve it by creating the target and the SO in one phase, and deleting both the target and the SO if reply->exit != 0? Thanks!
Do you mean the cache loading in targetcli/rtslib_fb when running the create operations? If so, I think it will be almost the same, because the bs_cache loading only happens when creating/deleting the StorageObject; the other operations won't use this bs_cache.
This is mainly to resolve issue 2 above, otherwise the code would become very complicated. Thanks.
@pkalever
@lxbsz please change the commit msgs as well, more comments inline. Thanks!
Force-pushed from f5b03e9 to 9a63ca0
Force-pushed from e82babf to 858dacf
@pkalever
This will be one problem if we create 2 block targets using the same NAME in 2 different volumes. The second time it will fail and won't be loaded, leaving no info in saveconfig.json, but it can still find the StorageObject created by the first creation, so the rollback will fail like:

IQN: - PORTAL(S): - ROLLBACK FAILED ON: 192.168.195.164 RESULT: FAIL

But the rollback is actually successful. After this fix it will look like:

failed to configure on 192.168.195.164 create StorageObject failed RESULT:FAIL

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
The old code covered up the raw errors from the targetcli command, but some of those raw errors are very useful for getting to the root cause of some bugs.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
This will avoid something like the following odd case:

LOG("mgmt", GB_LOG_INFO, "tmp=%s", tmp);

It will always give us:

INFO: tmp=mgmt [at block_svc_routines.c+4394 :<block_create_common>]

This is because the caller's tmp is shadowed by the local "char *tmp" declared inside the LOG macro.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
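As a side note for readers unfamiliar with this class of bug, below is a minimal, self-contained sketch of the shadowing problem the commit describes; the macro body and names are made up for illustration and are not the actual gluster-block LOG macro.

```c
#include <stdio.h>

/* Hypothetical logging macro: it declares its own local `tmp`, which
 * shadows any `tmp` the caller passes in as a format argument. */
#define LOG(dom, fmt, ...)                                        \
  do {                                                            \
    char *tmp = (char *)(dom);   /* shadows the caller's tmp */   \
    printf("INFO: " fmt " [domain=%s]\n", __VA_ARGS__, tmp);      \
  } while (0)

int main(void)
{
  char *tmp = "block14";         /* the value the caller wants logged */

  /* Prints "INFO: tmp=mgmt [domain=mgmt]": after macro expansion the
   * `tmp` argument resolves to the macro's local tmp ("mgmt"),
   * not to the caller's "block14". */
  LOG("mgmt", "tmp=%s", tmp);
  return 0;
}
```

Renaming the macro-internal variable to something unlikely to collide with caller code (or moving the work into a function) avoids this kind of collision.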
Force-pushed from d8d1108 to 8c7c6f8
@lxbsz tested this and it works as expected.
Two observations:
Hint: block with name 'block-volume1' already exist (may be hosted on a different block-hosting volume)
Thanks!
Not sure whether it is the case that only 2/3 nodes have the SO for some reason; for example, the user has 4 nodes, creates hosting-volume1/block1 on node-1, node-2 and node-3, then creates hosting-volume2/block1 on node-2, node-3 and node-4. So the current error should be more detailed. Would that make sense?
Yeah, this makes sense.
@lxbsz I'm okay with printing the same msg from the HA number of gluster-blockd's. I want you to fix rpc/block_svc_routines.c line 4389 as:
Thanks!
Sure.
@lxbsz here is a question before I start testing it:
What should I expect when I have 4 nodes, of which N1, N2 and N3 already have a block-volume name in use, but I have supplied N1, N2 and N4 for creating a block volume with the same name? Does the cleanup happen on all 3 nodes N1, N2 & N4, or only on N4?
Cleaning up on only N4 is optimal though.
Force-pushed from 0ffe26b to 5a2100f
Updated it and fell back to the old version. Just one NOTE: currently I can still hit the old issue #204, which is due to the metadata not being flushed to disk in time, especially in the multi-node case, like:

rhel3 --> 192.168.195.164

I create the block14 BV in the rep volume with node rhel3 only, then create the block14 BV in the rep1 volume with nodes rhel3 and rhel1. The following is the normal output, as expected: when creating the block14 BV more than once, it fails with an "exists" error on node rhel3, and on node rhel1 the create/delete succeeds:
But sometimes I can hit the following output in my setup:
This is because in block_create_cli_1_svc_st() --> glusterBlockAuditRequest() --> glusterBlockCleanUp() --> glusterBlockDeleteRemoteAsync(), at Line#1287, the meta info is read while node 192.168.195.162's newest status has not yet been flushed to disk, so cleanupsuccess != info->nhosts and glusterBlockDeleteRemoteAsync() returns -1. If I just wait for 2 seconds before Line#1287, this issue disappears. Then rep1/block-meta/block14 won't be deleted; the code:
Just after this, I can see that the rep1/block-meta/block14:
Node 192.168.195.162's status is already GB_CLEANUP_SUCCESS = "CLEANUPSUCCESS". Maybe this can only be reproduced in my local VM setup, which lacks enough disk space. Thanks.
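To make the failure mode concrete, here is a minimal sketch of the kind of status-counting check being described; only cleanupsuccess, info->nhosts and GB_CLEANUP_SUCCESS are taken from the comment above, while the types, helper names and the stale status string are hypothetical, so this is not the actual code around Line#1287.

```c
#include <stdio.h>
#include <string.h>

#define GB_CLEANUP_SUCCESS "CLEANUPSUCCESS"

/* Hypothetical per-block metadata: one status string per hosting node. */
typedef struct {
  size_t nhosts;        /* number of nodes hosting the block            */
  const char **status;  /* latest on-disk status recorded for each node */
} blk_meta_info;

/* Count how many nodes have reached CLEANUPSUCCESS in the on-disk
 * metadata; if a node's newest status has not been flushed yet, the
 * count falls short of nhosts and the delete is reported as failed
 * (-1) even though the cleanup actually succeeded on every node. */
static int audit_cleanup(const blk_meta_info *info)
{
  size_t cleanupsuccess = 0;

  for (size_t i = 0; i < info->nhosts; i++) {
    if (info->status[i] && !strcmp(info->status[i], GB_CLEANUP_SUCCESS))
      cleanupsuccess++;
  }

  return (cleanupsuccess != info->nhosts) ? -1 : 0;
}

int main(void)
{
  /* One node's newest status has not hit the disk yet, so the stale
   * read still shows a hypothetical in-progress value. */
  const char *status[] = { GB_CLEANUP_SUCCESS, "CLEANUPINPROGRESS" };
  blk_meta_info info = { 2, status };

  printf("delete audit: %d\n", audit_cleanup(&info));  /* prints -1 */
  return 0;
}
```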
This will fix the following issues:

1. When the StorageObject creation fails, the IQN is still created and is not deleted, in this case: when creating BVs with the same NAME, the second attempt fails because the targetcli cache db only allows one [user/$NAME] to exist, and the failure leaves the IQN not deleted correctly. That means two different IQNs end up mapped to the same StorageObject.

2. For the case above we will also find that, after the second creation fails, /etc/target/saveconfig.json contains one StorageObject with two Targets. In theory, there should be one StorageObject with only 1 Target.

This patch checks all the StorageObjects in saveconfig.json before creating; if there is already a StorageObject with the same NAME as requested, it fails directly.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
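As a rough illustration only, the pre-create guard described in this commit message could look like the sketch below; the hard-coded path, the helper name storage_object_exists() and the naive line-by-line string matching are assumptions for the example, not the actual patch.

```c
#include <stdio.h>
#include <string.h>

#define SAVECONFIG_PATH "/etc/target/saveconfig.json"

/* Naive sketch: scan saveconfig.json for a storage object whose "name"
 * matches the requested block NAME.  Returns 1 if a match is found,
 * 0 if not, and -1 if the file cannot be read.  A real implementation
 * would parse the JSON (or ask targetcli) rather than substring-match. */
static int storage_object_exists(const char *name)
{
  char line[1024], needle[512];
  FILE *fp = fopen(SAVECONFIG_PATH, "r");

  if (!fp)
    return -1;

  snprintf(needle, sizeof(needle), "\"name\": \"%s\"", name);

  while (fgets(line, sizeof(line), fp)) {
    if (strstr(line, needle)) {
      fclose(fp);
      return 1;
    }
  }

  fclose(fp);
  return 0;
}

int main(void)
{
  if (storage_object_exists("block0") == 1) {
    fprintf(stderr, "create StorageObject failed: NAME already in use\n");
    return 1;
  }
  return 0;
}
```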
Merged now. Thanks @lxbsz.
What does this PR achieve? Why do we need it?
create: split the creation into 2 phases
This will fix the following issues:
1. When the StorageObject creation fails, the iqn will still be
created and won't be deleted, in this case:
when creating a BV with the same NAME, the second attempt fails
because the targetcli cache db only allows one [user/$NAME] to
exist, but it leaves the iqn not deleted correctly.
That means two different Targets will both be mapped to the
same StorageObject.
2. For the case above we will also find that, after the second
creation fails, /etc/target/saveconfig.json contains the one
StorageObject paired with the second Target as well.
In theory, there should be one StorageObject with 2 Targets.
This patch will split the creation into 2 phases: CREATE_SO_SRV
and CREATE_TG_SRV, since targetcli builds its cache using the
key = [user/$NAME], so the second phase only makes sense once the
StorageObject has been successfully created.
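For readers skimming the description, here is a minimal sketch of the two-phase flow outlined above; the phase names follow the commit message, but the helper functions (create_storage_object, create_target, delete_storage_object) are hypothetical stand-ins for the real targetcli-backed operations.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the targetcli-backed operations. */
static int create_storage_object(const char *name) { printf("CREATE_SO_SRV: %s\n", name); return 0; }
static int create_target(const char *name)         { printf("CREATE_TG_SRV: %s\n", name); return 0; }
static int delete_storage_object(const char *name) { printf("rollback SO:   %s\n", name); return 0; }

/* Two-phase create: the target (IQN) is only created once the
 * StorageObject phase has succeeded, so a duplicate-NAME failure in
 * phase 1 can no longer leave a dangling IQN behind; and if phase 2
 * fails, the StorageObject from phase 1 is rolled back. */
static int block_create(const char *name)
{
  if (create_storage_object(name) != 0) {        /* phase 1: CREATE_SO_SRV */
    fprintf(stderr, "create StorageObject failed\n");
    return -1;                                   /* fail directly */
  }

  if (create_target(name) != 0) {                /* phase 2: CREATE_TG_SRV */
    delete_storage_object(name);                 /* roll back phase 1 */
    fprintf(stderr, "create Target failed\n");
    return -1;
  }

  return 0;
}

int main(void)
{
  return block_create("block0") ? 1 : 0;
}
```

In the version that was finally merged, the same goal is reached by checking saveconfig.json for an existing StorageObject with the requested NAME before creating anything (see the commit message above).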
Does this PR fix issues?
Fixes: BZ#1725009