create: fails directly when a StorageObject with the same NAME already exists #243
Conversation
The 2 issues are: if you create 2 blocks with the same name "block0" in 2 different gluster volumes, "dht" and "dht1", there will be 2 Targets with only one StorageObject:
And in /etc/target/saveconfig.json we can see that the Target's gbid is not the same as the StorageObject's:
@lxbsz splitting create into two phases will give some flexibility, I agree! But this affects the overall time taken to create the block volume, especially at scale, hence I'm a bit reluctant about this approach. Why can't we solve it by creating the target and the SO in one phase, and deleting both the target and the SO if reply->exit != 0? Thanks!
Do you mean the cache loading in targetcli/rtslib_fb when running the create operations? If so, I think it will be almost the same, because the bs_cache loading only happens when creating/deleting the StorageObject; the other operations won't use this bs_cache.
This is mainly to resolve issue 2 above, otherwise the code would become very complicated. Thanks.
@pkalever
@lxbsz please change the commit msgs as well, more comments inline. Thanks!
Force-pushed from f5b03e9 to 9a63ca0
Force-pushed from e82babf to 858dacf
@pkalever
This will be one problem if we create 2 block targets using the same NAME in 2 different volumes. The second time it will fail and won't be loaded, leaving no info in saveconfig.json, but it can still find the StorageObject created by the first creation, so the rollback will fail like:

IQN: - PORTAL(S): - ROLLBACK FAILED ON: 192.168.195.164 RESULT: FAIL

But the rollback is actually successful. After this fix it will look like:

failed to configure on 192.168.195.164 create StorageObject failed RESULT:FAIL

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
The old code covered up the raw errors from the targetcli command, but some of those raw errors are very useful for getting to the root cause of some bugs.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
This will avoid something like the following odd case:

LOG("mgmt", GB_LOG_INFO, "tmp=%s", tmp);

It will always give us:

INFO: tmp=mgmt [at block_svc_routines.c+4394 :<block_create_common>]

This is because the caller's tmp is shadowed by the local "char *tmp" declared inside the LOG macro.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
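As a side note for readers unfamiliar with this class of bug, below is a minimal, self-contained sketch of the shadowing problem the commit describes; the macro body and names are made up for illustration and are not the actual gluster-block LOG macro.

```c
#include <stdio.h>

/* Hypothetical logging macro: it declares its own local `tmp`, which
 * shadows any `tmp` the caller passes in as a format argument. */
#define LOG(dom, fmt, ...)                                        \
  do {                                                            \
    char *tmp = (char *)(dom);   /* shadows the caller's tmp */   \
    printf("INFO: " fmt " [domain=%s]\n", __VA_ARGS__, tmp);      \
  } while (0)

int main(void)
{
  char *tmp = "block14";         /* the value the caller wants logged */

  /* Prints "INFO: tmp=mgmt [domain=mgmt]": after macro expansion the
   * `tmp` argument resolves to the macro's local tmp ("mgmt"),
   * not to the caller's "block14". */
  LOG("mgmt", "tmp=%s", tmp);
  return 0;
}
```

Renaming the macro-internal variable to something unlikely to collide with caller code (or moving the work into a function) avoids this kind of collision.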
Force-pushed from d8d1108 to 8c7c6f8
@lxbsz tested this and it works as expected.
Two observations:
Hint: block with name 'block-volume1' already exist (may be hosted on a different block-hosting volume)
Thanks!
Not sure whether it is the case that only 2/3 nodes have the SO for some reason; for example, the user has 4 nodes, creates hosting-volume1/block1 on node-1, node-2 and node-3, then creates hosting-volume2/block1 on node-2, node-3 and node-4. So the current error should be more detailed. Would that make sense?
Yeah, this makes sense.
@lxbsz I'm okay with printing the same msg from the HA number of gluster-blockd's. I want you to fix rpc/block_svc_routines.c line 4389 as:
Thanks!
Sure.
@lxbsz here is a question before I start testing it:
What should I expect when I have 4 nodes, of which N1, N2 and N3 already have a block-volume name in use, but I have supplied N1, N2 and N4 for creating a block volume with the same name? Does the cleanup happen on all 3 nodes N1, N2 & N4, or only on N4?
Cleaning up on only N4 is optimal though.
Force-pushed from 0ffe26b to 5a2100f
Updated it and fell back to the old version. Just one NOTE: currently I can still hit the old issue #204, which is due to the metadata not being flushed to disk in time, especially in the multi-node case, like:

rhel3 --> 192.168.195.164

I create the block14 BV in the rep volume with node rhel3 only, then create the block14 BV in the rep1 volume with nodes rhel3 and rhel1. The following is the normal output, as expected: when creating the block14 BV more than once, it fails with an "exists" error on node rhel3, and on node rhel1 the create/delete succeeds:
But sometimes I can hit the following output in my setup:
This is because in block_create_cli_1_svc_st() --> glusterBlockAuditRequest() --> glusterBlockCleanUp() --> glusterBlockDeleteRemoteAsync(), at Line#1287, the meta info is read while node 192.168.195.162's newest status has not yet been flushed to disk, so cleanupsuccess != info->nhosts and glusterBlockDeleteRemoteAsync() returns -1. If I just wait for 2 seconds before Line#1287, this issue disappears. Then rep1/block-meta/block14 won't be deleted; the code:
Just after this, I can see that the rep1/block-meta/block14:
Node 192.168.195.162's status is already GB_CLEANUP_SUCCESS = "CLEANUPSUCCESS". Maybe this can only be reproduced in my local VM setup, which lacks enough disk space. Thanks.
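To make the failure mode concrete, here is a minimal sketch of the kind of status-counting check being described; only cleanupsuccess, info->nhosts and GB_CLEANUP_SUCCESS are taken from the comment above, while the types, helper names and the stale status string are hypothetical, so this is not the actual code around Line#1287.

```c
#include <stdio.h>
#include <string.h>

#define GB_CLEANUP_SUCCESS "CLEANUPSUCCESS"

/* Hypothetical per-block metadata: one status string per hosting node. */
typedef struct {
  size_t nhosts;        /* number of nodes hosting the block            */
  const char **status;  /* latest on-disk status recorded for each node */
} blk_meta_info;

/* Count how many nodes have reached CLEANUPSUCCESS in the on-disk
 * metadata; if a node's newest status has not been flushed yet, the
 * count falls short of nhosts and the delete is reported as failed
 * (-1) even though the cleanup actually succeeded on every node. */
static int audit_cleanup(const blk_meta_info *info)
{
  size_t cleanupsuccess = 0;

  for (size_t i = 0; i < info->nhosts; i++) {
    if (info->status[i] && !strcmp(info->status[i], GB_CLEANUP_SUCCESS))
      cleanupsuccess++;
  }

  return (cleanupsuccess != info->nhosts) ? -1 : 0;
}

int main(void)
{
  /* One node's newest status has not hit the disk yet, so the stale
   * read still shows a hypothetical in-progress value. */
  const char *status[] = { GB_CLEANUP_SUCCESS, "CLEANUPINPROGRESS" };
  blk_meta_info info = { 2, status };

  printf("delete audit: %d\n", audit_cleanup(&info));  /* prints -1 */
  return 0;
}
```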
This will fix the following issues:

1. When the StorageObject creation fails, the IQN is still created and is not deleted, in this case: when creating BVs with the same NAME, the second attempt fails because the targetcli cache db only allows one [user/$NAME] to exist, and the failure leaves the IQN not deleted correctly. That means two different IQNs end up mapped to the same StorageObject.

2. For the case above we will also find that, after the second creation fails, /etc/target/saveconfig.json contains one StorageObject with two Targets. In theory, there should be one StorageObject with only 1 Target.

This patch checks all the StorageObjects in saveconfig.json before creating; if there is already a StorageObject with the same NAME as requested, it fails directly.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Prasanna Kumar Kalever <[email protected]>
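As a rough illustration only, the pre-create guard described in this commit message could look like the sketch below; the hard-coded path, the helper name storage_object_exists() and the naive line-by-line string matching are assumptions for the example, not the actual patch.

```c
#include <stdio.h>
#include <string.h>

#define SAVECONFIG_PATH "/etc/target/saveconfig.json"

/* Naive sketch: scan saveconfig.json for a storage object whose "name"
 * matches the requested block NAME.  Returns 1 if a match is found,
 * 0 if not, and -1 if the file cannot be read.  A real implementation
 * would parse the JSON (or ask targetcli) rather than substring-match. */
static int storage_object_exists(const char *name)
{
  char line[1024], needle[512];
  FILE *fp = fopen(SAVECONFIG_PATH, "r");

  if (!fp)
    return -1;

  snprintf(needle, sizeof(needle), "\"name\": \"%s\"", name);

  while (fgets(line, sizeof(line), fp)) {
    if (strstr(line, needle)) {
      fclose(fp);
      return 1;
    }
  }

  fclose(fp);
  return 0;
}

int main(void)
{
  if (storage_object_exists("block0") == 1) {
    fprintf(stderr, "create StorageObject failed: NAME already in use\n");
    return 1;
  }
  return 0;
}
```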
Merged now. Thanks @lxbsz.
What does this PR achieve? Why do we need it?
create: split the creation into 2 phases
This will fix the following issues:
1. When the StorageObject creation fails, the iqn will still be
created and won't be deleted, in this case:
when creating a BV with the same NAME, the second attempt fails
because the targetcli cache db only allows one [user/$NAME] to
exist, but it leaves the iqn not deleted correctly.
That means two different Targets will both be mapped to the
same StorageObject.
2. For the case above we will also find that, after the second
creation fails, /etc/target/saveconfig.json contains the one
StorageObject paired with the second Target as well.
In theory, there should be one StorageObject with 2 Targets.
This patch will split the creation into 2 phases: CREATE_SO_SRV
and CREATE_TG_SRV, since targetcli builds its cache using the
key = [user/$NAME], so the second phase only makes sense once the
StorageObject has been successfully created.
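For readers skimming the description, here is a minimal sketch of the two-phase flow outlined above; the phase names follow the commit message, but the helper functions (create_storage_object, create_target, delete_storage_object) are hypothetical stand-ins for the real targetcli-backed operations.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the targetcli-backed operations. */
static int create_storage_object(const char *name) { printf("CREATE_SO_SRV: %s\n", name); return 0; }
static int create_target(const char *name)         { printf("CREATE_TG_SRV: %s\n", name); return 0; }
static int delete_storage_object(const char *name) { printf("rollback SO:   %s\n", name); return 0; }

/* Two-phase create: the target (IQN) is only created once the
 * StorageObject phase has succeeded, so a duplicate-NAME failure in
 * phase 1 can no longer leave a dangling IQN behind; and if phase 2
 * fails, the StorageObject from phase 1 is rolled back. */
static int block_create(const char *name)
{
  if (create_storage_object(name) != 0) {        /* phase 1: CREATE_SO_SRV */
    fprintf(stderr, "create StorageObject failed\n");
    return -1;                                   /* fail directly */
  }

  if (create_target(name) != 0) {                /* phase 2: CREATE_TG_SRV */
    delete_storage_object(name);                 /* roll back phase 1 */
    fprintf(stderr, "create Target failed\n");
    return -1;
  }

  return 0;
}

int main(void)
{
  return block_create("block0") ? 1 : 0;
}
```

In the version that was finally merged, the same goal is reached by checking saveconfig.json for an existing StorageObject with the requested NAME before creating anything (see the commit message above).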
Does this PR fix issues?
Fixes: BZ#1725009