user:rbd performance is too poor; my env is: kernel 5.0.4, tcmu-runner latest release 1.4.1, ceph 12.2.11 #543
Comments
I got the same performance on another cluster environment.
Can you post more about your test and environment?
One more thing: in gwcli, can you give me the output of the 'info' command for the target?
More about my test and environment: I use the same arguments to test native rbd and tcmu-runner with user:rbd. Output of 'iscsiadm -m session -P 2' and 'iscsiadm -m node -T':
BEGIN RECORD 6.2.0.874-10
node.name = iqn.2019-04.wz.com:001
END RECORD
Also, I can't get max_data_area_mb and hw_max_sectors in gwcli; I get them in targetcli. I also can't run 'info' on the target in gwcli; I get it in targetcli under iscsi/myiqn. Thanks!
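(For reference, a minimal sketch of reading those values outside gwcli. The user_0 index and block.test device name below are placeholders borrowed from the logs later in this thread, and the configfs path layout is the usual LIO one, so adjust to your node:)

# List the user:rbd backstores and their attributes with targetcli
targetcli ls /backstores/user:rbd

# Or read the tcmu device attributes directly from configfs
# (assumed path layout; replace user_0/block.test with your device)
cat /sys/kernel/config/target/core/user_0/block.test/attrib/hw_max_sectors
cat /sys/kernel/config/target/core/user_0/block.test/attrib/max_data_area_mb   # present only on newer kernels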
The output of 'info' at gwcli/iscsi-target/myiqn/gateways/node11:
More: my Ceph cluster CRUSH map and the active tcmu-runner node are:
I changed the log level in tcmu.conf to debug and then got other logs. On tcmu-runner restart/init I got:
2019-03-27 02:11:18.056 14735 [DEBUG] handle_sig:186: Have received signal!
and one line:
2019-03-27 02:12:17.853 21387 [INFO] alua_implicit_transition:569 rbd/block.test: Starting lock acquisition operation.
and, after dozens of lines:
2019-03-27 02:12:18.169 21387 [WARN] tcmu_rbd_lock:757 rbd/block.test: Acquired exclusive lock.
Thanks!
Are you using the upstream ceph-iscsi tools from github.com, from RHCS, or from ceph.com? They look older, and that is why you do not have the target/disk info command.

This value:
hw_max_sectors=128 [ro]
is really low for the initiator you are using. open-iscsi works better with larger IOs, around 128k - 512k (256 - 1024 max_sectors). At 64k (128 max_sectors) the block layer is going to break that 4M write into 64 64k IOs, and with your queue_depth limit below we then have to send 32, wait, then send 32 more. With the newer tools the default is 1024, so you should be ok, although to match rbd you might want to try higher if your network can handle it. With the older tools, as you saw, you cannot modify it.

This value on the initiator is also going to make you hit the bug mentioned in the other GH issue you posted in once you start to send more IO:
node.session.cmds_max = 128
See:
If you are using the older ceph-iscsi tools, where you can't change the target cmdsn value, you need to set the initiator cmds_max to 64, which is the default cmdsn on the target side. With hw_max_sectors only 128 you then also need to change:
node.session.queue_depth = 32
so you can keep the queue full. If you continue to use the old tools you could set this to 64 too for now, since that is the session limit and you are doing IO to only one LU.

If you update to the new tools you should also increase MaxRecvDataSegmentLength. You will have to play around with the values to see what your network can handle (note these should match the initiator side values). And you should change InitialR2T to Yes.
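A minimal sketch of how those initiator settings would look in /etc/iscsi/iscsid.conf for the old-tools case described above; the numbers are illustrative assumptions, not tuned recommendations:

# /etc/iscsi/iscsid.conf (illustrative values for the old-tools limits above)
node.session.cmds_max = 64                              # match the target's default cmdsn depth
node.session.queue_depth = 64                           # per-LUN depth; single LU, so it can equal the session limit
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144    # only worth raising once the newer tools are in place
node.session.iscsi.InitialR2T = Yes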
I updated all the tools, so I now have the info command in gwcli, and I set some keys like cmds_max and queue_depth in the initiator config file /etc/iscsi/iscsid.conf, but when I restart, the config is not refreshed.
What is your network speed?

For the target side, make sure you have this ceph-iscsi patch; it is not in ceph-iscsi 3.0 but is in the master repo if you pulled that. In gwcli, under the disk/image dir, run help to list the commands. There will be a reconfigure command; run help reconfigure to see the list of settings and how to set them. For your test max_data_area_mb is too low. It is the limit on how much data can be sent to the disk at any one time, so in your fio test it would only cover 2 of the jobs, since each job is doing 4MB. In the target dir there is a similar reconfigure command.

For the initiator side, you need to either do a -o update command on the record, or, if you only set it in iscsid.conf, log out the sessions, run the discovery command again, then log back in (along with a multipath -F).

I am going on vacation until April 8th, so I will answer any follow-ups then.
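Roughly, the two sides described above look like the sketch below. The gwcli reconfigure syntax can differ between ceph-iscsi versions ('help reconfigure' shows the exact form for your build), and the portal IP, target IQN, and image name are placeholders:

# Target side, inside gwcli under /disks (syntax may vary by ceph-iscsi version)
/disks> reconfigure rbd/image_name max_data_area_mb 64

# Initiator side: update the stored node record, or log out / rediscover / log back in
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 -o update -n node.session.cmds_max -v 64
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 --logout
iscsiadm -m discovery -t st -p 192.168.0.1
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 --login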
My network speed is 10 Gbps. Wish you a pleasant vacation.
@Allenscript Sorry for the late reply. I must have missed this when I got back from vacation. I saw a comment from you today, but it looks like it was deleted. Do you still have issues with IOPs? Low IOPs might be expected; I am looking into it now. Are you running the Linux initiator from a VM or a physical machine?

If you run your fio test with the rbd engine and numjobs=1 but iodepth=N, like:
fio --direct=1 --iodepth=128 --rw=write --ioengine=rbd --bs=4K --numjobs=1 --runtime=60 --group_reporting --name=write --pool=pool_name --rbdname=image_name
what values do you see when you change iodepth from 128 to 1024? How does that compare to running fio against an iSCSI LUN when the iSCSI cmds_max/queue_depth/cmdsn limits are similar to rbd?
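For the iSCSI side of that comparison, a matching run against the exported LUN might look like the sketch below; the /dev/mapper/mpatha path is a placeholder, so point it at whatever multipath or sd device the session exposes:

# Same workload against the iSCSI LUN for comparison (device path is a placeholder)
fio --direct=1 --iodepth=128 --rw=write --ioengine=libaio --bs=4K --numjobs=1 \
    --runtime=60 --group_reporting --name=write --filename=/dev/mapper/mpatha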
@mikechristie I had interjected in this discussion but later deleted my question. As I'm still unsure, I'll repeat it: thank you
default_cmdsn_depth is what is used if there is no matching ACL for the initiator. It is applied when the iSCSI session first logs in. If you write to it, it will only be used for new sessions, so you would have to log out and then log back in (I think you would also need to turn cache dynamic ACLs off). If you have set up an ACL, you can override that value after login by writing to cmdsn_depth; if an iSCSI session is logged in when you write to this file, the session is logged out and back in and the new value is used. The max for these on the target side is 512.
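As a rough sketch of where those knobs live in LIO's configfs; the target IQN, TPG number, and initiator IQN below are placeholders, and the path layout is the usual LIO one:

# TPG-wide default, used when no ACL matches; only affects sessions that log in afterwards
echo 128 > /sys/kernel/config/target/iscsi/iqn.2019-04.wz.com:001/tpgt_1/attrib/default_cmdsn_depth

# Per-ACL override; writing it while a session is logged in forces a logout/relogin with the new value
echo 128 > /sys/kernel/config/target/iscsi/iqn.2019-04.wz.com:001/tpgt_1/acls/iqn.initiator.example/cmdsn_depth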
Any news here? |
The rbd maintainer did some rbd/rados performance fixes, but I think they are not in a stable release yet. If you have a non-production setup, try out the ceph master branch or the devel builds. Performance is still not close to krbd though. Part of it is iscsi, but librbd itself is just slower than krbd. When testing, if you can give me the numbers with the iscsi LUN and with ioengine=rbd, it would help us compare where we lose performance in your setup.
@mikechristie I see those changes in the Ceph 12.2.13 release. Did you do any tests with it?
Hi @rearden-steel, I don't work at Red Hat anymore. I think @lxbsz took over. |
Not yet Mike :-) |
Now I am seeing the same performance as this: fio with rbd gives about 500 MB/s, but with tcmu user:rbd the fio result is about 15 MB/s. This performance is too poor. My env is: kernel 5.0.4, tcmu-runner latest release 1.4.1, ceph 12.2.11.
Originally posted by @Allenscript in #359 (comment)