user:rbd performance is too poor; my env is: kernel 5.0.4, tcmu-runner latest release 1.4.1, ceph 12.2.11 #543
Comments
I got the same performance on another cluster environment.
Can you post more about your test and environment?
One more thing: in gwcli, can you give me the output of the 'info' command for the target?
More about my test and environment: I use the same arguments to test native rbd and tcmu-runner with user:rbd. Output of 'iscsiadm -m session -P 2' and 'iscsiadm -m node -T':
BEGIN RECORD 6.2.0.874-10
node.name = iqn.2019-04.wz.com:001
END RECORD
Also, I can't get max_data_area_mb and hw_max_sectors in gwcli; I get them in targetcli. I also can't run 'info' on the target in gwcli; I get it in targetcli under iscsi/myiqn. Thanks!
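(For reference, a minimal sketch of reading those values outside gwcli. The user_0 index and block.test device name below are placeholders borrowed from the logs later in this thread, and the configfs path layout is the usual LIO one, so adjust to your node:)

# List the user:rbd backstores and their attributes with targetcli
targetcli ls /backstores/user:rbd

# Or read the tcmu device attributes directly from configfs
# (assumed path layout; replace user_0/block.test with your device)
cat /sys/kernel/config/target/core/user_0/block.test/attrib/hw_max_sectors
cat /sys/kernel/config/target/core/user_0/block.test/attrib/max_data_area_mb   # present only on newer kernels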
The output of 'info' at gwcli/iscsi-target/myiqn/gateways/node11:
More: my Ceph cluster CRUSH map and the active tcmu-runner node are:
I changed the log level in tcmu.conf to debug and then got other logs. On tcmu-runner restart/init I got:
2019-03-27 02:11:18.056 14735 [DEBUG] handle_sig:186: Have received signal!
and one line:
2019-03-27 02:12:17.853 21387 [INFO] alua_implicit_transition:569 rbd/block.test: Starting lock acquisition operation.
and, after dozens of lines:
2019-03-27 02:12:18.169 21387 [WARN] tcmu_rbd_lock:757 rbd/block.test: Acquired exclusive lock.
Thanks!
Are you using the upstream ceph-iscsi tools from github.com, from RHCS, or from ceph.com? They look older, and that is why you do not have the target/disk info command.

This value:
hw_max_sectors=128 [ro]
is really low for the initiator you are using. open-iscsi works better with larger IOs, around 128k - 512k (256 - 1024 max_sectors). At 64k (128 max_sectors) the block layer is going to break that 4M write into 64 64k IOs, and with your queue_depth limit below we then have to send 32, wait, then send 32 more. With the newer tools the default is 1024, so you should be ok, although to match rbd you might want to try higher if your network can handle it. With the older tools, as you saw, you cannot modify it.

This value on the initiator is also going to make you hit the bug mentioned in the other GH issue you posted in once you start to send more IO:
node.session.cmds_max = 128
See:
If you are using the older ceph-iscsi tools, where you can't change the target cmdsn value, you need to set the initiator cmds_max to 64, which is the default cmdsn on the target side. With hw_max_sectors only 128 you then also need to change:
node.session.queue_depth = 32
so you can keep the queue full. If you continue to use the old tools you could set this to 64 too for now, since that is the session limit and you are doing IO to only one LU.

If you update to the new tools you should also increase MaxRecvDataSegmentLength. You will have to play around with the values to see what your network can handle (note these should match the initiator side values). And you should change InitialR2T to Yes.
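A minimal sketch of how those initiator settings would look in /etc/iscsi/iscsid.conf for the old-tools case described above; the numbers are illustrative assumptions, not tuned recommendations:

# /etc/iscsi/iscsid.conf (illustrative values for the old-tools limits above)
node.session.cmds_max = 64                              # match the target's default cmdsn depth
node.session.queue_depth = 64                           # per-LUN depth; single LU, so it can equal the session limit
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144    # only worth raising once the newer tools are in place
node.session.iscsi.InitialR2T = Yes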
I updated all the tools, so I now have the info command in gwcli, and I set some keys like cmds_max and queue_depth in the initiator config file /etc/iscsi/iscsid.conf, but when I restart, the config is not refreshed.
What is your network speed?

For the target side, make sure you have this ceph-iscsi patch; it is not in ceph-iscsi 3.0 but is in the master repo if you pulled that. In gwcli, under the disk/image dir, run help to list the commands. There will be a reconfigure command; run help reconfigure to see the list of settings and how to set them. For your test max_data_area_mb is too low. It is the limit on how much data can be sent to the disk at any one time, so in your fio test it would only cover 2 of the jobs, since each job is doing 4MB. In the target dir there is a similar reconfigure command.

For the initiator side, you need to either do a -o update command on the record, or, if you only set it in iscsid.conf, log out the sessions, run the discovery command again, then log back in (along with a multipath -F).

I am going on vacation until April 8th, so I will answer any follow-ups then.
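Roughly, the two sides described above look like the sketch below. The gwcli reconfigure syntax can differ between ceph-iscsi versions ('help reconfigure' shows the exact form for your build), and the portal IP, target IQN, and image name are placeholders:

# Target side, inside gwcli under /disks (syntax may vary by ceph-iscsi version)
/disks> reconfigure rbd/image_name max_data_area_mb 64

# Initiator side: update the stored node record, or log out / rediscover / log back in
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 -o update -n node.session.cmds_max -v 64
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 --logout
iscsiadm -m discovery -t st -p 192.168.0.1
iscsiadm -m node -T iqn.2019-04.wz.com:001 -p 192.168.0.1 --login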
My network speed is 10 Gbps. Wish you a pleasant vacation.
@Allenscript Sorry for the late reply. I must have missed this when I got back from vacation. I saw a comment from you today, but it looks like it was deleted. Do you still have issues with IOPs? Low IOPs might be expected; I am looking into it now. Are you running the Linux initiator from a VM or a physical machine?

If you run your fio test with the rbd engine and numjobs=1 but iodepth=N, like:
fio --direct=1 --iodepth=128 --rw=write --ioengine=rbd --bs=4K --numjobs=1 --runtime=60 --group_reporting --name=write --pool=pool_name --rbdname=image_name
what values do you see when you change iodepth from 128 to 1024? How does that compare to running fio against an iSCSI LUN when the iSCSI cmds_max/queue_depth/cmdsn limits are similar to rbd?
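For the iSCSI side of that comparison, a matching run against the exported LUN might look like the sketch below; the /dev/mapper/mpatha path is a placeholder, so point it at whatever multipath or sd device the session exposes:

# Same workload against the iSCSI LUN for comparison (device path is a placeholder)
fio --direct=1 --iodepth=128 --rw=write --ioengine=libaio --bs=4K --numjobs=1 \
    --runtime=60 --group_reporting --name=write --filename=/dev/mapper/mpatha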
@mikechristie I had interjected in this discussion but later deleted my question. As I'm still unsure, I'll repeat it: thank you
default_cmdsn_depth is what is used if there is no matching ACL for the initiator. It is applied when the iSCSI session first logs in. If you write to it, it will only be used for new sessions, so you would have to log out and then log back in (I think you would also need to turn cache dynamic ACLs off). If you have set up an ACL, you can override that value after login by writing to cmdsn_depth; if an iSCSI session is logged in when you write to this file, the session is logged out and back in and the new value is used. The max for these on the target side is 512.
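As a rough sketch of where those knobs live in LIO's configfs; the target IQN, TPG number, and initiator IQN below are placeholders, and the path layout is the usual LIO one:

# TPG-wide default, used when no ACL matches; only affects sessions that log in afterwards
echo 128 > /sys/kernel/config/target/iscsi/iqn.2019-04.wz.com:001/tpgt_1/attrib/default_cmdsn_depth

# Per-ACL override; writing it while a session is logged in forces a logout/relogin with the new value
echo 128 > /sys/kernel/config/target/iscsi/iqn.2019-04.wz.com:001/tpgt_1/acls/iqn.initiator.example/cmdsn_depth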
Any news here? |
The rbd maintainer did some rbd/rados performance fixes, but I think they are not in a stable release yet. If you have a non-production setup, try out the ceph master branch or the devel builds. Performance is still not close to krbd though. Part of it is iscsi, but librbd itself is just slower than krbd. When testing, if you can give me the numbers with the iscsi LUN and with ioengine=rbd, it would help us compare where we lose performance in your setup.
@mikechristie I see those changes in the Ceph 12.2.13 release. Did you do any tests with it?
Hi @rearden-steel, I don't work at Red Hat anymore. I think @lxbsz took over. |
Not yet Mike :-) |
Now I am seeing the same performance as this: fio with rbd gives about 500 MB/s, but with tcmu user:rbd the fio result is about 15 MB/s. This performance is too poor. My env is: kernel 5.0.4, tcmu-runner latest release 1.4.1, ceph 12.2.11.
Originally posted by @Allenscript in #359 (comment)