need help / tcmu-runner performance very slow #668
There is an old issue tracking a similar problem, #543; please check whether it helps.
And also see this one: #359.
In theory tcmu-runner will always be slower than krbd, because its IO path is much longer than krbd's. BTW, could you check the tcmu-runner logs: are there any frequent lock-switching logs for the same image? For example, in case you are using the
BTW, what's your test script? Are you using fio? If so, what are your parameters? Have you ever compared tcmu-runner vs krbd by:
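A comparison along those lines can be run with fio against both block paths with identical parameters. A minimal sketch, assuming hypothetical names: a pool/image `mypool/myimage`, the krbd device at `/dev/rbd0`, and the multipath device for the iSCSI path at `/dev/mapper/mpatha`:

```shell
# Path 1: map the image via krbd and benchmark it directly
rbd map mypool/myimage
fio --name=krbd-test --filename=/dev/rbd0 \
    --rw=write --bs=4M --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting

# Path 2: benchmark the same image through the tcmu-runner/iSCSI path
fio --name=iscsi-test --filename=/dev/mapper/mpatha \
    --rw=write --bs=4M --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting
```

Keeping block size, queue depth, and runtime identical makes the two numbers directly comparable. Note that writing to the raw devices is destructive, so only run this against a scratch image.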
I analyzed the tcmu code. When tcmu gets messages from uio there is no lock, so it does not distinguish whether it is multi-threaded or not. I would guess the performance difference between tcmu and krbd should be around 10%.
Yeah, TCMU and tcmu-runner don't know and also don't care about that. TCMU just queues the SCSI cmds in its buffer and tcmu-runner handles them one by one. There are no locks in tcmu-runner when handling this.
Hello lxbsz,
yes, I will test this later or tonight. Last night I found a solution, or at least the right settings, for performance (it was a long night :-)). With that speed I can live: iSCSI now runs at 150 MB/s and up to 250 MB/s. I also found out that performance drops hard if one iSCSI target, e.g. iqn.2003-01.com.ceph.iscsi-gw, has two or more LUNs; then it goes down to max 50 MB/s. Now I have a different problem: if I test the speed while some VMs are running on this LUN, the IOPS go up to >3k with speeds >150 MB/s, and one of the gateways gets stuck and the process freezes and crashes. It looks like the gateway that maps the rbd image gets stuck. Unfortunately I couldn't find a usable log last night; in which log could I see it? On the machine console I found that
[...]
What are your tcmu-runner/ceph-iscsi versions? Could you upload the tcmu-runner.log and rbd-target-api.log files? And at the same time, could you check for any crash errors in the journal log on that gateway node? There is one known crash issue, please see #667. In that PR I have fixed several other bugs that could also cause a crash. If your test causes the active path to switch frequently between gateways, the above issue can be hit easily.
Hi, tcmu-runner v1.5.2.
Here are the logs: cd-88tcmu-runner.log
Could you also upload the journal logs from during the crash?
From the above logs I couldn't tell which service crashed.
From one of the tcmu-runner.log files:
This should be the same issue that was fixed by b48eeeb in #667. Would it be possible to test that PR in your setup? Thanks.
Yes, we can test it if you can tell me or explain how to upgrade or patch it; no problem.
Are you using cephadm/containers for the ceph-iscsi/tcmu-runner services? If so, I am afraid you need to update the package or install tcmu-runner from source to override it on the node.
Hi, it's a physical server with Ubuntu 20.04, and ceph-iscsi was installed as a package from the Ceph Debian repo. So no containers...
Currently I don't have a .deb setup to build the package. I usually build from source:
You can try this.
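The build commands themselves weren't captured above. A sketch of a typical from-source build, assuming the cmake flags used in the Ceph iSCSI gateway manual-install docs (`SUPPORT_SYSTEMD` and an install prefix of `/usr`, so the build overrides the packaged binaries):

```shell
# Fetch the source (check out the PR branch you want to test)
git clone https://github.com/open-iscsi/tcmu-runner.git
cd tcmu-runner

# Configure with systemd support and the distro's install prefix
cmake . -DSUPPORT_SYSTEMD=ON -DCMAKE_INSTALL_PREFIX=/usr

# Build and install (needs the relevant -dev packages installed)
make
sudo make install
```

If `cmake` fails with missing-header errors, the usual cause is a missing `-dev` dependency (e.g. for librbd or glib); install it and re-run.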
I have this error:
Okay, I compiled and installed it; I found the missing -dev package...
Looks better now. iSCSI gateway 1:
iSCSI gateway 2:
Hey @lightmans2, looks cool. Thanks very much for your tests. For the tcmalloc issue we may need to fix this in
BTW, with this PR, have you hit any other issues? Thanks.
I am afraid you were using the
This will only happen in version
I am afraid your install from source didn't work, or the path you chose to install to is not the same as the one the package used, so you didn't override the package's binaries.
Hi lxbsz, hmm okay... can you help me fix that? I am not a developer, so I don't know how to correct it. The gateways are now stable and the performance is now between 100 and 200 MB/s.
Do you mean the
For the building-from-source issue, you can check the install paths for all the binaries in the tcmu-runner 1.5.2 deb package, then specify the install prefix in
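One way to do that check, sketched under the assumption that the package is named `tcmu-runner` and that its binaries live under `/usr` (paths may differ on your system):

```shell
# List every file the distro package installed, and where
dpkg -L tcmu-runner

# See which tcmu-runner binary is actually resolved from $PATH
which tcmu-runner

# Point the source build at the same prefix before installing,
# so "make install" overwrites the packaged binaries
cmake . -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install
```

If `dpkg -L` shows `/usr/bin/tcmu-runner` but your build installed to `/usr/local/bin`, the old packaged binary is still the one being run.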
If #667 works well for you I will merge it and do a new release soon.
Hi lxbsz, you said that the install prefix is maybe wrong... how should the path look? The original looks like this:
I didn't try this on Debian yet; the above will always be true for CentOS/Fedora/RHEL.
It should be for
But it seems the above prefix is correct, so I have no idea why you were still running the old
After you installed it, did you reload and restart it?
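Reloading and restarting after a from-source install would look something like this (service names assumed to match the ceph-iscsi packages: `tcmu-runner`, `rbd-target-gw`, `rbd-target-api`):

```shell
# Pick up any replaced unit files and binaries
sudo systemctl daemon-reload

# Restart the iSCSI gateway stack
sudo systemctl restart tcmu-runner rbd-target-gw rbd-target-api

# Confirm the services came back healthy
systemctl status tcmu-runner rbd-target-gw rbd-target-api
```

A full reboot, as done below, achieves the same thing; the restart is just faster and avoids failing over the active paths on both gateways at once.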
Hi lxbsz, yes, I rebooted both gateway servers and checked the tcmu-runner version: it's 1.5.4.
ESXi iSCSI config: the OSD nodes, MONs, RGW/iSCSI gateways and ESXi nodes are all connected to the 10 Gbit network with bond-rr. rbd benchmark test:
The rbd benchmark says that at least 250 MB/s is possible, but I actually saw much more, up to 550 MB/s. If I start iftop on one OSD node I see the Ceph iSCSI gateway names as rgw and the traffic is nearly 80 MB/s. The Ceph dashboard shows that the iSCSI write performance is only 40 MB/s. If I look at vCenter and the ESXi datastore performance I see very high storage device latencies, between 50 and 100 ms. Very bad.
Can somebody explain what I am doing wrong, or what I can do to get better performance with ceph-iscsi? I have already experimented with gwcli, the iSCSI queue and other settings. Everything is fine, multipathing is working and recovery is fast, but iSCSI is very slow and I don't know why.
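For reference, an rbd-side baseline like the one mentioned above can be taken with `rbd bench`; the pool and image names here are placeholders:

```shell
# Write benchmark directly against the image, bypassing iSCSI entirely
rbd bench --io-type write --io-size 4M --io-threads 16 \
    --io-total 10G mypool/myimage
```

Comparing this number against the dashboard's iSCSI throughput isolates how much of the loss comes from the gateway path rather than from the cluster itself.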
Hello everyone,
we have been testing Ceph and iSCSI for some weeks now...
I have a big iSCSI/VMware performance problem and need some help with tcmu-runner.
If I use a Linux VM and map an image over rbd, I get speeds of 350-550 MB/s.
If I use iSCSI with tcmu-runner, I get a max speed of 50-100 MB/s.
We use Ubuntu 20.04 as OS and Ceph v16.2.5 with kernel 5.4.0-81-generic.
tcmu-runner v1.5.2
Some info about the hardware and setup:
At the moment we have connected a VMware ESXi host (HP DL385 Gen10, ESXi 6.7) to this storage for testing LUNs and performance.
Can somebody tell me what else I can do to tweak tcmu-runner to get the same performance as rbd?
I have already experimented with:
1.)
2.)
but I gained only 10% more speed :-(
Thanks a lot in advance for any hints and tips.