
xp load clust rabbitmq

Matthieu Simonin edited this page Oct 18, 2016 · 12 revisions

Load-clust-rabbitmq Experimentation

Purpose

Evaluate the deployment under load with a clustered RabbitMQ.

Configuration

physical nodes
1 control, 1 network, 20 computes, 1 util, 3 rabbitmq nodes.
control
neutron_server, nova_scheduler, nova_novncproxy, nova_consoleauth, nova_api, glance_api, glance_registry, keystone, cron, memcached, kolla_toolbox, heka, cadvisor, docker_registry, nova_conductor, collectd
network
neutron_metadata_agent, neutron_l3_agent, neutron_dhcp_agent, neutron_openvswitch_agent, openvswitch_db, keepalived, cron, kolla_toolbox, haproxy, heka, cadvisor, collectd
compute
nova_ssh, nova_libvirt, nova_compute_fake_1, …, nova_compute_fake_#fake, openvswitch_db, openvswitch_vswitchd, neutron_openvswitch_agent, neutron_openvswitch_agent_fake_1, …, neutron_openvswitch_agent_fake_#fake, cron, kolla_toolbox, heka, cadvisor, collectd
util
cadvisor, grafana, influx (rally)
3 rabbitmq nodes
rabbitmq, cadvisor, collectd, heka

Get results

First, find the name of the host machine.

cd results
vagrant up load-clust-rabbit

Experimental protocol

deploy 1000 nodes openstack
boot_and_delete concurrency=50 times=100
wait 
boot_and_list concurrency=50 times=100

More concretely:

./kolla-g5k.py
sleep 500
./kolla-g5k.py bench --scenarios=vanilla-boot-delete-then-boot-list.txt --times=100 --concurrency=50 --wait=100
sleep 300
./kolla-g5k.py bench --scenarios=vanilla-boot-delete-then-boot-list.txt --times=100 --concurrency=50 --wait=100
> cat vanilla-boot-delete-then-boot-list.txt
disco-rally-boot-and-delete.json
disco-rally-boot-and-list.json

load-clust-rabbit-cpt20-nfk50

Observation 1

In nova-scheduler: ComputeFilter returns 0 hosts. VMs can’t find a host to start on because the computes are declared offline.

Hypothesis: compute node states aren’t updated within the service_down_time interval, so the nodes are declared offline.

Solutions:

  1. report_interval (10s) is too low and puts too much pressure on the system. Increasing it would put less pressure on the conductor and the DB. service_down_time must be increased accordingly.
  2. instance_sync_interval (120s) is also putting some load on the scheduler.
  3. Scale out the conductor.
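Concretely, solutions 1. and 2. map to nova.conf options along these lines. This is a sketch with illustrative values, not necessarily the ones used in these runs; note that the instance-sync knob is spelled scheduler_instance_sync_interval in Mitaka-era nova:

```ini
[DEFAULT]
# Computes report their liveness less often (default: 10s),
# putting less pressure on the conductor and the DB.
report_interval = 60
# Must be raised accordingly and stay well above report_interval
# (default: 60s), otherwise computes are flagged down between reports.
service_down_time = 600
# Scheduler/instance sync also loads the system (default: 120s).
scheduler_instance_sync_interval = 600
```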

Observation 2

In rabbitmq: closing connections due to {handshake_timeout,frame_header} or {handshake_timeout,timeout}.

Hypothesis: too much latency on rabbitmq when opening a connection.

Solutions:

  • Increase the timeout
  • Decrease the load
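Increasing the timeout amounts to raising the rabbit application’s handshake_timeout (default: 10000 ms). A hedged sketch in the classic /etc/rabbitmq/rabbitmq.config format, with an illustrative value:

```erlang
%% /etc/rabbitmq/rabbitmq.config
[
  {rabbit, [
    %% AMQP handshake timeout in milliseconds (default: 10000).
    {handshake_timeout, 60000}
  ]}
].
```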

load-clust-rabbit-cpt20-nfk50-tuned-report-sync-intervals

Goal: see the effect of applying solutions 1. and 2. of observation 1.

Observation:

  • ComputeFilter is OK for every request (checked in nova-scheduler.log)
  • Only 12 errors when deleting instances.

Conclusion: putting less load on the service-state updates leads to better results.

But since instances aren’t synced as often, the scheduler has a less accurate view of the system. This could lead to more retries when running an “almost full” system.

load-clust-rabbit-cpt20-nfk50-tuned-report-sync-intervals-handshake-timeout

Goal: see the effect of increasing the handshake timeout.

Observation:

  • No more timeouts in the rabbitmq logs
  • Every test passed

load-clust-rabbit-cpt20-nfk50-cond10-tuned-handshake-timeout

Goal: try to absorb the load of the compute reports by increasing the number of conductors (here 10).

Observation:

  • (nova-scheduler) ComputeFilter returns 0 hosts:

grep "ComputeFilter returned 0 hosts" nova-scheduler.log | grep -o "req-[[:alnum:]]*" | uniq | wc -l -> 291

  • some DB errors at the Rally level (disconnections, re-entrant calls)
  • many timeouts
  • many failures in the Rally tests
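One caveat on the grep pipeline above: uniq only collapses adjacent duplicates, so a request ID that reappears later in the log is counted more than once; sort -u gives a reliable count. A quick sketch on a toy log (file name and lines are hypothetical, just mimicking nova-scheduler.log):

```shell
# Build a toy log with a non-adjacent duplicate request ID.
cat > /tmp/sched-sample.log <<'EOF'
req-aaa Filter ComputeFilter returned 0 hosts
req-bbb Filter ComputeFilter returned 0 hosts
req-aaa Filter ComputeFilter returned 0 hosts
EOF

# uniq only drops adjacent repeats: req-aaa is counted twice here.
grep "ComputeFilter returned 0 hosts" /tmp/sched-sample.log \
  | grep -o "req-[[:alnum:]]*" | uniq | wc -l       # -> 3

# sort -u counts each request ID exactly once.
grep "ComputeFilter returned 0 hosts" /tmp/sched-sample.log \
  | grep -o "req-[[:alnum:]]*" | sort -u | wc -l    # -> 2
```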

Conclusion:

Increasing the number of conductors obviously doesn’t solve anything.

load-clust-rabbit-cpt20-nfk50-sched8-tuned-handshake-timeout

Goal: see if adding more schedulers can help make the Rally benchmarks pass.

Observation:

all tests passed

Conclusion

load-clust-rabbit-cpt20-nfk50-sched8-tuned-handshake-timeout-rally-times-10000

Goal: see if the system can handle the load in the long term (times = 10000).

Observation:

  • Many 504 errors starting at iteration #9000.
  • MySQL connections hit the 2000-connection limit of haproxy.

Conclusion:

Let’s confirm with a second run.

load-clust-rabbit-cpt20-nfk50-sched8-tuned-handshake-timeout-rally-times-10000-2

Observation:

confirmed

load-clust-rabbit-cpt20-nfk50-sched8-tuned-handshake-timeout-rally-times-10000-4

Goal: increase the haproxy limits (global maxconn: 100000, frontend: 2000).

Observation:

all tests passed

Conclusion

Making haproxy as transparent as possible makes life easier!
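The tuning above boils down to two maxconn knobs in haproxy.cfg. A hedged sketch; the listen section name and bind address are hypothetical, the values are the ones stated in the goal:

```
global
    # Process-wide connection ceiling; the previous limit (2000)
    # was being hit by the MySQL connections.
    maxconn 100000

listen mariadb
    # Per-proxy limit for the MySQL frontend.
    maxconn 2000
    bind 192.168.0.254:3306
```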

8 schedulers notes

Running 8 schedulers on 8 hosts is easy with kolla-g5k, but running 8 schedulers on the same host becomes a bit trickier. Here is a method:

  • create a /scheduler_1 directory to hold the logs
  • give it the right permissions: chown 162 /scheduler_1
  • start the container :
docker run -ti --net=host \
  -v /scheduler_1:/var/log/kolla:rw \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/kolla/nova-scheduler/:/var/lib/kolla/config_files/:ro \
  -e "KOLLA_SERVICE_NAME=nova-scheduler" \
  -e "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" \
  -e "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
  -e "KOLLA_BASE_DISTRO=centos" \
  -e "KOLLA_INSTALL_TYPE=binary" \
  -e "KOLLA_INSTALL_METATYPE=rdo" \
  -e "PS1=$(tput bold)($(printenv KOLLA_SERVICE_NAME))$(tput sgr0)[$(id -un)@$(hostname -s) $(pwd)]$ " \
  --name nova_scheduler_1 -d kolla/centos-binary-nova-scheduler:2.0.2

Repeat the above for each scheduler (a small bash script will do).
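Such a script could look like the following minimal sketch. It only prints the commands (dry run): pipe the output to sh, or drop the echos, to actually run them. The function name is made up; 162 is the nova uid inside the Kolla images, and some -e flags from the full command above are omitted for brevity:

```shell
# Dry-run generator for N nova-scheduler containers on a single host.
start_schedulers() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    dir="/scheduler_$i"                  # one log directory per scheduler
    echo "mkdir -p $dir"
    echo "chown 162 $dir"                # 162 = nova uid in Kolla images
    echo "docker run -d --net=host --name nova_scheduler_$i" \
      "-v $dir:/var/log/kolla:rw" \
      "-v /etc/localtime:/etc/localtime:ro" \
      "-v /etc/kolla/nova-scheduler/:/var/lib/kolla/config_files/:ro" \
      "-e KOLLA_SERVICE_NAME=nova-scheduler" \
      "-e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" \
      "kolla/centos-binary-nova-scheduler:2.0.2"
    i=$((i + 1))
  done
}

start_schedulers 8
```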

Note: the service hostname will be the same for every scheduler, so the database will hold only one record for them. Is it an issue? Benchmarks have been run successfully using this configuration. Note: on g5k, it is probably a better idea to use /tmp as the parent directory (size limitation on / otherwise).