regarding --socket-mem 1024,1024 #3

pinggit opened this issue May 29, 2020 · 7 comments

pinggit commented May 29, 2020

Isn't --socket-mem used to allocate hugepages for the vrouter only (not for the VMs)?
Or is it actually the same "global" system-wide parameter that applies to both the vrouter and the VMs,
just like the kernel hugepagesz=1G hugepages=40 parameter?


pinggit commented May 29, 2020

Answer from Laurent:

This is also something we have to explain clearly. Hugepage use in DPDK is
really badly explained, and once again it is not as complex as it seems.

You just have to keep in mind that:

  • Packets are put into memory in one place and never copied
    (only descriptors move from one queue to another)

  • Consequently, the "memory" area where the packets are put must be shared between
    the DPDK vrouter and all instances (whatever the instance is, DPDK or not DPDK)

Then you have the DPDK setup:

  • At system level: to allocate a given amount of memory as hugepages
  • At vrouter level: to use an amount of the system hugepages to store packets (in mbufs)
  • The vRouter uses CPUs on a single NUMA node (at least this is the recommended setup
    if you want to avoid performance issues).
  • Virtual instances use CPUs on both NUMA nodes (and most probably on the other
    NUMA node, the one not used by the vrouter, because you have lots of CPUs
    available for your VMs on this second node)

So, in short, you have instances running on both NUMA nodes. They have to be able to
access the packets that are referenced by descriptors (which the vrouter has put into
the vNIC RX queues).

This is why, by default, we spread the hugepage memory allocation across both NUMA nodes.

[image7: diagram showing how huge pages are used]

So, first you allocate hugepages at system level (at boot time, for 1G huge pages):

default_hugepagesz=1GB hugepagesz=1G hugepages=40 hugepagesz=2M hugepages=40

I guess that the huge pages are equally balanced across both NUMA nodes (to be checked).
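
One quick way to check that balance is to read the per-NUMA-node counters exposed in sysfs (standard Linux paths, shown here for 1G pages):

# allocated and free 1G hugepages on each NUMA node
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages

# overall hugepage summary
grep -i huge /proc/meminfo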

Then, at vrouter level, you request a part of them for the vrouter DPDK
application's needs (to store both underlay and VM packets):

--socket-mem <value>,<value>
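
The values are megabytes per NUMA socket, so --socket-mem 1024,1024 reserves 1 GB of hugepage memory on NUMA node 0 and 1 GB on NUMA node 1. As a standalone illustration (testpmd is used here only as a generic DPDK application, not the actual Contrail command line):

# reserve 1 GB of hugepages on each NUMA node for this DPDK process
dpdk-testpmd -l 1-2 -n 4 --socket-mem 1024,1024 -- --nb-cores=1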


pinggit commented May 29, 2020

@ldurandadomia, with this explained, it also implies:
for VMs spawned on a different NUMA node (NUMA1) than the one where the vrouter is running (say NUMA0 in this example), performance will be lower due to the slowness of the QPI link? So for time-sensitive VMs, do we have to spawn them on NUMA0 only?

@ldurandadomia

@pinggit
Yes, this is a matter of cost versus performance.
If you put everything on a single NUMA node, you'll very quickly exhaust all the available cores on that node.
Let's take an example.
With a CPU having 18 cores per NUMA node (36 with siblings):

  • a vrouter configured with 8 lcores (4 physical CPUs)
  • 1 CPU kept for the OS and the vrouter service and control threads
    ==> only 13 physical CPUs remain, i.e. 26 logical ones.

If you start DPDK VMs with at least 8 vCPUs each (at least the same number of CPUs as the vRouter, to get the same number of queues on the VM side), you can only spawn 3 VMs.

This is really poor and not realistic for most customers...
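
As a side check, the core and NUMA layout assumed above can be read directly with lscpu on the compute node:

# NUMA node count and the CPU ranges belonging to each node
lscpu | grep -i numa

# one line per logical CPU: its physical core, socket and NUMA node
lscpu -e=CPU,CORE,SOCKET,NODE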


ldurandadomia commented May 29, 2020

@pinggit
The idea is more:

  • vrouter polling/forwarding threads must be pinned onto the same NUMA node
  • any VM must be fully pinned within a single NUMA node

The idea is to avoid having part of the traffic of a VM (or of the vrouter) processed on one NUMA node and the other part on the second one (it would create internal delays, reordering, ...).

Next, if on a given compute host you have one VNF that requires more performance than the others, it is probably clever to pin it on the same NUMA node as the vrouter.

You also have to pay attention to using physical NICs that are attached to the same NUMA node as the vrouter (this is hard-wired by the PCI slot, so if a NIC is not on the appropriate NUMA node you have to use another NIC or move it to another slot).
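
The NUMA node a NIC is attached to can be read from sysfs (the interface name and PCI address below are placeholders):

# via the network interface name
cat /sys/class/net/eth0/device/numa_node

# or via the PCI address of the NIC
cat /sys/bus/pci/devices/0000:5e:00.0/numa_node

# a value of -1 means the platform did not report NUMA locality for this device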


pinggit commented May 29, 2020

Thanks @ldurandadomia.
One more thing:

If you start DPDK VMs with at least 8 vCPUs each (at least the same number of CPUs as the vRouter, to get the same number of queues on the VM side), you can only spawn 3 VMs.

  1. In practice, I believe VMs share cores among each other, so you can spawn 10 VMs each using the same 4 cores, e.g. 3,4,5,6 - if performance is not a big deal for these VMs?

  2. VM queues are determined by the vCPUs assigned to the VM, not the other way around. So if you assign 2 vCPUs to a VM, the VM will have 2 queues per vNIC; you don't have to assign 8 vCPUs. Correct me if I'm wrong; I can open a new issue on this topic.

@ldurandadomia

1 - In practice, I believe VMs share cores among each other, so you can spawn 10 VMs each using the same 4 cores, e.g. 3,4,5,6 - if performance is not a big deal for these VMs?

Not really...
If you are building a VNF (virtual network function) DPDK application, you have the same concern on the VNF as on the vrouter DPDK application: you want performance and you do not want to share the allocated CPUs!

This is why, when Contrail DPDK is used:

  • connected VMs should run DPDK applications
  • a DPDK application requires fully dedicated CPUs to get the expected performance
    ==> host CPUs allocated to a VM should not be shared (one way to enforce this is shown below)
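
With OpenStack, a common way to enforce dedicated pinning (plus hugepage-backed memory, which vhost-user also needs) is through flavor properties; the flavor name here is only an example:

# give DPDK VNF instances dedicated host CPUs and 1G hugepage-backed memory
openstack flavor set dpdk.large \
  --property hw:cpu_policy=dedicated \
  --property hw:mem_page_size=1GB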

@ldurandadomia

2 - VM queues are determined by the vCPUs assigned to the VM, not the other way around. So if you assign 2 vCPUs to a VM, the VM will have 2 queues per vNIC; you don't have to assign 8 vCPUs. Correct me if I'm wrong; I can open a new issue on this topic.

This is defined by the VM setup (the libvirt XML configuration file). But with OpenStack, an implicit rule is used to configure the vNIC queues: it configures the same number of queues on each vNIC as the number of vCPUs defined on the VM.

There is no way to configure it differently. It is nevertheless possible to reduce this number of queues after VM startup using the ethtool command, for example:
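
Inside the guest (the interface name is a placeholder, and the virtio-net driver must support multiqueue):

# show the current and maximum number of queues on the vNIC
ethtool -l eth0

# reduce the number of combined queues to 2
ethtool -L eth0 combined 2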
