Intel E810

Source: Imre kasutab arvutit

Introduction

For general background, see https://www.auul.pri.ee/wiki/Mellanox_ConnectX-6_Lx_EN

The network card hardware in use:

# lspci | grep 810
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)

Terms

  • vvu - the 'virtio vhost user' protocol

Bringing traffic to the OVS switch with DPDK

Assumptions

  • the goal is to bring as much traffic as possible from the physical network to the OVS switch, using DPDK at the physical network card
  • one OVS internal port is created on the OVS switch (an internal port is also visible to the operating system, so it can be reached remotely, e.g. to use the PVE web GUI and to log in via ssh)
  • forwarding the traffic further to a virtual machine attached to the OVS switch is not covered here
  • the work is done in a PVE 8.2 environment
  • nothing is compiled: everything is installed from the standard Debian and PVE apt repositories

The resulting OVS switch configuration looks like this

# ovs-vsctl show
26f5a7ed-dfe6-49bb-978f-062dd420f331
    Bridge vmbr0
        datapath_type: netdev
        Port vmbr0
            Interface vmbr0
                type: internal
        Port dpdk-p0
            Interface dpdk-p0
                type: dpdk
                options: {dpdk-devargs="0000:81:00.0"}
        Port inter
            Interface inter
                type: internal
    ovs_version: "3.1.0"

and the global OVS configuration

# ovs-vsctl list open_vSwitch
_uuid               : 26f5a7ed-dfe6-49bb-978f-062dd420f331
bridges             : [4b2f6152-da5c-495b-bd7f-c3020d8e4012]
cur_cfg             : 17
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.3.1"
dpdk_initialized    : true
dpdk_version        : "DPDK 22.11.5"
external_ids        : {hostname=valgustaja1, rundir="/var/run/openvswitch", system-id="257085fe-47b8-4b5d-b894-3063fbb645e2"}
iface_types         : [afxdp, afxdp-nonpmd, bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 17
other_config        : {dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="4096", pmd-cpu-mask="0x0ff0", userspace-tso-enable="false", vhost-iommu-support="true"}
ovs_version         : "3.1.0"
ssl                 : []
statistics          : {}
system_type         : debian
system_version      : "12"

and

root@valgustaja1# ifconfig 
inter: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet 10.40.134.18  netmask 255.255.255.248  broadcast 10.40.134.23
        inet6 fe80::b470:b7ff:fe0c:8ea9  prefixlen 64  scopeid 0x20<link>
        ether b6:70:b7:0c:8e:a9  txqueuelen 1000  (Ethernet)
        RX packets 9783611  bytes 14414601922 (13.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4824979  bytes 409852147 (390.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 307863  bytes 56590252 (53.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 307863  bytes 56590252 (53.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

To reach this state, OVS-DPDK is installed on the PVE 8.2 host (Debian 12, as reported by ovs-vsctl above). The OVS-DPDK installation guides are useful background.

Software installation. The DPDK driver library for ice (the Intel E810) is not pulled in as a dependency of the openvswitch-switch-dpdk package, even though a number of other librte-net-xxx packages are. It has to be installed separately:

valgustaja1# apt-get install openvswitch-switch-dpdk 
valgustaja1# apt-get install librte-net-ice23

After installation OVS is already running and can be configured further:

valgustaja1# echo 6144 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:dpdk-socket-mem=4096"
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:pmd-cpu-mask=0x100000100000"
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:vhost-iommu-support=true"
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:userspace-tso-enable=false"
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:dpdk-lcore-mask=0x1"
valgustaja1# ovs-vsctl set Open_vSwitch . "other_config:dpdk-init=true"
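The two CPU masks are bitmaps over CPU numbers: if bit n is set, CPU n is selected. For example, pmd-cpu-mask=0x100000100000 selects CPUs 20 and 44. The following is a minimal bash sketch that decodes such a mask. mask_to_cpus is a hypothetical helper, not part of OVS.

```shell
#!/usr/bin/env bash
# Decode an OVS CPU mask (e.g. pmd-cpu-mask, dpdk-lcore-mask) into the list
# of CPU numbers whose bits are set. Pure bash arithmetic, no OVS needed.
# mask_to_cpus is a hypothetical helper name.
mask_to_cpus() {
    local mask=$(( $1 ))   # accepts hex input such as 0x100000100000
    local cpu=0
    local out=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            out+=("$cpu")
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    echo "${out[*]}"
}

mask_to_cpus 0x100000100000   # pmd-cpu-mask from the commands above
mask_to_cpus 0x1              # dpdk-lcore-mask
```

It can be useful to compare the decoded CPUs with the output of lscpu, for example to check which NUMA node they belong to.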

So that the librte DPDK libraries can drive the card, load vfio-pci and bind both E810 ports to it:

valgustaja1# modprobe vfio-pci
valgustaja1# modprobe vfio
valgustaja1# dpdk-devbind.py --bind=vfio-pci 0000:81:00.0
valgustaja1# dpdk-devbind.py --bind=vfio-pci 0000:81:00.1

The result:

# dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
0000:81:00.0 'Ethernet Controller E810-XXV for SFP 159b' drv=vfio-pci unused=ice
0000:81:00.1 'Ethernet Controller E810-XXV for SFP 159b' drv=vfio-pci unused=ice

..
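You can also check which kernel driver a port is bound to without dpdk-devbind.py, by reading the sysfs driver symlink directly. The sketch below assumes the standard /sys/bus/pci/devices/<address>/driver layout. pci_driver and the SYSFS_ROOT override are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Print the kernel driver bound to a PCI device by reading the sysfs
# "driver" symlink; print "none" if no driver is bound.
# SYSFS_ROOT (hypothetical) defaults to /sys and can point to a test tree.
pci_driver() {
    local addr=$1
    local root=${SYSFS_ROOT:-/sys}
    local link="$root/bus/pci/devices/$addr/driver"
    if [[ -L $link ]]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

for dev in 0000:81:00.0 0000:81:00.1; do
    # on the host above, both ports should report vfio-pci
    echo "$dev $(pci_driver "$dev")"
done
```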

Creating the OVS switch and attaching the physical card port

valgustaja1# systemctl restart openvswitch-switch
valgustaja1# ovs-vsctl add-br vmbr0 -- set bridge vmbr0 datapath_type=netdev
valgustaja1# ovs-vsctl add-port vmbr0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:81:00.0
valgustaja1# ovs-vsctl add-port vmbr0 inter -- set interface inter type=internal
valgustaja1# ifconfig inter 10.40.134.18/29
valgustaja1# route add default gw 10.40.134.17

The expected result is that the PVE host can ping the gateway IP address, i.e. the machine is connected to the network.

Troubleshooting: errors like the following appear if the librte-net-ice23 package is missing, or if OVS has not been restarted after the global OVS settings were changed.

valgustaja1# tail -f /var/log/openvswitch/ovs-vswitchd.log
2024-07-09T01:00:04.674Z|00076|dpdk|ERR|EAL: Driver cannot attach the device (0000:81:00.0)
2024-07-09T01:00:04.674Z|00077|dpdk|ERR|EAL: Failed to attach device on primary process
2024-07-09T01:00:04.674Z|00078|netdev_dpdk|WARN|Error attaching device '0000:81:00.0' to DPDK
2024-07-09T01:00:04.674Z|00079|netdev|WARN|dpdk-p0: could not set configuration (Invalid argument)
2024-07-09T01:00:04.674Z|00080|dpdk|ERR|Invalid port_id=32

Bringing traffic with DPDK through the OVS switch to a QEMU virtual machine - vhost-user protocol

The resulting OVS switch configuration looks like this

# ovs-vsctl show
26f5a7ed-dfe6-49bb-978f-062dd420f331
    Bridge vmbr0
        datapath_type: netdev
        Port vmbr0
            Interface vmbr0
                type: internal
        Port vhost-user-1
            tag: 3564
            Interface vhost-user-1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/run/vhostuserclient/vhost-user-client-1"}
        Port vhost-user-2
            tag: 3564
            Interface vhost-user-2
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/run/vhostuserclient/vhost-user-client-2"}
        Port dpdk-p0
            Interface dpdk-p0
                type: dpdk
                options: {dpdk-devargs="0000:81:00.0", n_rxq="8"}
        Port inter
            Interface inter
                type: internal
    ovs_version: "3.1.0"

where

  • the vhost-user-1 port corresponds to one network interface of the first virtual machine
  • the vhost-user-2 port corresponds to one network interface of the second virtual machine
  • /var/run/vhostuserclient/vhost-user-client-1 is a unix socket that the qemu/kvm process creates when the virtual machine starts (the '-chardev socket,...,server=on' argument in /etc/pve/qemu-server/1102.conf makes it create the socket)

The OVS part is set up as in the previous section. In addition, each VM needs its own port on OVS, e.g.

valgustaja1# mkdir /var/run/vhostuserclient

valgustaja1# ovs-vsctl add-port vmbr0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-1"
valgustaja1# ovs-vsctl set port vhost-user-1 tag=3564

valgustaja1# ovs-vsctl add-port vmbr0 vhost-user-2 -- set Interface vhost-user-2 type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-2"
valgustaja1# ovs-vsctl set port vhost-user-2 tag=3564
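With more VMs, the same two commands repeat for every port. The sketch below only prints the ovs-vsctl commands (a dry run), so they can be reviewed before running them. gen_vhost_ports is a hypothetical helper. It sets the VLAN tag in the same ovs-vsctl transaction by using the "--" command separator.

```shell
#!/usr/bin/env bash
# Print (do not run) the commands that create COUNT dpdkvhostuserclient
# ports on BRIDGE, each with its own socket path and the same VLAN TAG.
# Usage: gen_vhost_ports BRIDGE COUNT TAG [SOCKET_DIR] | sh
# gen_vhost_ports is a hypothetical helper name.
gen_vhost_ports() {
    local bridge=$1 count=$2 tag=$3 dir=${4:-/var/run/vhostuserclient}
    local i
    echo "mkdir -p $dir"
    for (( i = 1; i <= count; i++ )); do
        echo "ovs-vsctl add-port $bridge vhost-user-$i -- set Interface vhost-user-$i type=dpdkvhostuserclient options:vhost-server-path=$dir/vhost-user-client-$i -- set Port vhost-user-$i tag=$tag"
    done
}

gen_vhost_ports vmbr0 2 3564
```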


Use a PVE virtual machine with a matching configuration, e.g. for the first machine

valgustaja1# cat /etc/pve/qemu-server/1102.conf
agent: 1
args: -machine q35+pve0,kernel_irqchip=split \
  -device intel-iommu,intremap=on,caching-mode=on \
  -chardev socket,id=char1,path=/var/run/vhostuserclient/vhost-user-client-1,server=on \
  -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce=on,queues=4 \
  -device virtio-net-pci,mac=12:4A:8D:1E:33:3D,netdev=mynet1,mq=on,vectors=10,rx_queue_size=1024,tx_queue_size=256
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 8
cpu: host
efidisk0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hugepages: 1024
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=9.0.0,ctime=1720472955
name: imre-ubuntu-2404-01
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=eff2a3c6-e3e6-4ccf-889b-33babc49c3fc
sockets: 1
tpmstate0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-1,size=4M,version=v2.0
vcpus: 8
vga: virtio
virtio0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-2,backup=0,iothread=1,size=20G
vmgenid: ec8bc8c2-1279-43d2-bfca-9d7b9081c97f

where

  • hugepages: 1024 means that 1G (1024 MiB) hugepages are used, and the VM consumes as many of them as it has memory
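Two numbers in this setup follow from simple arithmetic.

  • The vectors values (queues=4 with vectors=10 here, and queues=8 with vectors=18 in the kvm command line further down) match the commonly recommended rule vectors = 2 * queues + 2. That is one MSI-X vector per RX queue and per TX queue, plus one each for config and control.
  • With 1G pages, memory: 8192 needs 8 hugepages.

The sketch below shows both calculations. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# virtio_vectors QUEUES: MSI-X vectors for a multiqueue virtio-net device,
# using the commonly recommended 2*queues + 2 (one per RX and per TX queue,
# plus config and control). Hypothetical helper name.
virtio_vectors() {
    echo $(( 2 * $1 + 2 ))
}

# hugepages_needed MEMORY_MB PAGE_MB: pages of PAGE_MB needed to back
# MEMORY_MB of guest RAM, rounded up. Hypothetical helper name.
hugepages_needed() {
    local memory_mb=$1 page_mb=$2
    echo $(( (memory_mb + page_mb - 1) / page_mb ))
}

virtio_vectors 4              # queues=4 in 1102.conf
hugepages_needed 8192 1024    # memory: 8192 with hugepages: 1024
```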

From such a qm.conf, PVE builds the following argument list for the kvm process (this output is from another VM, 1113, which uses vhost-user-client-2 with queues=8)

# ps aux | grep kvm

root@valgustaja1:~/20240715# ps aux | grep kvm | grep ^root
root       73494 72.6  0.0 3422428 75052 ?       Sl   03:33   0:42 /usr/bin/kvm -id 1113 -name imre-ubuntu-2024-12,debug-threads=on -no-shutdown \
  -chardev socket,id=qmp,path=/var/run/qemu-server/1113.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 \
  -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/1113.pid -daemonize -smbios type=1,uuid=3798a902-e278-4d1f-b6f3-0b045fb75b7a \
  -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd \
  -drive if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/vg_data/vm-1113-disk-0,size=540672 -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults \
  -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/1113.vnc,password=on -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 2048 \
  -object memory-backend-file,id=ram-node0,size=2048M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
  -object iothread,id=iothread-virtio0 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device vmgenid,guid=0e5207ad-1339-46f1-8ac6-f4eb20a535f1 \
  -device usb-tablet,id=tablet,bus=ehci.0,port=1 -chardev socket,id=serial0,path=/var/run/qemu-server/1113.serial0,server=on,wait=off -device isa-serial,chardev=serial0 \
  -chardev socket,id=tpmchar,path=/var/run/qemu-server/1113.swtpm -tpmdev emulator,id=tpmdev,chardev=tpmchar -device tpm-tis,tpmdev=tpmdev -device virtio-vga,id=vga,bus=pcie.0,addr=0x1 \
  -chardev socket,path=/var/run/qemu-server/1113.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
  -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0,max-bytes=1024,period=1000,bus=pci.1,addr=0x1d -device virtio-serial,id=spice,bus=pci.0,addr=0x9 \
  -chardev spicevmc,id=vdagent,name=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -spice tls-port=61000,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4aa68f351c31 \
  -drive if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 \
  -drive file=/dev/vg_data/vm-1113-disk-2,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on \
  -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100 -machine type=q35+pve0 \
  -machine q35+pve0,kernel_irqchip=split -device intel-iommu,intremap=on,caching-mode=on -chardev socket,id=char2,path=/var/run/vhostuserclient/vhost-user-client-2,server=on \
  -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce=on,queues=8 -device virtio-net-pci,mac=12:4A:8D:1E:33:3E,netdev=mynet2,mq=on,vectors=18,rx_queue_size=1024,tx_queue_size=256

where

  • TODO
  • the 'hugepages: xxx' directive produces this fragment: '-object memory-backend-file,id=ram-node0,size=2048M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes'

For the second machine

TODO

To give OVS more compute resources (more rx queues on the physical port and more PMD threads), run

valgustaja1# ovs-vsctl set Interface dpdk-p0 "options:n_rxq=8"
valgustaja1# ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x0ff0
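pmd-cpu-mask=0x0ff0 gives OVS 8 PMD threads (CPUs 4-11), which matches n_rxq=8 on dpdk-p0. The sketch below counts the set bits of a mask for this comparison. popcount is a hypothetical helper. The actual queue-to-PMD assignment can be checked on the host with ovs-appctl dpif-netdev/pmd-rxq-show.

```shell
#!/usr/bin/env bash
# Count the set bits of a CPU mask = number of PMD threads OVS will start.
# popcount is a hypothetical helper name.
popcount() {
    local mask=$(( $1 )) n=0
    while (( mask > 0 )); do
        n=$(( n + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
}

pmds=$(popcount 0x0ff0)   # pmd-cpu-mask set above
n_rxq=8                   # options:n_rxq on dpdk-p0 set above
echo "PMD threads: $pmds, dpdk-p0 rx queues: $n_rxq"
```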

Preparing the PVE host with a compact script

root@valgustaja1:~/20240715# cat vf-setup-vvu.sh
#!/bin/bash
# Take the host IP off the kernel interface; it moves to the OVS internal port "inter"
route delete default
ifconfig enp129s0f0np0 0

# Not needed when hugepages are reserved on the kernel command line
# echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# Global OVS-DPDK settings, applied by the OVS restart below
ovs-vsctl set Open_vSwitch . "other_config:dpdk-socket-mem=4096"
ovs-vsctl set Open_vSwitch . "other_config:pmd-cpu-mask=0x100000100000"
ovs-vsctl set Open_vSwitch . "other_config:vhost-iommu-support=true"
ovs-vsctl set Open_vSwitch . "other_config:userspace-tso-enable=false"
ovs-vsctl set Open_vSwitch . "other_config:dpdk-lcore-mask=0x1"
ovs-vsctl set Open_vSwitch . "other_config:dpdk-init=true"

modprobe vfio-pci
modprobe vfio

# Hand the first E810 port over to DPDK, then restart OVS
dpdk-devbind.py --bind=vfio-pci 0000:81:00.0
sleep 2
systemctl restart openvswitch-switch
sleep 2

# Bridge with the DPDK port and an internal port for host access
ovs-vsctl add-br vmbr0 -- set bridge vmbr0 datapath_type=netdev
ovs-vsctl add-port vmbr0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:81:00.0
ovs-vsctl add-port vmbr0 inter -- set interface inter type=internal

ifconfig inter 10.40.134.18/29
route add default gw 10.40.134.17

# vhost-user client ports for the VMs (QEMU creates the sockets as server)
mkdir -p /var/run/vhostuserclient
ovs-vsctl add-port vmbr0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-1"
ovs-vsctl add-port vmbr0 vhost-user-2 -- set Interface vhost-user-2 type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-2"
ovs-vsctl set port vhost-user-1 tag=3564
ovs-vsctl set port vhost-user-2 tag=3564

# More rx queues and PMD threads (CPUs 4-11); overrides the pmd-cpu-mask set earlier
ovs-vsctl set Interface dpdk-p0 "options:n_rxq=8"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x0ff0

# Bind the second E810 port to vfio-pci as well
dpdk-devbind.py --bind=vfio-pci 0000:81:00.1

The PVE host is booted like this, with both 2M and 1G hugepages supported and reserved up front

root@valgustaja1:~/20240715# cat /proc/cmdline 
initrd=\EFI\proxmox\6.8.8-2-pve\initrd.img-6.8.8-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs default_hugepagesz=2M hugepagesz=1G hugepages=24 hugepagesz=2M hugepages=8192

where

  • the order of the parameters matters
  • 'hugepagesz=1G hugepages=24' means that 24 pages of 1G are reserved (i.e. 24 GiB)
  • 'hugepagesz=2M hugepages=8192' means that 8192 pages of 2M are reserved (i.e. 16 GiB)
  • the kernel command line is the preferred place to set up hugepages, because memory is not yet fragmented at boot (unlike configuring hugepages while the system is running, e.g. via sysctl values)
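The order matters because each hugepages=N applies to the hugepagesz= that precedes it. If no hugepagesz= precedes it, it applies to the default size. The sketch below parses a command-line string the same way. hugepage_pools is a hypothetical helper; it only reads the string and does not query the kernel.

```shell
#!/usr/bin/env bash
# Pair each hugepages=N with the preceding hugepagesz= on a kernel command
# line string (e.g. the contents of /proc/cmdline) and print "SIZE COUNT".
# hugepage_pools is a hypothetical helper name.
hugepage_pools() {
    local size="default" arg
    for arg in $1; do
        case $arg in
            hugepagesz=*) size=${arg#hugepagesz=} ;;
            hugepages=*)  echo "$size ${arg#hugepages=}" ;;
        esac
    done
}

hugepage_pools "default_hugepagesz=2M hugepagesz=1G hugepages=24 hugepagesz=2M hugepages=8192"
# prints:
# 1G 24
# 2M 8192
```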

As a result, the PVE host shows the following. /run/hugepages/kvm/2048kB and /run/hugepages/kvm/1048576kB are created automatically the first time a VM with 'hugepages: xxx' is started.

root@valgustaja1:~/20240715# hugeadm --explain
Total System Memory: 192906 MB

Mount Point                  Options
/dev/hugepages               rw,relatime,pagesize=2M
/run/hugepages/kvm/2048kB    rw,relatime,pagesize=2M
/run/hugepages/kvm/1048576kB rw,relatime,pagesize=1024M

Huge page pools:
      Size  Minimum  Current  Maximum  Default
   2097152     8192     8192     8192        *
1073741824       24       24       24         

and /proc/meminfo (its HugePages_* lines describe only the default-size pool, 2M here)

root@valgustaja1:~/20240715# grep -i huge /proc/meminfo 
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    8192
HugePages_Free:     8192
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        41943040 kB
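The Hugetlb line counts memory in all hugepage pools, while the HugePages_* lines describe only the default 2M pool. The value matches the kernel command line: 8192 × 2 MiB + 24 × 1 GiB = 40 GiB = 41943040 kB. Here is a small arithmetic sketch (pool_kb is a hypothetical helper):

```shell
#!/usr/bin/env bash
# pool_kb PAGES PAGE_SIZE_KB: memory held by one hugepage pool, in kB.
# pool_kb is a hypothetical helper name.
pool_kb() {
    echo $(( $1 * $2 ))
}

# 2M pool (8192 pages) + 1G pool (24 pages), sizes as on the kernel command line
total_kb=$(( $(pool_kb 8192 2048) + $(pool_kb 24 1048576) ))
echo "Hugetlb expected: $total_kb kB"   # prints: Hugetlb expected: 41943040 kB
```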

and the number of currently free pages of each hugepage size

root@valgustaja1:~/20240715# cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages 
8192

root@valgustaja1:~/20240715# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages 
20
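20 of the 24 1G pages are free, so 4 GiB is currently held by running VMs. The sketch below prints total, free and used pages for every pool from sysfs. hugepage_usage and the SYSFS_ROOT override are hypothetical names.

```shell
#!/usr/bin/env bash
# Print "SIZE total=N free=N used=N" for every hugepage pool, read from
# /sys/kernel/mm/hugepages/hugepages-*kB/{nr,free}_hugepages.
# SYSFS_ROOT (hypothetical) defaults to /sys and can point to a test tree.
hugepage_usage() {
    local root=${SYSFS_ROOT:-/sys} dir size total free
    for dir in "$root"/kernel/mm/hugepages/hugepages-*kB; do
        [[ -d $dir ]] || continue          # glob did not match anything
        size=${dir##*/hugepages-}          # e.g. 2048kB, 1048576kB
        total=$(<"$dir/nr_hugepages")
        free=$(<"$dir/free_hugepages")
        echo "$size total=$total free=$free used=$(( total - free ))"
    done
}

hugepage_usage
```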

Configuring the Proxmox PVE host network the traditional way, via /etc/network/interfaces. First, the physical devices used by DPDK must be bound to the vfio-pci driver. With driverctl the override is persistent: it is recorded in the /etc/driverctl.d directory and is applied again, e.g. after a reboot.

root@valgustaja1:~# driverctl set-override 0000:81:00.0 vfio-pci
root@valgustaja1:~# driverctl set-override 0000:81:00.1 vfio-pci
root@valgustaja1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_options datapath_type=netdev
        ovs_extra set Open_vSwitch . other_config:dpdk-socket-mem=16384 other_config:pmd-cpu-mask=0xfff0 other_config:vhost-iommu-support=true other_config:userspace-tso-enable=false other_config:dpdk-lcore-mask=0x1 other_config:dpdk-init=true
        pre-up mkdir -p /var/run/vhostuserclient

auto dpdk-p0
iface dpdk-p0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_extra set Interface ${IFACE} type=dpdk options:dpdk-devargs=0000:81:00.0 options:n_rxq=12

auto dpdk-p1
iface dpdk-p1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_extra set Interface ${IFACE} type=dpdk options:dpdk-devargs=0000:81:00.1 options:n_rxq=12

auto inter
iface inter inet static
        address 10.40.134.18/29
        gateway 10.40.134.17
        ovs_type OVSIntPort
        ovs_bridge vmbr0

auto vhost-user-1 
iface vhost-user-1 inet manual
        ovs_type OVSPort
        ovs_options tag=3564
        ovs_extra set Interface ${IFACE} type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-1"
        ovs_bridge vmbr0

auto vhost-user-2
iface vhost-user-2 inet manual
        ovs_type OVSPort
        ovs_options tag=3564
        ovs_extra set Interface ${IFACE} type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-2"
        ovs_bridge vmbr0

auto vhost-user-3
iface vhost-user-3 inet manual
        ovs_type OVSPort
        ovs_options tag=3564
        ovs_extra set Interface ${IFACE} type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-3"
        ovs_bridge vmbr0

auto vhost-user-4
iface vhost-user-4 inet manual
        ovs_type OVSPort
        ovs_options tag=3564
        ovs_extra set Interface ${IFACE} type=dpdkvhostuserclient "options:vhost-server-path=/var/run/vhostuserclient/vhost-user-client-4"
        ovs_bridge vmbr0
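The vhost-user stanzas differ only in the port number. The sketch below generates them for ports 1..N. gen_vhost_stanzas is hypothetical. It writes to stdout only, so review the output before adding it to /etc/network/interfaces.

```shell
#!/usr/bin/env bash
# Print /etc/network/interfaces stanzas for vhost-user-1..vhost-user-COUNT,
# in the same form as above. ${IFACE} is escaped so it stays literal for ifupdown.
# Usage: gen_vhost_stanzas COUNT TAG [BRIDGE]   (hypothetical helper name)
gen_vhost_stanzas() {
    local count=$1 tag=$2 bridge=${3:-vmbr0} dir=/var/run/vhostuserclient i
    for (( i = 1; i <= count; i++ )); do
        cat <<EOF
auto vhost-user-$i
iface vhost-user-$i inet manual
        ovs_type OVSPort
        ovs_options tag=$tag
        ovs_extra set Interface \${IFACE} type=dpdkvhostuserclient "options:vhost-server-path=$dir/vhost-user-client-$i"
        ovs_bridge $bridge

EOF
    done
}

gen_vhost_stanzas 4 3564
```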

Bringing traffic with DPDK through the OVS-DPDK switch to a QEMU virtual machine - VF representor

NOTE: as of summer 2024, this combination apparently cannot be used with the E810 card, because its requirements conflict:

  • to use VFs, the kernel needs a driver more specific to the network card than vfio-pci (for the E810, the ice kernel module)
  • to use DPDK with the E810, the card must be bound to the vfio-pci driver in the kernel
  • with a Mellanox card, for example, this conflict does not arise

Installing and enabling OVS

valgustaja1# apt-get install openvswitch-switch-dpdk
valgustaja1:~# update-alternatives --get-selections
valgustaja1:~# update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
valgustaja1:~# update-alternatives --get-selections

Enabling hugepages

valgustaja1:~# echo 6144 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Global OVS settings

# ovs-vsctl set Open_vSwitch . "other_config:dpdk-init=true"
# ovs-vsctl set Open_vSwitch . "other_config:dpdk-lcore-mask=0x1"
# ovs-vsctl set Open_vSwitch . "other_config:dpdk-socket-mem=1024"
# ovs-vsctl set Open_vSwitch . "other_config:pmd-cpu-mask=0x100000100000"

Useful additional material

Bringing traffic through a regular (non-DPDK) OVS switch to a QEMU virtual machine - VF representor

First, verify that the E810 physical network card is present and that its eswitch is in legacy mode

root@valgustaja1:~# lspci | grep -i ether
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)

root@valgustaja1:~# devlink dev eswitch show pci/0000:81:00.0
pci/0000:81:00.0: mode legacy
root@valgustaja1:~# devlink dev eswitch show pci/0000:81:00.1
pci/0000:81:00.1: mode legacy

Firmware

root@valgustaja1:~# devlink dev info
pci/0000:81:00.0:
  driver ice
  serial_number 00-01-00-ff-ff-00-00-00
  versions:
      fixed:
        board.id K58132-000
      running:
        fw.mgmt 7.2.4
        fw.mgmt.api 1.7.10
        fw.mgmt.build 0xe49dbde8
        fw.undi 1.3346.0
        fw.psid.api 4.20
        fw.bundle_id 0x800177ba
        fw.app.name ICE OS Default Package
        fw.app 1.3.36.0
        fw.app.bundle_id 0xc0000001
        fw.netlist 2.40.5000-2.f.0
        fw.netlist.build 0x85001ebf
      stored:
        fw.undi 1.3346.0
        fw.psid.api 4.20
        fw.bundle_id 0x800177ba
        fw.netlist 2.40.5000-2.f.0
        fw.netlist.build 0x85001ebf

Driver

root@valgustaja1:~# modinfo ice
filename:       /lib/modules/6.8.4-3-pve/kernel/drivers/net/ethernet/intel/ice/ice.ko
firmware:       intel/ice/ddp/ice.pkg
license:        GPL v2
description:    Intel(R) Ethernet Connection E800 Series Linux Driver
author:         Intel Corporation, <linux.nics@intel.com>
srcversion:     6D61FBEFCC01E8EE4076596
..

root@valgustaja1:~# ethtool -i enp129s0f0np0
driver: ice
version: 6.8.4-3-pve
firmware-version: 4.20 0x800177ba 1.3346.0
expansion-rom-version: 
bus-info: 0000:81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

and the OVS switch has no bridges configured yet

# ovs-vsctl show
3b64530c-2083-4b56-8d1d-71432d70088d
ovs_version: "3.1.0"

Create the OVS bridge

ovs-vsctl add-br vmbr0

Switch the eswitch of the first physical port of the E810 card from legacy to switchdev mode

devlink dev eswitch set pci/0000:81:00.0 mode switchdev

Create the virtual functions. Several new devices appear:

  • two VF devices
  • the representor ports eth0 and eth1
  • the VF network interfaces themselves, enp129s0f0v0 and enp129s0f0v1

(The copy-pasted MAC addresses are fake :)

echo 2 > /sys/class/net/enp129s0f0np0/device/sriov_numvfs

root@valgustaja1:~# lspci | grep -i ether
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
81:01.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)
81:01.1 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)

root@valgustaja1:~# ip link show
..

2: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:e0:ef:24 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 2e:38:4d:3d:61:ed brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
    vf 1     link/ether 2e:38:4d:3d:61:ee brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off

3: enp129s0f1np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:e0:ef:25 brd ff:ff:ff:ff:ff:ff

7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 2e:93:9e:ba:48:db brd ff:ff:ff:ff:ff:ff
    altname enp129s0f0npf0vf0

8: enp129s0f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:38:4d:3d:61:ed brd ff:ff:ff:ff:ff:ff

9: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether b6:4b:cc:31:f8:a5 brd ff:ff:ff:ff:ff:ff
    altname enp129s0f0npf0vf1

10: enp129s0f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:38:4d:3d:61:ee brd ff:ff:ff:ff:ff:ff
...
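Writing sriov_numvfs directly fails in two cases: if the value exceeds what the PF supports, or if a different non-zero number of VFs already exists. The sketch below is a safer wrapper that checks sriov_totalvfs and resets the count to 0 first. set_numvfs and the SYSFS_ROOT override are hypothetical. Resetting destroys any existing VFs, so do not run it while VMs are using them.

```shell
#!/usr/bin/env bash
# set_numvfs IFNAME COUNT: create COUNT VFs on the PF behind IFNAME after
# checking sriov_totalvfs. Resets to 0 first, because the kernel refuses
# to change one non-zero VF count to another directly.
# SYSFS_ROOT (hypothetical) defaults to /sys and can point to a test tree.
set_numvfs() {
    local ifname=$1 want=$2 root=${SYSFS_ROOT:-/sys}
    local dev="$root/class/net/$ifname/device"
    local total
    total=$(<"$dev/sriov_totalvfs") || return 1
    if (( want > total )); then
        echo "requested $want VFs, $ifname supports at most $total" >&2
        return 1
    fi
    echo 0 > "$dev/sriov_numvfs"
    echo "$want" > "$dev/sriov_numvfs"
}

# On the host above: set_numvfs enp129s0f0np0 2
```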

Drivers bound to the network devices

root@valgustaja1:~# driverctl list-devices network
0000:81:00.0 ice
0000:81:00.1 ice
0000:81:01.0 iavf
0000:81:01.1 iavf

where

  • the physical devices (PF, physical functions) use the ice driver
  • the VF devices use iavf, a dedicated Intel VF driver. When the virtual machine starts, iavf is automatically replaced with vfio-pci, and the device named 'enp129s0f0v0' then disappears from the output of ifconfig, ip link show and similar commands.

Assign MAC addresses to the VFs

root@valgustaja1:~# ip link set enp129s0f0np0 vf 0 mac 2e:38:4d:3d:61:ed
root@valgustaja1:~# ip link set enp129s0f0np0 vf 1 mac 2e:38:4d:3d:61:ee

root@valgustaja1:~# ip link show dev enp129s0f0np0
2: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:e0:ef:24 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 2e:38:4d:3d:61:ed brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off
    vf 1     link/ether 2e:38:4d:3d:61:ee brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state enable, trust off

Enable hardware offload on the representors and in OVS

root@valgustaja1:~# ethtool -K eth0 hw-tc-offload on
root@valgustaja1:~# ethtool -K eth1 hw-tc-offload on

root@valgustaja1:~# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
root@valgustaja1:~# ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_sw
root@valgustaja1:~# systemctl restart openvswitch-switch
.. No such timeout policy "ovs_test_tp"
...

Add the representor ports to the OVS switch and bring them up

root@valgustaja1:~# ovs-vsctl add-port vmbr0 eth0
root@valgustaja1:~# ovs-vsctl add-port vmbr0 eth1
root@valgustaja1:~# ip link set eth0 up
root@valgustaja1:~# ip link set eth1 up

The resulting global OVS configuration

root@valgustaja1:~/20240713# ovs-vsctl list open_vSwitch
_uuid               : 3b64530c-2083-4b56-8d1d-71432d70088d
bridges             : [35eebef7-6be8-444a-bf5f-ae2089c3add4]
cur_cfg             : 13
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.3.1"
dpdk_initialized    : false
dpdk_version        : "DPDK 22.11.5"
external_ids        : {hostname=valgustaja1.moraal.ee, rundir="/var/run/openvswitch", system-id="257085fe-47b8-4b5d-b894-3063fbb645e2"}
iface_types         : [afxdp, afxdp-nonpmd, bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 13
other_config        : {hw-offload="true", tc-policy=skip_sw}
ovs_version         : "3.1.0"
ssl                 : []
statistics          : {}
system_type         : debian
system_version      : "12"

and the OVS switch configuration

root@valgustaja1:~/20240713# ovs-vsctl show
3b64530c-2083-4b56-8d1d-71432d70088d
    Bridge vmbr0
        Port eth0
            Interface eth0
        Port vmbr0
            Interface vmbr0
                type: internal
        Port eth1
            Interface eth1
    ovs_version: "3.1.0"

Adding the physical card port to OVS

root@valgustaja1:~# ovs-vsctl add-port vmbr0 enp129s0f0np0
root@valgustaja1:~# ip link set enp129s0f0np0 up
root@valgustaja1:~# ethtool -K enp129s0f0np0 hw-tc-offload on

Adding an internal interface for the host (so that the PVE host can be reached remotely)

valgustaja1# ovs-vsctl add-port vmbr0 inter -- set interface inter type=internal
valgustaja1# ifconfig inter 10.40.134.18/29
valgustaja1# route add default gw 10.40.134.17

The virtual machines are reached through the port representors, so the VLAN tag is set on them

valgustaja1# ovs-vsctl set port eth0 tag=3564
valgustaja1# ovs-vsctl set port eth1 tag=3564

Result

root@valgustaja1:~/20240713# ovs-vsctl show
3b64530c-2083-4b56-8d1d-71432d70088d
    Bridge vmbr0
        Port eth0
            tag: 3564
            Interface eth0
        Port inter
            Interface inter
                type: internal
        Port enp129s0f0np0
            Interface enp129s0f0np0
        Port vmbr0
            Interface vmbr0
                type: internal
        Port eth1
            tag: 3564
            Interface eth1
    ovs_version: "3.1.0"

For the virtual machine, a PVE configuration file like this can be used (the VF is passed through with the hostpci0 line)

root@valgustaja1:~# cat /etc/pve/qemu-server/1102.conf 
agent: 1
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 8
cpu: host
efidisk0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=9.0.0,ctime=1720472955
name: imre-ubuntu-2404-01
ostype: l26
rng0: source=/dev/urandom
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=eff2a3c6-e3e6-4ccf-889b-33babc49c3fc
sockets: 1
tpmstate0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-1,size=4M,version=v2.0
vga: virtio
virtio0: local-to-valgustaja-1-lvm-over-mdadm:vm-1102-disk-2,backup=0,iothread=1,size=20G
vmgenid: ec8bc8c2-1279-43d2-bfca-9d7b9081c97f
hostpci0: 0000:81:01.0,pcie=1,rombar=0

The second virtual machine is configured the same way, except that its hostpci0 line is

hostpci0: 0000:81:01.1,pcie=1,rombar=0

Inside the virtual machine the network looks like this

root@imre-ubuntu-2024-dpdk-01:~# ifconfig enp1s0
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.40.135.66  netmask 255.255.255.240  broadcast 10.40.135.79
        inet6 fe80::2c38:4dff:fe3d:61ed  prefixlen 64  scopeid 0x20<link>
        ether 2e:38:4d:3d:61:ed  txqueuelen 1000  (Ethernet)
        RX packets 753563383  bytes 1076089573005 (1.0 TB)
        RX errors 0  dropped 85139  overruns 0  frame 0
        TX packets 50662841  bytes 3084593575 (3.0 GB)
        TX errors 138  dropped 0 overruns 0  carrier 0  collisions 0

where

  • the MAC address matches the one assigned to VF 0 above

Drivers bound to the network devices on the PVE host

root@valgustaja1:~# driverctl list-devices network
0000:81:00.0 ice
0000:81:00.1 ice
0000:81:01.0 vfio-pci
0000:81:01.1 vfio-pci

and inside the virtual machine

root@imre-ubuntu-2024-dpdk-01:~# lspci | grep -i ethern
01:00.0 Ethernet controller: Intel Corporation Ethernet Adaptive Virtual Function (rev 02)

root@imre-ubuntu-2024-dpdk-01:~# driverctl list-devices network
0000:01:00.0 iavf

root@imre-ubuntu-2024-dpdk-01:~# ethtool -i enp1s0
driver: iavf
version: 6.8.0-38-generic
firmware-version: N/A
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

root@imre-ubuntu-2024-dpdk-01:~# ethtool enp1s0
Settings for enp1s0:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 25000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: None
        PHYAD: 0
        Transceiver: internal
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

When the virtual machine is shut down, the VF on the PVE host stays bound to vfio-pci. This can be changed, for example, as follows; after the command, a device named 'enp129s0f0v0' reappears in the ifconfig interface list

root@valgustaja1:~# dpdk-devbind.py --bind=iavf 0000:81:01.0
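To hand every VF that is still held by vfio-pci back to the host in one go, the `driverctl list-devices network` output shown above can be turned into rebind commands. This is a minimal sketch, assuming that every vfio-pci device in that list is an E810 VF that should return to iavf; the function name `vf_rebind_cmds` is hypothetical. It only prints the `dpdk-devbind.py` commands, so they can be reviewed before being run.

```shell
# vf_rebind_cmds: read `driverctl list-devices network` output on stdin and
# print a dpdk-devbind.py command for every device currently bound to vfio-pci.
# Assumption: all such devices are VFs that should go back to the iavf driver.
vf_rebind_cmds() {
    awk '$2 == "vfio-pci" { printf "dpdk-devbind.py --bind=iavf %s\n", $1 }'
}

# Usage on the PVE host (review first, then pipe to sh):
#   driverctl list-devices network | vf_rebind_cmds
#   driverctl list-devices network | vf_rebind_cmds | sh
```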

Performance - traffic between two virtual machines: the rate drops from about 25 Gbit/s and settles around 15 Gbit/s, but possibly this is the TCP sender (iperf3) itself backing off in response to the retransmissions shown in the 'Retr' column

root@imre-ubuntu-2024-dpdk-02:~# iperf3 -c 10.40.135.66 -t 300
Connecting to host 10.40.135.66, port 5201
[  5] local 10.40.135.67 port 59480 connected to 10.40.135.66 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.93 GBytes  25.1 Gbits/sec  418    765 KBytes       
[  5]   1.00-2.00   sec  2.92 GBytes  25.1 Gbits/sec  413    690 KBytes       
[  5]   2.00-3.00   sec  2.93 GBytes  25.1 Gbits/sec  441    865 KBytes       
[  5]   3.00-4.00   sec  2.92 GBytes  25.1 Gbits/sec  511    710 KBytes       
[  5]   4.00-5.00   sec  2.34 GBytes  20.1 Gbits/sec    0    932 KBytes       
[  5]   5.00-6.00   sec  2.28 GBytes  19.6 Gbits/sec    0    932 KBytes       
[  5]   6.00-7.00   sec  2.09 GBytes  17.9 Gbits/sec    0    932 KBytes       
[  5]   7.00-8.00   sec  2.01 GBytes  17.3 Gbits/sec    0    932 KBytes       
[  5]   8.00-9.00   sec  1.63 GBytes  14.0 Gbits/sec   64    704 KBytes       
[  5]   9.00-10.00  sec  1.79 GBytes  15.4 Gbits/sec    0    704 KBytes       
[  5]  10.00-11.00  sec  1.76 GBytes  15.1 Gbits/sec    1    495 KBytes       
[  5]  11.00-12.00  sec  1.78 GBytes  15.3 Gbits/sec   12    386 KBytes       
[  5]  12.00-13.00  sec  1.79 GBytes  15.4 Gbits/sec    6    437 KBytes       
[  5]  13.00-14.00  sec  1.78 GBytes  15.3 Gbits/sec    0    481 KBytes       
[  5]  14.00-15.00  sec  1.79 GBytes  15.4 Gbits/sec    0    482 KBytes       
[  5]  15.00-16.00  sec  1.79 GBytes  15.4 Gbits/sec    0    492 KBytes       
[  5]  16.00-17.00  sec  1.79 GBytes  15.3 Gbits/sec    3    389 KBytes       
[  5]  17.00-18.00  sec  1.79 GBytes  15.3 Gbits/sec    1    474 KBytes       
[  5]  18.00-19.00  sec  1.79 GBytes  15.4 Gbits/sec    0    479 KBytes       
[  5]  19.00-20.00  sec  1.79 GBytes  15.3 Gbits/sec    0    479 KBytes     
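When comparing runs like the one above, it helps to reduce the per-second lines to a mean bitrate and a total retransmission count. A small awk sketch, assuming the plain-text iperf3 client output format shown here (bitrate followed by the 'Retr' value); `iperf3_summary` is a hypothetical helper name. Lines reported in Mbits/sec are converted to Gbit/s, and the end-of-test sender/receiver lines are skipped.

```shell
# iperf3_summary: read iperf3 client output on stdin and print the number of
# interval lines, their mean bitrate in Gbit/s and the summed 'Retr' column.
iperf3_summary() {
    awk '
        /sender|receiver/ { next }          # skip end-of-test summary lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "Gbits/sec" || $i == "Mbits/sec") {
                    rate = ($i == "Gbits/sec") ? $(i - 1) : $(i - 1) / 1000
                    sum += rate
                    retr += $(i + 1)        # empty (0) when there is no Retr column
                    n++
                    break
                }
            }
        }
        END { if (n) printf "intervals=%d mean=%.1f Gbit/s retr=%d\n", n, sum / n, retr }
    '
}

# Usage: iperf3 -c 10.40.135.66 -t 300 | tee run.txt; iperf3_summary < run.txt
```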

Performance - SYN flood with hping3: roughly 690 kpps arrives, which is a moderate or even modest result

root@imre-ubuntu-2024-dpdk-02:~# cat run-hping3-to-66.sh 
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &

root@imre-ubuntu-2024-dpdk-02:~# vnstat -l -i enp1s0
Monitoring enp1s0...    (press CTRL-C to stop)

-      rx:  329.30 Mbit/s 686026 p/s       tx:  329.30 Mbit/s 686025 p/s^C


root@imre-ubuntu-2024-dpdk-01:~# vnstat -l -i enp1s0
Monitoring enp1s0...    (press CTRL-C to stop)

/      rx:  331.81 Mbit/s 691272 p/s       tx:  331.81 Mbit/s 691266 p/s^C
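The vnstat readings also give the average frame size: dividing the bit rate by the packet rate yields about 60 bytes per packet, which is consistent with minimum-size Ethernet frames carrying bare TCP SYN/RST segments (14 B Ethernet + 20 B IPv4 + 20 B TCP = 54 B, padded to the 60-byte minimum without FCS). A sketch of the arithmetic; `avg_frame_bytes` is a hypothetical helper, and it assumes vnstat's decimal units (1 Mbit = 10^6 bits).

```shell
# avg_frame_bytes RATE UNIT PPS: average bytes per packet from a vnstat-style
# reading such as "329.30 Mbit/s 686026 p/s".
avg_frame_bytes() {
    awk -v rate="$1" -v unit="$2" -v pps="$3" 'BEGIN {
        mult = (unit == "Gbit/s") ? 1e9 : (unit == "Mbit/s") ? 1e6 : (unit == "kbit/s") ? 1e3 : 1
        printf "%.1f\n", rate * mult / 8 / pps
    }'
}

# Usage: avg_frame_bytes 329.30 Mbit/s 686026
```

Applied to the later DPDK iperf reading (24.55 Gbit/s at 2027074 p/s) the same arithmetic gives about 1514 bytes, i.e. full-size 1500-MTU frames.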

Performance - from a remote host into the virtual machine over a link capable of (at least) about 20 Gbit/s: modest

root@gen-1:~# timeout 120 iperf3 -c 10.40.135.66 -t 60
Connecting to host 10.40.135.66, port 5201
[  5] local 10.40.13.246 port 38952 connected to 10.40.135.66 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   864 MBytes  7.25 Gbits/sec   94   2.45 MBytes       
[  5]   1.00-2.00   sec   846 MBytes  7.10 Gbits/sec    0   2.70 MBytes       
[  5]   2.00-3.00   sec   931 MBytes  7.81 Gbits/sec    0   2.94 MBytes       
[  5]   3.00-4.00   sec   892 MBytes  7.49 Gbits/sec    0   3.15 MBytes       
[  5]   4.00-5.00   sec   914 MBytes  7.67 Gbits/sec    0   3.36 MBytes       
[  5]   5.00-6.00   sec   946 MBytes  7.94 Gbits/sec    0   3.55 MBytes       
[  5]   6.00-7.00   sec   912 MBytes  7.66 Gbits/sec    0   3.64 MBytes       
[  5]   7.00-8.00   sec   942 MBytes  7.91 Gbits/sec  371   2.75 MBytes       
...

From a remote host to the PVE host itself: somewhat uneven, but mostly good

root@gen-1:~# timeout 120 iperf3 -c 10.40.134.18 -t 60
Connecting to host 10.40.134.18, port 5201
[  5] local 10.40.13.246 port 40108 connected to 10.40.134.18 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.74 GBytes  23.5 Gbits/sec    0   2.05 MBytes       
[  5]   1.00-2.00   sec  2.70 GBytes  23.2 Gbits/sec    0   2.39 MBytes       
[  5]   2.00-3.00   sec  2.72 GBytes  23.4 Gbits/sec    0   2.59 MBytes       
[  5]   3.00-4.00   sec  2.71 GBytes  23.3 Gbits/sec    0   2.59 MBytes       
[  5]   4.00-5.00   sec  2.58 GBytes  22.1 Gbits/sec   90   1.81 MBytes       
[  5]   5.00-6.00   sec  2.62 GBytes  22.5 Gbits/sec    0   2.02 MBytes       
[  5]   6.00-7.00   sec  2.53 GBytes  21.7 Gbits/sec    0   2.20 MBytes       
[  5]   7.00-8.00   sec  2.74 GBytes  23.5 Gbits/sec    0   2.20 MBytes       
[  5]   8.00-9.00   sec  2.65 GBytes  22.8 Gbits/sec    0   2.51 MBytes       
[  5]   9.00-10.00  sec  2.58 GBytes  22.1 Gbits/sec    0   2.51 MBytes       
[  5]  10.00-11.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.51 MBytes       
[  5]  11.00-12.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.51 MBytes       
[  5]  12.00-13.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.59 MBytes       
[  5]  13.00-14.00  sec  2.51 GBytes  21.6 Gbits/sec    0   2.59 MBytes       
[  5]  14.00-15.00  sec  1.91 GBytes  16.4 Gbits/sec    3   1.90 MBytes       
[  5]  15.00-16.00  sec  2.72 GBytes  23.4 Gbits/sec    0   2.09 MBytes       
[  5]  16.00-17.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.46 MBytes       
[  5]  17.00-18.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.46 MBytes       
[  5]  18.00-19.00  sec  2.73 GBytes  23.4 Gbits/sec    0   2.46 MBytes       
[  5]  19.00-20.00  sec  2.62 GBytes  22.5 Gbits/sec    0   2.46 MBytes       
[  5]  20.00-21.00  sec  2.66 GBytes  22.8 Gbits/sec    0   2.60 MBytes       
[  5]  21.00-22.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  22.00-23.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  23.00-24.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  24.00-25.00  sec  1.91 GBytes  16.4 Gbits/sec    0   2.60 MBytes       
[  5]  25.00-26.00  sec  1.72 GBytes  14.8 Gbits/sec    0   2.60 MBytes       
[  5]  26.00-27.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  27.00-28.00  sec  2.73 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  28.00-29.00  sec  2.74 GBytes  23.5 Gbits/sec    0   2.60 MBytes       
[  5]  29.00-30.00  sec  2.69 GBytes  23.1 Gbits/sec    0   2.60 MBytes       
[  5]  30.00-31.00  sec  2.72 GBytes  23.4 Gbits/sec    0   2.60 MBytes       
[  5]  31.00-32.00  sec  1.65 GBytes  14.2 Gbits/sec    3   1.82 MBytes       
[  5]  32.00-33.00  sec  1.65 GBytes  14.1 Gbits/sec   11   1.64 MBytes       
[  5]  33.00-34.00  sec  2.50 GBytes  21.5 Gbits/sec    0   1.87 MBytes       
[  5]  34.00-35.00  sec  2.74 GBytes  23.5 Gbits/sec    0   1.93 MBytes       
[  5]  35.00-36.00  sec  2.73 GBytes  23.4 Gbits/sec    0   1.95 MBytes    
..  

Offload: the topic needs further investigation, but something does seem to be happening

root@valgustaja1:~# ovs-appctl dpctl/dump-flows -m | grep offloaded
ufid:047fcb94-4f5a-4fd1-9daa-7bb656e0def7, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp129s0f0np0),packet_type(ns=0/0,id=0/0),eth(src=c0:8b:2a:86:40:b0,dst=01:80:c2:00:00:0e),eth_type(0x88cc), packets:0, bytes:0, used:never, offloaded:yes, dp:tc, actions:drop
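To see at a glance which flows are offloaded and whether they carry any traffic, the `dpctl/dump-flows -m` lines can be reduced to their counters. A sketch assuming the comma-and-space separated layout of the line above; `offloaded_flows` is a hypothetical helper. The OVS documentation also describes a `type=offloaded` filter for `dpctl/dump-flows`, which may make the grep unnecessary on this OVS version.

```shell
# offloaded_flows: read `ovs-appctl dpctl/dump-flows -m` output on stdin and
# print packets, bytes, datapath and actions of every flow marked offloaded:yes.
# Fields are split on ", "; commas inside the match part have no space after them.
offloaded_flows() {
    awk -F', ' '/offloaded:yes/ {
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(dp|packets|bytes|actions):/)
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}

# Usage: ovs-appctl dpctl/dump-flows -m | offloaded_flows
```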

Performance - traffic between the PVE host and a PVE guest

root@valgustaja1:~/20240713# iperf3 -c 10.40.135.67 -t 60
Connecting to host 10.40.135.67, port 5201
[  5] local 10.40.135.69 port 41318 connected to 10.40.135.67 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1007 MBytes  8.45 Gbits/sec    0   1.58 MBytes       
[  5]   1.00-2.00   sec  1004 MBytes  8.42 Gbits/sec    0   1.58 MBytes       
[  5]   2.00-3.00   sec  1.03 GBytes  8.84 Gbits/sec    0   1.66 MBytes       
[  5]   3.00-4.00   sec  1.05 GBytes  8.99 Gbits/sec    0   1.66 MBytes       
[  5]   4.00-5.00   sec  1.04 GBytes  8.92 Gbits/sec    0   1.66 MBytes       
[  5]   5.00-6.00   sec  1.01 GBytes  8.67 Gbits/sec    0   1.74 MBytes       
[  5]   6.00-7.00   sec   869 MBytes  7.29 Gbits/sec    0   1.83 MBytes
...

where

  • the traffic seems to be capped at about 8 Gbit/s for some operating-system or other infrastructure reason: one CPU is saturated with 'si' (softirq) load
  • when accessing the virtual machine over the network, the limit is likewise about 7-8 Gbit/s
  • perhaps this could be improved by changing the MTU, but in principle it should reach around 20 Gbit/s even with a 1500-byte MTU
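The MTU remark can be made concrete: at a fixed bit rate, the number of packets per second (and with it much of the per-packet softirq work) falls roughly in proportion to the frame size. A small sketch, assuming 38 bytes of per-frame overhead on the wire (14 B Ethernet header, 4 B FCS, 8 B preamble, 12 B inter-frame gap); `pps_for_rate` is a hypothetical helper.

```shell
# pps_for_rate GBIT MTU: frames per second needed to carry GBIT Gbit/s when every
# frame holds MTU bytes of IP packet plus 38 bytes of Ethernet overhead on the wire.
pps_for_rate() {
    awk -v gbit="$1" -v mtu="$2" 'BEGIN { printf "%d\n", int(gbit * 1e9 / ((mtu + 38) * 8)) }'
}

# Usage:
#   pps_for_rate 8 1500    # about 650 kpps
#   pps_for_rate 20 1500   # about 1.6 Mpps
#   pps_for_rate 8 9000    # about 110 kpps
```

So the observed 8 Gbit/s at MTU 1500 already means about 650 kpps on one CPU, 20 Gbit/s would need about 1.6 Mpps, while with MTU 9000 the same 8 Gbit/s needs only about 110 kpps.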

The setup steps in more compact form, as a script (the 'ifconfig enp129s0f1np1 0' command temporarily removes the network configuration applied from /etc/network/interfaces)

root@valgustaja1:~# cat  20240714/vf-setup.sh 
ovs-vsctl add-br vmbr0
devlink dev eswitch set pci/0000:81:00.0 mode switchdev
echo 2 > /sys/class/net/enp129s0f0np0/device/sriov_numvfs
sleep 2
ip link set enp129s0f0np0 vf 0 mac 2e:38:4d:3d:61:ed
ip link set enp129s0f0np0 vf 1 mac 2e:38:4d:3d:61:ee

ethtool -K eth0 hw-tc-offload on
ethtool -K eth1 hw-tc-offload on

ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_sw
systemctl restart openvswitch-switch

ovs-vsctl add-port vmbr0 eth0
ovs-vsctl add-port vmbr0 eth1
ip link set eth0 up
ip link set eth1 up

ovs-vsctl add-port vmbr0 enp129s0f0np0
ip link set enp129s0f0np0 up
ethtool -K enp129s0f0np0 hw-tc-offload on

ovs-vsctl add-port vmbr0 inter -- set interface inter type=internal

ifconfig enp129s0f1np1 0

ifconfig inter 10.40.134.18/29
route add default gw 10.40.134.17

ovs-vsctl set port eth0 tag=3564
ovs-vsctl set port eth1 tag=3564
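After the script has run, it is worth checking that `hw-tc-offload` actually ended up enabled on the PF and on the representors, and that OVS has `hw-offload` set. A sketch that extracts one feature's state from `ethtool -k` output; `feature_state` is a hypothetical helper, and the interface names are the ones used in the script above.

```shell
# feature_state FEATURE: read `ethtool -k IFACE` output on stdin and print the
# state of FEATURE ("on" or "off"), ignoring suffixes such as "[fixed]".
# Returns non-zero when the feature is not listed at all.
feature_state() {
    awk -v f="$1:" '$1 == f { print $2; found = 1 } END { exit !found }'
}

# Usage on the PVE host:
#   for dev in enp129s0f0np0 eth0 eth1; do
#       printf '%s hw-tc-offload=%s\n' "$dev" "$(ethtool -k "$dev" | feature_state hw-tc-offload)"
#   done
#   ovs-vsctl get Open_vSwitch . other_config:hw-offload
```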

Useful additional material

Delivering traffic to a qemu virtual machine through the regular OVS switch - VF representor - PROBLEMS

In summer 2024 the latest Proxmox PVE is v. 8.2, which ships kernel v. 6.8.8; with this kernel the VF setup generally triggers a kernel bug

uname -a
TODO

The error occurs when the virtual machine starts, more precisely when the guest begins to configure an IP address on its network interface

valgustaja# tail -f /var/log/syslog
..
2024-07-14T01:23:00.961287+03:00 valgustaja1 qm[3685]: start VM 1103: UPID:valgustaja1:00000E65:00004DE6:6692FE44:qmstart:1103:root@pam:
2024-07-14T01:23:00.961533+03:00 valgustaja1 qm[3684]: <root@pam> starting task UPID:valgustaja1:00000E65:00004DE6:6692FE44:qmstart:1103:root@pam:
2024-07-14T01:23:01.023656+03:00 valgustaja1 kernel: [  199.494355] VFIO - User Level meta-driver version: 0.3
2024-07-14T01:23:01.054647+03:00 valgustaja1 kernel: [  199.525437] iavf 0000:81:01.1: Removing device
2024-07-14T01:23:02.393086+03:00 valgustaja1 systemd[1]: Created slice qemu.slice - Slice /qemu.
2024-07-14T01:23:02.398644+03:00 valgustaja1 systemd[1]: Started 1103.scope.
2024-07-14T01:23:02.644841+03:00 valgustaja1 kernel: [  201.115657] vfio-pci 0000:81:01.1: enabling device (0000 -> 0002)
2024-07-14T01:23:02.942508+03:00 valgustaja1 qm[3684]: <root@pam> end task UPID:valgustaja1:00000E65:00004DE6:6692FE44:qmstart:1103:root@pam: OK
2024-07-14T01:23:19.392912+03:00 valgustaja1 kernel: [  216.848253] BUG: kernel NULL pointer dereference, address: 000000000000003f
2024-07-14T01:23:19.392924+03:00 valgustaja1 kernel: [  216.848546] #PF: supervisor read access in kernel mode
2024-07-14T01:23:19.392925+03:00 valgustaja1 kernel: [  216.848775] #PF: error_code(0x0000) - not-present page
2024-07-14T01:23:19.392925+03:00 valgustaja1 kernel: [  216.848997] PGD 0 
2024-07-14T01:23:19.392926+03:00 valgustaja1 kernel: [  216.849215] Oops: 0000 [#1] PREEMPT SMP NOPTI
2024-07-14T01:23:19.392927+03:00 valgustaja1 kernel: [  216.849432] CPU: 31 PID: 3369 Comm: kworker/31:3 Tainted: P           O       6.8.8-2-pve #1
2024-07-14T01:23:19.392927+03:00 valgustaja1 kernel: [  216.849651] Hardware name: Supermicro AS -1115CS-TNR/H13SSW, BIOS 1.6 10/05/2023
2024-07-14T01:23:19.392928+03:00 valgustaja1 kernel: [  216.849871] Workqueue: ice ice_service_task [ice]
2024-07-14T01:23:19.392928+03:00 valgustaja1 kernel: [  216.850111] RIP: 0010:ice_lag_move_new_vf_nodes+0x82/0x1c0 [ice]
2024-07-14T01:23:19.392928+03:00 valgustaja1 kernel: [  216.850353] Code: 68 01 49 89 c4 0f 85 49 01 00 00 4d 8b 75 28 49 8d 86 e8 0a 00 00 4d 8b ae a0 47 00 00 48 89 c7 48 89 44 24 08 e8 1e e6 2c da <41> f6 45 3f 01 0f 84 e1 00 00 00 49 83 7d 10 00 45 0f b6 7d 3e 48
2024-07-14T01:23:19.392929+03:00 valgustaja1 kernel: [  216.850817] RSP: 0018:ff65e9e006937bc0 EFLAGS: 00010246
2024-07-14T01:23:19.392929+03:00 valgustaja1 kernel: [  216.851051] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000000
2024-07-14T01:23:19.392930+03:00 valgustaja1 kernel: [  216.851288] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
2024-07-14T01:23:19.392930+03:00 valgustaja1 kernel: [  216.851522] RBP: ff65e9e006937c18 R08: 0000000000000000 R09: 0000000000000000
2024-07-14T01:23:19.392931+03:00 valgustaja1 kernel: [  216.851754] R10: 0000000000000000 R11: 0000000000000000 R12: ff3e7213cd0af828
2024-07-14T01:23:19.392931+03:00 valgustaja1 kernel: [  216.851984] R13: 0000000000000000 R14: ff3e7212f9ad81a0 R15: 0000000000000000
2024-07-14T01:23:19.392931+03:00 valgustaja1 kernel: [  216.852214] FS:  0000000000000000(0000) GS:ff3e72414c980000(0000) knlGS:0000000000000000
2024-07-14T01:23:19.392931+03:00 valgustaja1 kernel: [  216.852449] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-07-14T01:23:19.392932+03:00 valgustaja1 kernel: [  216.852681] CR2: 000000000000003f CR3: 000000103a636004 CR4: 0000000000f71ef0
2024-07-14T01:23:19.392932+03:00 valgustaja1 kernel: [  216.852917] PKRU: 55555554
2024-07-14T01:23:19.392932+03:00 valgustaja1 kernel: [  216.853148] Call Trace:
2024-07-14T01:23:19.392933+03:00 valgustaja1 kernel: [  216.853378]  <TASK>
2024-07-14T01:23:19.392933+03:00 valgustaja1 kernel: [  216.853608]  ? show_regs+0x6d/0x80
2024-07-14T01:23:19.392933+03:00 valgustaja1 kernel: [  216.853840]  ? __die+0x24/0x80
2024-07-14T01:23:19.392934+03:00 valgustaja1 kernel: [  216.854067]  ? page_fault_oops+0x176/0x500
2024-07-14T01:23:19.392934+03:00 valgustaja1 kernel: [  216.854293]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-14T01:23:19.392934+03:00 valgustaja1 kernel: [  216.854520]  ? ice_sched_find_node_by_teid+0x71/0xb0 [ice]
2024-07-14T01:23:19.392935+03:00 valgustaja1 kernel: [  216.854769]  ? do_user_addr_fault+0x2f9/0x6b0
2024-07-14T01:23:19.392935+03:00 valgustaja1 kernel: [  216.854995]  ? exc_page_fault+0x83/0x1b0
2024-07-14T01:23:19.392935+03:00 valgustaja1 kernel: [  216.855221]  ? asm_exc_page_fault+0x27/0x30
2024-07-14T01:23:19.392936+03:00 valgustaja1 kernel: [  216.855449]  ? ice_lag_move_new_vf_nodes+0x82/0x1c0 [ice]
2024-07-14T01:23:19.392936+03:00 valgustaja1 kernel: [  216.855693]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-14T01:23:19.392936+03:00 valgustaja1 kernel: [  216.855920]  ice_vc_cfg_qs_msg+0xa3/0x680 [ice]
2024-07-14T01:23:19.392937+03:00 valgustaja1 kernel: [  216.856165]  ? ice_vc_send_msg_to_vf+0x39/0xa0 [ice]
2024-07-14T01:23:19.392937+03:00 valgustaja1 kernel: [  216.856408]  ice_vc_process_vf_msg+0x5b4/0xb60 [ice]
2024-07-14T01:23:19.392937+03:00 valgustaja1 kernel: [  216.856651]  __ice_clean_ctrlq+0x2e7/0xa90 [ice]
2024-07-14T01:23:19.392937+03:00 valgustaja1 kernel: [  216.856895]  ? srso_alias_return_thunk+0x5/0xfbef5
2024-07-14T01:23:19.392938+03:00 valgustaja1 kernel: [  216.857122]  ice_service_task+0xade/0x10b0 [ice]
2024-07-14T01:23:19.392938+03:00 valgustaja1 kernel: [  216.857363]  ? kernfs_notify_workfn+0x1dd/0x220
2024-07-14T01:23:19.392938+03:00 valgustaja1 kernel: [  216.857590]  process_one_work+0x16a/0x350
2024-07-14T01:23:19.392938+03:00 valgustaja1 kernel: [  216.857813]  worker_thread+0x306/0x440
2024-07-14T01:23:19.392939+03:00 valgustaja1 kernel: [  216.858031]  ? __pfx_worker_thread+0x10/0x10
2024-07-14T01:23:19.392939+03:00 valgustaja1 kernel: [  216.858245]  kthread+0xef/0x120
2024-07-14T01:23:19.392940+03:00 valgustaja1 kernel: [  216.858456]  ? __pfx_kthread+0x10/0x10
2024-07-14T01:23:19.392940+03:00 valgustaja1 kernel: [  216.858665]  ret_from_fork+0x44/0x70
2024-07-14T01:23:19.392940+03:00 valgustaja1 kernel: [  216.858873]  ? __pfx_kthread+0x10/0x10
2024-07-14T01:23:19.392941+03:00 valgustaja1 kernel: [  216.859080]  ret_from_fork_asm+0x1b/0x30
2024-07-14T01:23:19.392941+03:00 valgustaja1 kernel: [  216.859288]  </TASK>
2024-07-14T01:23:19.392941+03:00 valgustaja1 kernel: [  216.859487] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd cls_matchall act_gact cls_flower sch_ingress iavf nfnetlink_cttimeout ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log binfmt_misc nfnetlink rpcrdma sunrpc ipmi_ssif rdma_ucm ib_iser intel_rapl_msr intel_rapl_common libiscsi scsi_transport_iscsi amd64_edac edac_mce_amd rdma_cm iw_cm kvm_amd ib_cm kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd dax_hmem cxl_acpi rapl cxl_core pcspkr irdma i40e acpi_ipmi ib_uverbs ast ipmi_si i2c_algo_bit ib_core ipmi_devintf k10temp ccp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid
2024-07-14T01:23:19.392942+03:00 valgustaja1 kernel: [  216.859569]  hid zfs(PO) spl(O) btrfs blake2b_generic raid10 raid1 raid0 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c xhci_pci crc32_pclmul ice xhci_pci_renesas nvme ahci gnss xhci_hcd nvme_core libahci i2c_piix4 nvme_auth
2024-07-14T01:23:19.392942+03:00 valgustaja1 kernel: [  216.862275] CR2: 000000000000003f
2024-07-14T01:23:19.392943+03:00 valgustaja1 kernel: [  216.862511] ---[ end trace 0000000000000000 ]---
...

It turns out that kernel v. 6.8.4 works fine (the PVE host is still fully updated at the package level, to the state that matches kernel 6.8.8; the updated system is simply booted with the older kernel selected)

root@valgustaja1:~# uname -a
Linux valgustaja1 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux
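Since the crash was seen with 6.8.8-2-pve and not with 6.8.4-3-pve, a start-up or pre-flight script may warn when the host boots into the problematic series. A sketch based only on these two observations: treating every 6.8.8 PVE build as suspect is an assumption, and `kernel_vf_status` is a hypothetical name. On Proxmox VE, the working kernel should be keepable as the default with `proxmox-boot-tool kernel pin 6.8.4-3-pve`.

```shell
# kernel_vf_status RELEASE: classify a `uname -r` string with respect to the ice
# VF crash (ice_lag_move_new_vf_nodes NULL pointer dereference) seen in this setup.
kernel_vf_status() {
    case "$1" in
        6.8.8-*-pve) echo "suspect" ;;   # crash observed with 6.8.8-2-pve
        6.8.4-3-pve) echo "ok" ;;        # tested, VFs work
        *)           echo "untested" ;;
    esac
}

# Usage: kernel_vf_status "$(uname -r)"
```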

Load testing - vhost-user solution

Network layout for load testing

  • valgustaja1 - 10.40.134.18 - physical PVE node/host (25 Gbit/s)
  • ubuntu-2404-01 - 10.40.135.66 - virtual machine
  • ubuntu-2404-02 - 10.40.135.67 - virtual machine
  • ubuntu-2404-03 - 10.40.135.68 - virtual machine
  • gen-1 - 10.40.13.242 - physical machine (25 Gbit/s)
  • gen-2 - 10.40.13.246 - physical machine (25 Gbit/s)

syn flood

Nothing special is run on the three virtual machines; the packet filter simply does not block the traffic. Each SYN packet is answered with an RST packet.

On gen-1, eight processes are started (24 is an arbitrary unused port)

# cat run-hping3-to-66.sh 
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &

as well as the analogous 'run-hping3-to-67.sh'.

On gen-2, the analogous 'run-hping3-to-68.sh' is run.
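The three scripts differ only in the target address, so they can be generated rather than maintained by hand. A sketch that prints an equivalent script for a given target and process count; `make_hping_script` is a hypothetical helper, the hping3 options are copied unchanged from the script above, and the final `wait` is an addition so that the script returns only after all floods have ended.

```shell
# make_hping_script TARGET [N]: print a script that starts N (default 8) parallel
# 60-second hping3 SYN floods against port 24 of TARGET, then waits for them.
make_hping_script() {
    local target=$1 n=${2:-8} i=0
    while [ "$i" -lt "$n" ]; do
        echo "timeout 60 hping3 -S -c 400000000 --flood -p 24 $target -m 200 &"
        i=$((i + 1))
    done
    echo "wait"
}

# Usage:
#   for last in 66 67 68; do
#       make_hping_script "10.40.135.$last" > "run-hping3-to-$last.sh"
#   done
#   sh run-hping3-to-66.sh
```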

Observed with vnstat:

  • one hping3 process generates about 200 kpps of outgoing traffic (more was expected)
  • the modest output of a single hping3 process is compensated by running several hping3 processes simultaneously (one set of 8 processes produces about 1200 kpps)
  • in total, gen-1 and gen-2 thus generate about 3 x 1200 kpps = 3.6 Mpps of outgoing traffic
  • judging by 'vnstat -l -i enp0s2', this traffic does reach the set of virtual machines, while the virtual machines are heavily CPU-loaded

The performance of the virtual machines is determined by the number of CPUs (8), and the qemu virtual machine configuration must be consistent with it

cores: 8
memory: 8192
numa: 1
sockets: 1
vcpus: 8

The OVS port configuration must match as well, especially the number of receive queues (rxq): for one VM it should be 8, for three perhaps 3 x 8; this needs further investigation

# ovs-vsctl show
...
        Port dpdk-p0
            Interface dpdk-p0
                type: dpdk
                options: {dpdk-devargs="0000:81:00.0", n_rxq="16"}

As a result, all CPUs in the virtual machine should receive so-called interrupt load (check in top by pressing '1' and 'H'). On the PVE host, the 100% load of the eight PMD CPUs may also shift from 'us' to 'sy'.
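The receive-queue count and the CPUs that run the PMD threads are set with `ovs-vsctl`: `options:n_rxq` on the DPDK interface and `other_config:pmd-cpu-mask` on the Open_vSwitch table, and `ovs-appctl dpif-netdev/pmd-rxq-show` shows how the queues are spread over the PMD threads. A sketch that builds these commands; the helper names and the example CPU list 1-8 are assumptions, and the real PMD CPUs should be chosen on the NUMA node of the NIC.

```shell
# cpu_mask CPU...: build the hexadecimal CPU bitmask used by
# other_config:pmd-cpu-mask, e.g. cpu_mask 1 2 -> 0x6
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# rxq_cmds IFACE NRXQ CPU...: print the ovs-vsctl commands for NRXQ receive
# queues on IFACE and for PMD threads pinned to the given CPUs.
rxq_cmds() {
    local iface=$1 nrxq=$2
    shift 2
    echo "ovs-vsctl set Interface $iface options:n_rxq=$nrxq"
    echo "ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$(cpu_mask "$@")"
}

# Usage (hypothetical CPU choice), then inspect the queue-to-PMD mapping:
#   rxq_cmds dpdk-p0 16 1 2 3 4 5 6 7 8 | sh
#   ovs-appctl dpif-netdev/pmd-rxq-show
```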

iperf3

'iperf3 -s' servers are running on all three virtual machines.

On gen-1, run

gen-1# iperf3 -t 180 -c 10.40.135.66
gen-1# iperf3 -t 180 -c 10.40.135.67

On gen-2, run

gen-2# iperf3 -t 180 -c 10.40.135.68

Meanwhile, the outgoing traffic on the network interface is monitored with vnstat

gen-1# vnstat -l -i enp129s0f0
gen-2# vnstat -l -i enp129s0f0

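As a result, each connection reaches about 7-11 Gbit/s. Meanwhile the iperf3 servers in the virtual machines are not heavily loaded from the OS point of view, and the interrupt load is tied to a single CPU (presumably a consequence of how iperf3 works or is used here, e.g. a single TCP connection being hashed to a single receive queue).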

iperf

Starting the server side; started this way, it seems to serve both single-stream and multi-threaded (parallel) incoming connections

# iperf -s -p 5203

Starting the client side

root@kiirus-25g-2:~# iperf -c 10.40.135.67 -t 120 -p 5203 -P 6
------------------------------------------------------------
Client connecting to 10.40.135.67, TCP port 5203
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 193.40.13.246 port 45316 connected with 193.40.135.67 port 5203
[  4] local 193.40.13.246 port 45340 connected with 193.40.135.67 port 5203
[  6] local 193.40.13.246 port 45364 connected with 193.40.135.67 port 5203
[  2] local 193.40.13.246 port 45322 connected with 193.40.135.67 port 5203
[  5] local 193.40.13.246 port 45352 connected with 193.40.135.67 port 5203
[  3] local 193.40.13.246 port 45326 connected with 193.40.135.67 port 5203
[ ID] Interval       Transfer     Bandwidth
[  5] 0.0000-120.0142 sec  59.5 GBytes  4.26 Gbits/sec
[  6] 0.0000-120.0141 sec  41.1 GBytes  2.94 Gbits/sec
[  4] 0.0000-120.0140 sec  66.4 GBytes  4.76 Gbits/sec
[  1] 0.0000-120.0142 sec  60.0 GBytes  4.29 Gbits/sec
[  2] 0.0000-120.0141 sec  53.3 GBytes  3.82 Gbits/sec
[  3] 0.0000-120.0142 sec  47.7 GBytes  3.41 Gbits/sec
[SUM] 0.0000-120.0028 sec   328 GBytes  23.5 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.110/0.211/0.274/0.058 ms (tot/err) = 6/0

where

  • six clients (parallel streams) were started
  • all six completed, each carrying part of the traffic, and together they sustained 23.5 Gbit/s
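The per-stream lines can be cross-checked against the [SUM] line. A sketch that adds up the per-stream bandwidths of iperf (version 2) client output, assuming the line layout above; `iperf_stream_sum` is a hypothetical helper. For the run above, the six streams add up to 23.48 Gbit/s, matching the reported 23.5 Gbit/s.

```shell
# iperf_stream_sum: read iperf (v2) client output on stdin and print the number
# of streams and the sum of their bandwidths in Gbit/s, ignoring the [SUM] line.
iperf_stream_sum() {
    awk '
        $1 == "[SUM]"      { next }
        $NF == "Gbits/sec" { total += $(NF - 1); n++ }
        $NF == "Mbits/sec" { total += $(NF - 1) / 1000; n++ }
        END { printf "streams=%d sum=%.2f Gbit/s\n", n, total }
    '
}

# Usage: iperf -c 10.40.135.67 -t 120 -p 5203 -P 6 | iperf_stream_sum
```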

On the server side, vnstat showed

root@imre-ubuntu-2024-dpdk-02:~# vnstat -l -i enp0s2
Monitoring enp0s2...    (press CTRL-C to stop)

\      rx:   24.55 Gbit/s 2027074 p/s       tx:   49.00 Mbit/s 92773 p/s

Meanwhile, on the virtual machine running the iperf server (the target of the connections), the command-line experience was normal, i.e. the machine was nowhere near hanging, and top looked like this

20240717-e810-iperf-01.png

where

  • 'si', i.e. the (software) interrupt load, is spread across several CPUs (since iperf is a threaded application and six connections are made to it)
  • in the top view the machine is not unnaturally loaded, and its command-line responsiveness matches that

Load testing - regular (non-DPDK) solution

The network setup and the tests are the same. The difference is that OVS runs in regular (non-DPDK) mode, without the vhost-user protocol.

syn flood

Nothing special is run on the three virtual machines; the packet filter simply does not block the traffic. Each SYN packet is answered with an RST packet.

On gen-1, eight processes are started (24 is an arbitrary unused port)

# cat run-hping3-to-66.sh 
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &
timeout 60 hping3 -S -c 400000000 --flood -p 24 10.40.135.66 -m 200 &

The performance of the virtual machines is determined by the number of CPUs (e.g. 8), and it is important to enable multi-queue, e.g. also with the value 8 (it seems these two values do not necessarily have to match)

cores: 8
memory: 8192
sockets: 1
..
net0: virtio=BC:24:11:93:F5:42,bridge=vmbr0,firewall=1,tag=3564,queues=8

The OVS configuration is the usual one

root@valgustaja1:~# ovs-vsctl show
3dddee67-72e2-4b2a-8e1d-9ebe6b0bbb3f
    Bridge vmbr0
        Port fwln1102o0
            tag: 3564
            Interface fwln1102o0
                type: internal
        Port enp129s0f0np0
            Interface enp129s0f0np0
        Port vmbr0
            Interface vmbr0
                type: internal
        Port fwln1103o0
            tag: 3564
            Interface fwln1103o0
                type: internal
    ovs_version: "3.1.0"
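Because the vCPU count and the `queues` value of the virtio NIC live in two different places, a quick check against `qm config <vmid>` output can catch a VM where multi-queue was forgotten. A sketch assuming the `key: value` layout of the VM configuration shown above; `vm_queue_check` is a hypothetical helper that reads `cores` times `sockets` and the `queues=` parameter of `net0`.

```shell
# vm_queue_check: read `qm config VMID` output on stdin and print the vCPU count
# (cores * sockets, sockets defaulting to 1) and the queues= value of net0 (0 if unset).
vm_queue_check() {
    awk -F': ' '
        $1 == "cores"   { cores = $2 }
        $1 == "sockets" { sockets = $2 }
        $1 == "net0" {
            n = split($2, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^queues=/) queues = substr(opts[i], 8)
        }
        END {
            if (sockets == "") sockets = 1
            printf "vcpus=%d queues=%d\n", cores * sockets, queues
        }
    '
}

# Usage on the PVE host: qm config 1102 | vm_queue_check
```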

Result

  • TODO

iperf3

TODO

Result

  • a single iperf3 client process talking to one iperf3 server configured this way reaches 20 Gbit/s
  • with multi-queue disabled, a single iperf3 client process talking to one such iperf3 server reaches 15 Gbit/s
  • the ordinary CPU load on neither the PVE host nor the PVE guest is significant (visually much better than with DPDK, whose PMD threads busy-poll and therefore always show 100% load; check e.g. in top by pressing '1' and 'Shift+h')

Problems

A problem occurs when the librte library matching the physical network card (for the E810, the DPDK ice driver) is not installed; the log then shows

valgustaja1# less /var/log/openvswitch/ovs-vswitchd.log
...
2024-07-09T00:26:12.303Z|00052|dpdk|ERR|EAL: Driver cannot attach the device (0000:81:00.0)
2024-07-09T00:26:12.303Z|00053|dpdk|ERR|EAL: Failed to attach device on primary process
2024-07-09T00:26:12.303Z|00054|netdev_dpdk|WARN|Error attaching device '0000:81:00.0' to DPDK
2024-07-09T00:26:12.303Z|00055|netdev|WARN|dpdk-p0: could not set configuration (Invalid argument)
2024-07-09T00:26:12.303Z|00056|dpdk|ERR|Invalid port_id=32

and

root@valgustaja1# cat ovs-vsctl-show-katki 
e3c33e5f-ba41-45e8-b0af-f7310a63f5ce
    Bridge vmbr0
        Port dpdk-p0
            Interface dpdk-p0
                type: dpdk
                options: {dpdk-devargs="0000:81:00.0"}
                error: "Error attaching device '0000:81:00.0' to DPDK"
...
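When this happens, the affected PCI addresses can be pulled out of the vswitchd log, and the installed DPDK net driver packages listed with `dpkg -l 'librte-net-*'`. The exact package name for the E810 driver depends on the DPDK version (with DPDK 22.11 on Debian presumably `librte-net-ice23`; this is an assumption, `apt-cache search librte-net-ice` should confirm it). A sketch; `dpdk_attach_failures` is a hypothetical helper.

```shell
# dpdk_attach_failures: read ovs-vswitchd.log (or `ovs-vsctl show` output) on stdin
# and print each PCI address for which DPDK reported "Error attaching device", once.
dpdk_attach_failures() {
    sed -n "s/.*Error attaching device '\([0-9a-fA-F:.]*\)' to DPDK.*/\1/p" | sort -u
}

# Usage on the PVE host:
#   dpdk_attach_failures < /var/log/openvswitch/ovs-vswitchd.log
#   dpkg -l 'librte-net-*'
```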

Terms

  • epct - Ethernet Port Configuration Tool

Hardware configuration

Computer BIOS

TODO

HII menu

TODO

devlink utility

TODO

epct utility - OS command line

epct could be a very relevant tool, but as of spring 2024 it hardly works in practice, neither with Linux kernel v. 6.x nor with v. 5.x, https://www.intel.com/content/www/us/en/download/19437/ethernet-port-configuration-tool-linux.html

When the ice driver is loaded (the kernel's in-tree ice driver is used; epct v1.41.03.01 gives a similar result)

# ./epct64e -devices
Ethernet Port Configuration Tool
EPCT version: v1.40.05.05
Copyright 2019 - 2023 Intel Corporation.

Cannot initialize port: [00:129:00:00] Intel(R) Ethernet Controller E810-XXV for SFP
Cannot initialize port: [00:129:00:01] Intel(R) Ethernet Controller E810-XXV for SFP

Error: Cannot initialize adapter.

When the ice driver is not loaded

# ./epct64e -devices
Ethernet Port Configuration Tool
EPCT version: v1.40.05.05
Copyright 2019 - 2023 Intel Corporation.

Base driver not supported or not present: [00:129:00:00] Intel(R) Ethernet Controller E810-XXV for SFP
NIC Seg:Bus:Fun   Ven-Dev   Connector Ports Speed    Quads  Lanes per PF
=== ============= ========= ========= ===== ======== ====== ============
 1) 000:129:00-01 8086-159B SFP       2     -   Gbps N/A    N/A

Error: Base driver is not available for one or more adapters. Please ensure the driver is correctly attached to the device.

epct utility - EFI application

The epct EFI application works partially, but does not allow 'svio enable', https://www.intel.com/content/www/us/en/download/19440/ethernet-port-configuration-tool-efi.html

20240502-epct-e810-01.png

The NVM Update Utility EFI application does not report an error as such, but it also does not recognize that the system contains an E810 card, https://www.intel.com/content/www/us/en/download/19629/non-volatile-memory-nvm-update-utility-for-intel-ethernet-network-adapters-e810-series-efi.html

20240502-nvmeupdate-e810-02.png

Misc

One physical dual-port network card is available

# lspci | grep -i net
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)

The operating system's regular network devices look like this

root@pve-02:~# ethtool enp129s0f1np1
Settings for enp129s0f1np1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                25000baseCR/Full
                                25000baseSR/Full
                                1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
                                10000baseLR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None        RS      BASER
        Advertised link modes:  25000baseCR/Full
                                10000baseCR/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None       RS      BASER
        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 25000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes


root@pve-02:~# ethtool -i enp129s0f1np1
driver: ice
version: 6.5.13-5-pve
firmware-version: 4.40 0x8001c7d4 1.3534.0
expansion-rom-version: 
bus-info: 0000:81:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

In devlink it looks like this

# devlink dev info
pci/0000:81:00.0:
  driver ice
  serial_number 00-01-00-ff-ff-00-00-00
  versions:
      fixed:
        board.id K58132-000
      running:
        fw.mgmt 7.4.13
        fw.mgmt.api 1.7.11
        fw.mgmt.build 0xded4446f
        fw.undi 1.3534.0
        fw.psid.api 4.40
        fw.bundle_id 0x8001c7d4
        fw.app.name ICE OS Default Package
        fw.app 1.3.36.0
        fw.app.bundle_id 0xc0000001
        fw.netlist 4.4.5000-2.15.0
        fw.netlist.build 0x0ba411b9
      stored:
        fw.undi 1.3534.0
        fw.psid.api 4.40
        fw.bundle_id 0x8001c7d4
        fw.netlist 4.4.5000-2.15.0
        fw.netlist.build 0x0ba411b9
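In `devlink dev info`, the "running" versions are the ones currently active. As far as I understand the devlink documentation, the "stored" versions are what sits in flash and is activated after the next reset, so a mismatch between the two suggests a pending firmware update. Below is a minimal sketch of a comparison, assuming the output of a single device (e.g. `devlink dev info pci/0000:81:00.0`). `fw_pending` is a hypothetical helper name.

```shell
# Hypothetical helper: compare "running" and "stored" firmware versions in
# `devlink dev info <dev>` output (stdin, one device). Prints every key whose
# stored value differs from the running one; prints nothing if all match.
fw_pending() {
    awk '
        /^pci\//          { sect = ""; next }
        $1 == "fixed:"    { sect = ""; next }
        $1 == "running:"  { sect = "running"; next }
        $1 == "stored:"   { sect = "stored"; next }
        sect != "" && NF == 2 { val[sect, $1] = $2; keys[$1] = 1 }
        END {
            for (k in keys)
                if ((("stored", k) in val) && val["running", k] != val["stored", k])
                    print k ": running=" val["running", k] " stored=" val["stored", k]
        }'
}
```

Keys that only appear under "running" (e.g. `fw.mgmt` above) are skipped, and so is `fw.app.name`, whose value contains spaces.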

...

Using RoCEv2

The IP configuration used below must be attached directly to the physical device. It does not work if, for example, the host uses an OVS virtual switch with the IP on vlan47 while the physical enp129s0f1np1 is attached to the same OVS bridge.

root@pve-02:~# ip addr show dev enp129s0f1np1
3: enp129s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 3c:ec:ef:e6:69:b9 brd ff:ff:ff:ff:ff:ff
    inet 10.47.218.226/24 scope global enp129s0f1np1
       valid_lft forever preferred_lft forever
    inet6 fe80::3eec:efff:fee6:69b9/64 scope link 
       valid_lft forever preferred_lft forever
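You can check the requirement above from a script by finding which interface holds the address and confirming that this interface is backed by real hardware. Below is a minimal sketch. It assumes that physical NICs have a `device` link under `/sys/class/net/<ifname>/`, while bridges, VLANs and OVS internal ports do not. `ip_owner`, `is_physical` and the `SYSFS_NET` override are hypothetical.

```shell
# Hypothetical helpers for checking where an IPv4 address lives.

# Read `ip -o -4 addr show` output on stdin, print the interface holding $1.
ip_owner() {
    awk -v ip="$1" '$3 == "inet" {
        split($4, a, "/")
        if (a[1] == ip) { dev = $2; sub(/@.*/, "", dev); print dev }
    }'
}

# Succeed if interface $1 is backed by a real device (physical NICs have a
# "device" link in sysfs; virtual interfaces do not). SYSFS_NET allows testing.
is_physical() {
    [ -e "${SYSFS_NET:-/sys/class/net}/$1/device" ]
}
```

Usage sketch: `dev=$(ip -o -4 addr show | ip_owner 10.47.218.226); is_physical "$dev" && echo "$dev looks usable for RoCE"`.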


Among other things, this device supports the iWARP and RoCEv2 protocols. From the devlink point of view it looks like this

# devlink dev param 
pci/0000:81:00.0:
  name enable_roce type generic
    values:
      cmode runtime value true
  name enable_iwarp type generic
    values:
      cmode runtime value false
pci/0000:81:00.1:
  name enable_roce type generic
    values:
      cmode runtime value false
  name enable_iwarp type generic
    values:
      cmode runtime value true
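To see at a glance which port is currently in which mode, you can reduce the listing above to one line per port. Below is a minimal sketch that assumes the `devlink dev param` output format shown here. `rdma_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: read `devlink dev param show` output on stdin and print
# "<device> roce" or "<device> iwarp" for every port whose runtime value is true.
rdma_mode() {
    awk '
        /^pci\//        { dev = $1; sub(/:$/, "", dev) }
        $1 == "name"    { param = $2 }
        $1 == "cmode" && $2 == "runtime" && $4 == "true" {
            if (param == "enable_roce")  print dev, "roce"
            if (param == "enable_iwarp") print dev, "iwarp"
        }'
}
```

For the output above, `devlink dev param | rdma_mode` should print `pci/0000:81:00.0 roce` and `pci/0000:81:00.1 iwarp`.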

Only one of the two can be active at a time. You switch between them with devlink, for example like this

# devlink dev param set pci/0000:81:00.1 name enable_iwarp value false cmode runtime
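The command above only turns iWARP off. To actually put the port into RoCEv2 mode, `enable_roce` also has to be set to true. Because the two parameters are mutually exclusive, the sketch below disables the other one first. It assumes the same `devlink dev param set` syntax as above. `set_rdma_mode` and the `DEVLINK` dry-run override are hypothetical.

```shell
# Hypothetical helper: switch one port between RoCEv2 and iWARP.
# The two params are mutually exclusive, so the other one is disabled first.
# DEVLINK can be overridden (e.g. DEVLINK=echo) for a dry run.
set_rdma_mode() {
    dev=$1
    case $2 in
        roce)  off=enable_iwarp; on=enable_roce ;;
        iwarp) off=enable_roce;  on=enable_iwarp ;;
        *) echo "usage: set_rdma_mode <pci/BDF> roce|iwarp" >&2; return 2 ;;
    esac
    ${DEVLINK:-devlink} dev param set "$dev" name "$off" value false cmode runtime &&
        ${DEVLINK:-devlink} dev param set "$dev" name "$on" value true cmode runtime
}
```

Usage sketch: `set_rdma_mode pci/0000:81:00.1 roce`. Use `DEVLINK=echo set_rdma_mode pci/0000:81:00.1 roce` to print the commands without running them.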

To use RoCEv2, it makes sense to install a fair amount of RDMA and InfiniBand-tradition software on the systems, e.g.

# apt-get install rdma-core
# apt-get install perftest
# apt-get install ibverbs-utils
# apt-get install infiniband-diags
# apt-get install ibverbs-providers

The relevant drivers are

  • ice - the main driver
  • irdma - 'Intel RDMA', the RDMA driver for Intel network cards (as of 2024 the E810, and some older models too)

As RDMA devices, the ports look like this

root@pve-01:~# rdma link
link rocep129s0f0/1 state ACTIVE physical_state LINK_UP netdev enp129s0f0np0 
link rocep129s0f1/1 state ACTIVE physical_state LINK_UP netdev enp129s0f1np1 
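Tools such as `ib_send_bw -d` expect the RDMA device name rather than the netdev name, and the mapping between the two can be read from the `rdma link` output above. Below is a minimal sketch that assumes this output format. `rdma_dev_for` is a hypothetical helper name.

```shell
# Hypothetical helper: read `rdma link` output on stdin and print the RDMA
# device name (without the /port suffix) that belongs to netdev $1.
rdma_dev_for() {
    awk -v nd="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == "netdev" && $(i + 1) == nd) { split($2, a, "/"); print a[1] }
    }'
}
```

Usage sketch: `rdma link | rdma_dev_for enp129s0f1np1` should print `rocep129s0f1`.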

To test two systems connected directly by a cable, run for example the following on the two computers, with pve-02 as the rping server and pve-01 as the rping client. Note that listening to traffic on the ordinary network interface in the usual way shows nothing.

root@pve-02:~# rping -s -a 10.47.218.226 -v
server ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
server ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
server ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
server ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
server ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
server DISCONNECT EVENT...
wait for RDMA_READ_ADV state 10

root@pve-01:~# rping -c -a 10.47.218.226 -v -C 5
ping data: rdma-ping-0: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqr
ping data: rdma-ping-1: BCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrs
ping data: rdma-ping-2: CDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrst
ping data: rdma-ping-3: DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu
ping data: rdma-ping-4: EFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuv
client DISCONNECT EVENT...

A few more useful utilities. Here the first device has been set to RoCEv2 mode and the second to iWARP mode with 'devlink dev param ...'

# ibv_devices 
    device                 node GUID
    ------              ----------------
    rocep129s0f0        3eeceffffee667b2
    iwp129s0f1          3eeceffffee667b3


# ibv_devinfo 
hca_id: rocep129s0f0
        transport:                      InfiniBand (0)
        fw_ver:                         1.71
        node_guid:                      3eec:efff:fee6:67b2
        sys_image_guid:                 3eec:efff:fee6:67b2
        vendor_id:                      0x8086
        vendor_part_id:                 5531
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet

hca_id: iwp129s0f1
        transport:                      iWARP (1)
        fw_ver:                         1.71
        node_guid:                      3eec:efff:fee6:67b3
        sys_image_guid:                 3eec:efff:fee6:67b3
        vendor_id:                      0x8086
        vendor_part_id:                 5531
        hw_ver:                         0x2
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             Ethernet
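The `transport` field is the quickest way to tell which RDMA device is in RoCE mode (`InfiniBand (0)`) and which is in iWARP mode (`iWARP (1)`). Below is a minimal sketch that summarises it per device, assuming the `ibv_devinfo` output format shown above. `ib_transport` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ibv_devinfo` output on stdin and print
# "<hca_id> <transport>" per device, e.g. "rocep129s0f0 InfiniBand (0)".
ib_transport() {
    awk '
        /^hca_id:/ { dev = $2 }
        /^[[:space:]]*transport:/ {
            sub(/^[[:space:]]*transport:[[:space:]]*/, "")
            print dev, $0
        }'
}
```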

A working case with the RoCE device

root@pve-02:~# ib_send_bw -i 1 -d rocep129s0f1

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : rocep129s0f1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 RX depth        : 512
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x01 QPN 0x0004 PSN 0x807cd5
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:47:218:226
 remote address: LID 0x01 QPN 0x0004 PSN 0xba4fee
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:47:218:225
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3696.643000 != 3000.000000. CPU Frequency is not max.
 65536      1000             0.00               2762.05            0.044193
---------------------------------------------------------------------------------------

and the first computer as the client

root@pve-01:~# ib_send_bw -i 1 10.47.218.226 -d rocep129s0f1
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : rocep129s0f1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x01 QPN 0x0004 PSN 0xba4fee
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:47:218:225
 remote address: LID 0x01 QPN 0x0004 PSN 0x807cd5
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:47:218:226
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3698.021000 != 3000.000000. CPU Frequency is not max.
 65536      1000             2757.88            2757.84            0.044126
---------------------------------------------------------------------------------------

where

  • ib_send_bw appears to be a utility of InfiniBand origin
  • Transport type is IB, i.e. InfiniBand, which matches how the RoCE protocol works (InfiniBand transport carried over Ethernet)
  • Link type is Ethernet
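To compare the result with the 25 Gbit/s link speed, the perftest MB/sec figure can be converted to Gbit/s. The sketch below assumes that perftest counts 1 MB as 2^20 bytes; running `ib_send_bw` with `--report_gbits` should show Gbit/s directly and is a way to confirm this. Under that assumption, 2762.05 MB/sec is about 23.17 Gbit/s. `mbs_to_gbits` is a hypothetical helper name.

```shell
# Hypothetical helper: convert a perftest "MB/sec" figure to Gbit/s,
# ASSUMING 1 MB = 2^20 bytes (use ib_send_bw --report_gbits to confirm).
mbs_to_gbits() {
    awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb * 1048576 * 8 / 1e9 }'
}
```

Usage sketch: `mbs_to_gbits 2762.05` prints `23.17`.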

A non-working case, because the iWARP device is used

root@pve-02:~# ib_send_bw -i 1 -d iwp129s0f1

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : iwp129s0f1
 Number of qps   : 1            Transport type : IW
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 RX depth        : 512
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
ethernet_read_keys: Couldn't read remote address
 Unable to read to socket/rdma_cm
Failed to exchange data between server and clients
Failed to deallocate PD - Device or resource busy
Failed to destroy resources

A non-working case, because although the device exists in the computer, the IP address in question is not configured on it

root@pve-02:~# ib_send_bw -i 1 -d rocep129s0f0 

************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : rocep129s0f0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 RX depth        : 512
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
Failed to modify QP 5 to RTR
 Unable to Connect the HCA's through the link
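In the working run the GID ends in the IPv4-mapped address (`...:255:255:10:47:218:226`, GID index 1). The failing run uses GID index 0 on a device that carries no IP. Below is a minimal sketch that looks up the GID index holding a given IPv4 address. It assumes the usual sysfs layout `/sys/class/infiniband/<dev>/ports/<port>/gids/<index>`. The result can be passed to perftest with `-x` (GID index). `ipv4_to_gid`, `find_gid_index` and the `SYSFS_IB` override are hypothetical.

```shell
# Hypothetical helpers: locate the RoCE GID index that carries an IPv4 address.

# Print the IPv4-mapped GID in sysfs notation, e.g. 10.47.218.226 ->
# 0000:0000:0000:0000:0000:ffff:0a2f:dae2
ipv4_to_gid() {
    # word splitting of the four octets is intended here
    printf '0000:0000:0000:0000:0000:ffff:%02x%02x:%02x%02x\n' $(echo "$1" | tr . ' ')
}

# Print every GID index of RDMA device $1 (port $3, default 1) that holds IPv4
# address $2. Empty output means the address is not configured for that device.
# Several indexes may match (e.g. RoCE v1 and v2 entries); the type of each is
# in .../gid_attrs/types/<index>. SYSFS_IB allows testing with a fake tree.
find_gid_index() {
    want=$(ipv4_to_gid "$2")
    for f in "${SYSFS_IB:-/sys/class/infiniband}/$1/ports/${3:-1}/gids/"*; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f" 2>/dev/null)" = "$want" ]; then
            basename "$f"
        fi
    done
}
```

Usage sketch: `find_gid_index rocep129s0f1 10.47.218.226`. If this prints nothing for `rocep129s0f0`, the address is not on that device, which fits the failure above.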

The most practical use case is iSCSI over iSER + RoCEv2. The resulting data transfer rate is twice that of so-called regular (TCP-based) iSCSI. The target looks like this

/> ls /
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 1]
  | | o- iscsi_block_md127 ............................................................. [/dev/md127 (27.9TiB) write-thru activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 1]
  | o- iqn.2003-01.org.setup.lun.test .................................................................................... [TPGs: 1]
  |   o- tpg1 .......................................................................................... [no-gen-acls, auth per-acl]
  |     o- acls .......................................................................................................... [ACLs: 1]
  |     | o- iqn.1993-08.org.debian:01:b65e1ba35869 ................................................... [1-way auth, Mapped LUNs: 1]
  |     |   o- mapped_lun0 ..................................................................... [lun0 block/iscsi_block_md127 (rw)]
  |     o- luns .......................................................................................................... [LUNs: 1]
  |     | o- lun0 ........................................................ [block/iscsi_block_md127 (/dev/md127) (default_tg_pt_gp)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 10.47.218.226:3261 ............................................................................................. [iser]
  o- loopback ......................................................................................................... [Targets: 0]
  o- srpt ............................................................................................................. [Targets: 0]
  o- vhost ............................................................................................................ [Targets: 0]
  o- xen-pvscsi ....................................................................................................... [Targets: 0]

where iSER is enabled with the command

/> iscsi/iqn.2003-01.org.setup.lun.test/tpg1/portals/10.47.218.226:3261 enable_iser boolean=true
iSER enable now: True
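You can also build a target like the one listed above non-interactively, because targetcli accepts a path and a command as arguments. Below is a minimal sketch under these assumptions: CHAP setup (the "1-way auth" in the listing) is left out; the default `0.0.0.0:3260` portal that newer targetcli versions create automatically is removed, and failure is ignored if it does not exist; the backstore is named after the block device as in the listing. `make_iser_target` and the `TARGETCLI` dry-run override are hypothetical.

```shell
# Hypothetical helper: recreate an iSER target like the one listed above.
# Args: target IQN, block device, portal IP, portal port, initiator IQN.
# CHAP is omitted. TARGETCLI can be overridden (e.g. TARGETCLI=echo) for a dry run.
make_iser_target() {
    iqn=$1 dev=$2 ip=$3 port=$4 initiator=$5
    tc=${TARGETCLI:-targetcli}
    name=iscsi_block_$(basename "$dev")
    tpg=/iscsi/$iqn/tpg1
    $tc /backstores/block create name="$name" dev="$dev" &&
    $tc /iscsi create "$iqn" &&
    $tc "$tpg/luns" create "/backstores/block/$name" &&
    $tc "$tpg/acls" create "$initiator" &&
    { $tc "$tpg/portals" delete 0.0.0.0 3260 || true; } &&
    $tc "$tpg/portals" create "$ip" "$port" &&
    $tc "$tpg/portals/$ip:$port" enable_iser boolean=true &&
    $tc saveconfig
}
```

Usage sketch: `TARGETCLI=echo make_iser_target iqn.2003-01.org.setup.lun.test /dev/md127 10.47.218.226 3261 iqn.1993-08.org.debian:01:b65e1ba35869` prints the eight targetcli commands for review. Drop `TARGETCLI=echo` to run them.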

Working on the iSCSI client side, i.e. the initiator side

root@pve-01:~# iscsiadm -m discovery -t st -p 10.47.218.226:3261
root@pve-01:~# iscsiadm -m node -T iqn.2003-01.org.setup.lun.test -p 10.47.218.226:3261 -o update -n iface.transport_name -v iser
root@pve-01:~# iscsiadm -m node -T iqn.2003-01.org.setup.lun.test -p 10.47.218.226:3261 -l
root@pve-01:~# lsscsi -s
...
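After logging in, you can check whether the session really uses iSER by looking at the `Iface Transport` field of the session details. Below is a minimal sketch that assumes the `iscsiadm -m session -P 1` output format, where each session lists a line like `Iface Transport: iser`. `iscsi_transport` is a hypothetical helper name.

```shell
# Hypothetical helper: read `iscsiadm -m session -P 1` output on stdin and
# print the transport of each session ("iser" for iSER, "tcp" for plain iSCSI).
iscsi_transport() {
    awk -F': ' '/Iface Transport:/ { print $2 }'
}
```

Usage sketch: `iscsiadm -m session -P 1 | iscsi_transport` should print `iser` for the session created above.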

root@pve-01:~# iscsiadm -m node -T iqn.2003-01.org.setup.lun.test -p 10.47.218.226:3261 -u
root@pve-01:~# iscsiadm -m discoverydb -t sendtargets -p 10.47.218.226:3261 -o delete
root@pve-01:~# iscsiadm -m discoverydb

Useful additional materials

Scalable Functions

Useful additional materials

Useful additional materials