Mellanox ConnectX-6 Lx EN
Revision as of 1 July 2024, 23:49
Introduction
Mellanox hardware
Statements
- Mellanox's higher-end network devices fall into two broad groups: 1. SmartNIC, 2. SuperNIC
- SmartNIC - nvidia connectx devices
- SuperNIC - nvidia bluefield devices
ConnectX devices
- 'connectx-6 lx' and 'connectx-6 dx' devices are all Ethernet devices (i.e. not InfiniBand)
- the 'connectx-6' device is physically a universal Ethernet/InfiniBand device, i.e. it can be switched in software to run in either Ethernet or InfiniBand mode
Mellanox integrations
dpdk
Ubuntu v. 24.04
The goal and end result is the following setup, in which the OVS switch looks like this
root@dpdp-u2404:~# ovs-vsctl show
09d915bd-744b-4ff1-a223-983c02f05f3b
    Bridge br0
        datapath_type: netdev
        Port dpdk-p0
            Interface dpdk-p0
                type: dpdk
                options: {dpdk-devargs="0000:0f:00.0"}
        Port br0
            Interface br0
                type: internal
        Port vlan11
            tag: 11
            Interface vlan11
                type: internal
    ovs_version: "3.3.0"
And the network works like this
root@dpdp-u2404:~# ping -c 4 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
64 bytes from 192.168.1.254: icmp_seq=1 ttl=255 time=0.765 ms
64 bytes from 192.168.1.254: icmp_seq=2 ttl=255 time=0.344 ms
64 bytes from 192.168.1.254: icmp_seq=3 ttl=255 time=0.285 ms
64 bytes from 192.168.1.254: icmp_seq=4 ttl=255 time=0.312 ms

--- 192.168.1.254 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3100ms
rtt min/avg/max/mdev = 0.285/0.426/0.765/0.196 ms
At the same time nothing is seen on the regular network interface, because the kernel does not handle these packets in the usual sense
root@dpdp-u2404:~# tcpdump -ni enp15s0f0np0
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp15s0f0np0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
To reach this state on Ubuntu v. 24.04, the OVS-DPDK software is installed; the following guides are useful background reading
- 'How to use DPDK with Open vSwitch' - https://ubuntu.com/server/docs/how-to-use-dpdk-with-open-vswitch
- 'OVS Offload Using ASAP² Direct' - https://docs.nvidia.com/networking/display/mlnxofedv590590/ovs+offload+using+asap%C2%B2+direct#src-2408744435_safe-id-T1ZTT2ZmbG9hZFVzaW5nQVNBUMKyRGlyZWN0LU9WUy1LZXJuZWxIYXJkd2FyZU9mZmxvYWRz
- 'Using Open vSwitch with DPDK' - https://docs.openvswitch.org/en/latest/howto/dpdk/
root@dpdp-u2404:~# apt-get install openvswitch-switch-dpdk
Among others, the following packages are installed as dependencies
- dpdk
- openvswitch-switch
And the OVS processes are started
root@dpdp-u2404:~# systemctl | grep ovs | grep runni
ovs-vswitchd.service    loaded active running   Open vSwitch Forwarding Unit
ovsdb-server.service    loaded active running   Open vSwitch Database Unit
The OVS configuration is kept in the database file /var/lib/openvswitch/conf.db (it can be viewed with 'less /var/lib/openvswitch/conf.db'), so if something goes wrong while configuring OVS, a clean restart is to stop the processes, delete the files and start the processes again
root@dpdp-u2404:~# systemctl stop ovs-vswitchd
root@dpdp-u2404:~# systemctl stop ovsdb-server
root@dpdp-u2404:~# rm /var/lib/openvswitch/.conf.db.~lock~
root@dpdp-u2404:~# rm /var/lib/openvswitch/conf.db
root@dpdp-u2404:~# systemctl start ovsdb-server
root@dpdp-u2404:~# systemctl start ovs-vswitchd
After the software has been installed, OVS is running and can be configured further
root@dpdp-u2404:~# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
root@dpdp-u2404:~# update-alternatives --get-selections
root@dpdp-u2404:~# update-alternatives --set ovs-vswitchd /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
root@dpdp-u2404:~# update-alternatives --get-selections
root@dpdp-u2404:~# ovs-vsctl set Open_vSwitch . "other_config:dpdk-init=true"
root@dpdp-u2404:~# ovs-vsctl set Open_vSwitch . "other_config:dpdk-lcore-mask=0x1"
root@dpdp-u2404:~# ovs-vsctl set Open_vSwitch . "other_config:dpdk-alloc-mem=2048"
root@dpdp-u2404:~# ovs-vsctl set Open_vSwitch . "other_config:dpdk-extra=--allow=0000:0f:00.0"
root@dpdp-u2404:~# systemctl restart openvswitch-switch
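The hugepage reservation and the dpdk-alloc-mem value above are related by simple arithmetic. The following is a minimal sketch, assuming dpdk-alloc-mem is given in MB; the helper name hugepage_mb is hypothetical and not part of OVS or DPDK.

```shell
#!/bin/sh
# hugepage_mb COUNT SIZE_KB
# Total memory reserved by COUNT hugepages of SIZE_KB kB each, in MB.
# (hypothetical helper, for illustration only)
hugepage_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# Values from the commands above: 1024 pages of 2048 kB each.
hugepage_mb 1024 2048    # prints 2048, the same figure as dpdk-alloc-mem=2048
```

If the reservation is smaller than the amount OVS is asked to preallocate, DPDK initialisation is likely to fail, so it is worth keeping the two numbers in step.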
To configure the bridge and ports inside OVS
root@dpdp-u2404:~# ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
root@dpdp-u2404:~# ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:0f:00.0
root@dpdp-u2404:~# ovs-vsctl add-port br0 vlan11 tag=11 -- set interface vlan11 type=internal
root@dpdp-u2404:~# ifconfig vlan11 192.168.1.57/24
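Notes
- it is characteristic of this setup that one CPU is 100% loaded (shown as 'us' user load in top output), because the DPDK poll mode driver thread polls the NIC continuously instead of waiting for interrupts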
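OVS takes CPU core sets as hexadecimal bitmasks, as with dpdk-lcore-mask=0x1 above. The following is a minimal sketch for building such a mask from core numbers; the helper name cores_to_mask is hypothetical, and pmd-cpu-mask, mentioned in the usage note, is an OVS option that this article itself does not set.

```shell
#!/bin/sh
# cores_to_mask CORE...
# Print a hex bitmask with one bit set for every listed core number.
# (hypothetical helper, for illustration only)
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

cores_to_mask 0      # prints 0x1, the dpdk-lcore-mask value used above
cores_to_mask 2 3    # prints 0xc
```

Assuming the busy polling thread should be moved to cores 2 and 3, the result could be passed on as: ovs-vsctl set Open_vSwitch . "other_config:pmd-cpu-mask=$(cores_to_mask 2 3)". The polling cores stay fully loaded either way; the mask only chooses which ones.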
Performance evaluation
TODO
Useful additional materials
- https://enterprise-support.nvidia.com/s/article/mellanox-dpdk
- https://doc.dpdk.org/guides/nics/mlx5.html
kernel tls
TODO
inbox driver and utilities
# apt-get install mstflint
To change the card's SR-IOV setting, run
root@pve-moraal-x570:~# mstconfig -d 0000:0f:00.0 set SRIOV_EN=False

Device #1:
----------

Device type:    ConnectX6LX
Name:           MCX631102AN-ADA_Ax
Description:    ConnectX-6 Lx EN adapter card; 25GbE ; Dual-port SFP28; PCIe 4.0 x8; No Crypto
Device:         0000:0f:00.0

Configurations:                Next Boot       New
        SRIOV_EN               True(1)         False(0)

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
As a result the VF capability is gone
root@pve-moraal-x570:~# ls -ld /sys/class/net/enp15s0f0np0/device/s*
lrwxrwxrwx 1 root root    0 Jun 30 03:32 /sys/class/net/enp15s0f0np0/device/subsystem -> ../../../../bus/pci
-r--r--r-- 1 root root 4096 Jun 30 03:35 /sys/class/net/enp15s0f0np0/device/subsystem_device
-r--r--r-- 1 root root 4096 Jun 30 03:35 /sys/class/net/enp15s0f0np0/device/subsystem_vendor
After switching it back on, the VF capability returns
root@pve-moraal-x570:~# mstconfig -d 0000:0f:00.0 set SRIOV_EN=True
root@pve-moraal-x570:~# ls -ld /sys/class/net/enp15s0f0np0/device/s*
-rw-r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_drivers_autoprobe
-rw-r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_numvfs
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_offset
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_stride
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_totalvfs
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_vf_device
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/sriov_vf_total_msix
lrwxrwxrwx 1 root root    0 Jun 30 03:39 /sys/class/net/enp15s0f0np0/device/subsystem -> ../../../../bus/pci
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/subsystem_device
-r--r--r-- 1 root root 4096 Jun 30 03:43 /sys/class/net/enp15s0f0np0/device/subsystem_vendor
root@pve-moraal-x570:~# cat /sys/class/net/enp15s0f0np0/device/sriov_totalvfs
8
Secure boot must be switched off to make these changes.
mstlink utility
root@pve-moraal-x570:~# mstlink -d 0000:0f:00.0 --show_device

Operational Info
----------------
State                           : Polling
Physical state                  : ETH_AN_FSM_ENABLE
Speed                           : N/A
Width                           : N/A
FEC                             : N/A
Loopback Mode                   : No Loopback
Auto Negotiation                : FORCE - 25G,10G,1G

Supported Info
--------------
Enabled Link Speed (Ext.)       : 0x00000052 (25G,10G,1G)
Supported Cable Speed (Ext.)    : 0x00000003 (1G,100M)

Troubleshooting Info
--------------------
Status Opcode                   : 36
Group Opcode                    : PHY FW
Recommendation                  : Force Mode no partner detected.

Tool Information
----------------
Firmware Version                : 26.41.1000
amBER Version                   : 2.05
MSTFLINT Version                : mstflint 4.21.0

Device Info
-----------
Part Number                     : N/A
Part Name                       : N/A
Serial Number                   : N/A
Revision                        : N/A
FW Version                      : 26.41.1000

Note: P/N, Product Name, S/N and Revision are supported only in switches
Using Virtual Functions: the starting point is the default state
root@pve-moraal-x570:~# lspci | grep Mellanox
0f:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0f:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
After 4 virtual functions have been enabled (the enabling command itself is not shown here), the listing is
root@pve-moraal-x570:~# lspci | grep Mellanox
0f:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0f:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
0f:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0f:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0f:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0f:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
The following devices are created
root@pve-moraal-x570:~# ip link show dev enp15s0f0np0
50: enp15s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether e8:eb:d3:0b:78:74 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
root@pve-moraal-x570:~# ip link show dev enp15s0f0v0
56: enp15s0f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3a:48:d1:d6:57:dd brd ff:ff:ff:ff:ff:ff
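The article does not show the command that created the four VFs. On Linux this is commonly done by writing the desired count to the sriov_numvfs file seen in the sysfs listing earlier. The following is a minimal sketch under that assumption; the helper name set_numvfs is hypothetical, and the device directory is passed as an argument so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# set_numvfs DEVDIR COUNT
# Request COUNT virtual functions on the device whose sysfs directory is
# DEVDIR, e.g. /sys/class/net/enp15s0f0np0/device.
# (hypothetical helper, for illustration only; needs root on real hardware)
set_numvfs() {
    devdir=$1
    want=$2
    if [ ! -r "$devdir/sriov_totalvfs" ]; then
        echo "no SR-IOV support visible in $devdir" >&2
        return 1
    fi
    total=$(cat "$devdir/sriov_totalvfs")
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device allows at most $total" >&2
        return 1
    fi
    # The kernel refuses to change one non-zero VF count directly into
    # another, so the count is reset to 0 before the new value is written.
    echo 0 > "$devdir/sriov_numvfs"
    echo "$want" > "$devdir/sriov_numvfs"
}
```

Usage would be along the lines of: set_numvfs /sys/class/net/enp15s0f0np0/device 4. With sriov_totalvfs reporting 8, as shown earlier, a request for more than 8 is rejected before anything is written.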
A VF device can be used on the host practically like an ordinary network device. The other option is to hand it over to a PVE virtual machine using PCIe passthrough. Inside the virtual machine the card looks like this
root@pve-sdn-01:~# lspci | grep Mell
01:00.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
root@pve-sdn-01:~# ethtool -i enp1s0
driver: mlx5_core
version: 6.8.8-1-pve
firmware-version: 26.41.1000 (MT_0000000531)
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
where
- the virtual machine can use the same mlx5 driver that is used on the pve host
Using dpdk
The dpdk and dpdk-dev packages are installed
# apt-get install dpdk dpdk-dev
As a result the system has, among others, the following utilities
- dpdk-devbind.py
- dpdk-hugepages.py
- dpdk-testpmd
Checking the current state, see https://doc.dpdk.org/guides/nics/mlx5.html -> 'Usage example'
root@pve-moraal-x570:~# ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
enp15s0f0np0
enp15s0f1np1
Starting testpmd; it is not entirely clear whether this counts as a successful start, although both ports were probed and configured and the interactive prompt appeared
root@pve-moraal-x570:~# dpdk-testpmd -l 8-15 -n 4 -a 0f:00.0 -a 0f:00.1 -- --rxq=2 --txq=2 -i
EAL: Detected CPU lcores: 24
EAL: Detected NUMA nodes: 1
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probe PCI driver: mlx5_pci (15b3:101f) device: 0000:0f:00.0 (socket -1)
EAL: Probe PCI driver: mlx5_pci (15b3:101f) device: 0000:0f:00.1 (socket -1)
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=203456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: E8:EB:D3:0B:78:74
Configuring Port 1 (socket 0)
Port 1: E8:EB:D3:0B:78:75
Checking link statuses...
Done
testpmd>
For example
root@pve-moraal-x570:~# dpdk-testpmd -l 6-9 -n 4 -a 0f:00.0 -- --rxq=4 --txq=4 -i
testpmd> set fwd txonly
testpmd> start
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 0          RX-missed: 0          RX-bytes:  0
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 5153664    TX-errors: 0          TX-bytes:  329834496

  Throughput (since last show)
  Rx-pps:            0          Rx-bps:            0
  Tx-pps:      1420582          Tx-bps:    727338208
  ############################################################################
and capturing traffic on the second port shows
root@pve-moraal-x570:~# tcpdump -c 4 -nei enp15s0f1np1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp15s0f1np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:07:53.700764 e8:eb:d3:0b:78:74 > 02:00:00:00:00:00, ethertype IPv4 (0x0800), length 64: 198.18.0.1.9 > 198.18.0.2.9: UDP, length 22
09:07:53.700764 e8:eb:d3:0b:78:74 > 02:00:00:00:00:00, ethertype IPv4 (0x0800), length 64: 198.18.0.1.9 > 198.18.0.2.9: UDP, length 22
09:07:53.700764 e8:eb:d3:0b:78:74 > 02:00:00:00:00:00, ethertype IPv4 (0x0800), length 64: 198.18.0.1.9 > 198.18.0.2.9: UDP, length 22
09:07:53.700765 e8:eb:d3:0b:78:74 > 02:00:00:00:00:00, ethertype IPv4 (0x0800), length 64: 198.18.0.1.9 > 198.18.0.2.9: UDP, length 22
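The testpmd counters and the tcpdump capture can be cross-checked with a little arithmetic: dividing the bit rate by the packet rate gives the average packet size. The following is a minimal sketch using the Tx-bps and Tx-pps figures from the statistics above; the helper name bytes_per_packet is hypothetical.

```shell
#!/bin/sh
# bytes_per_packet BPS PPS
# Average packet size in whole bytes, derived from a bit rate and a packet rate.
# (hypothetical helper, for illustration only; integer arithmetic)
bytes_per_packet() {
    echo $(( $1 / $2 / 8 ))
}

# Tx-bps and Tx-pps as reported by 'show port stats all' above.
bytes_per_packet 727338208 1420582    # prints 64
```

The result agrees with the 'length 64' that tcpdump reports for each frame, and with TX-bytes divided by TX-packets (329834496 / 5153664 = 64).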
Useful additional materials
- https://www.youtube.com/watch?v=KX1QOqMtchg
- https://www.youtube.com/watch?v=0yDdMWQPCOI
- https://www.youtube.com/watch?v=Un5-AN4nb9s
Misc
# lspci | grep 3d:00
3d:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
3d:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
devlink show data
# devlink dev show
pci/0000:3d:00.0
pci/0000:3d:00.1
and devlink info
# devlink dev info
pci/0000:3d:00.0:
  driver mlx5_core
  versions:
      fixed:
        fw.psid SM_1281000001000
      running:
        fw.version 26.35.2000
        fw 26.35.2000
      stored:
        fw.version 26.35.2000
        fw 26.35.2000
pci/0000:3d:00.1:
  driver mlx5_core
  versions:
      fixed:
        fw.psid SM_1281000001000
      running:
        fw.version 26.35.2000
        fw 26.35.2000
      stored:
        fw.version 26.35.2000
        fw 26.35.2000
ethtool data
# ethtool -i ens7f0np0
driver: mlx5_core
version: 5.15.0-92-generic
firmware-version: 26.35.2000 (SM_1281000001000)
expansion-rom-version:
bus-info: 0000:3d:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
where
- the firmware version on the card matches in the ethtool and devlink dev info output: 26.35.2000
Using the vendor's MLNX EN software
Terms
- MFT - NVIDIA Firmware Tools, probably originally Mellanox Firmware Tools
Using the whole physical device in passthrough mode
Statements
- in general, in a proxmox v. 8 environment a mellanox device can be handed over to a virtual machine in the usual way (by choosing 'Add -> PCI device' in the pve webgui and pointing to the first MLNX device; the second one is added automatically)
- it seems that a physical mellanox device cannot be handed over to a virtual machine completely as a whole; for one thing, the sr-iov capability is missing
- inside the virtual machine the network device can be used as a PF; VFs are not accessible
- a vIOMMU can be added to the virtual machine in the ordinary pve webgui, and then the virtual machine's devices, including the different physical ports of the MLNX adapter, are placed in different IOMMU groups
It is generally characteristic of this setup that on the host the vfio driver keeps handling the device
root@pve-moraal-x570:~# lspci -vvv | grep vfio
        Kernel driver in use: vfio-pci
        Kernel driver in use: vfio-pci
To get the device back on the host under the mlx driver without rebooting
- stop the virtual machine
- run on the host
echo 1 > /sys/bus/pci/devices/0000\:0f\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:0f\:00.1/remove
echo 1 > /sys/bus/pci/rescan
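Whether the rescan really returned the device to the mlx driver can be read from sysfs, where each PCI device directory has a 'driver' symlink pointing at the driver currently bound to it. The following is a minimal sketch; the helper name pci_driver is hypothetical, and the optional second argument (the sysfs root) exists only so the function can be tried on a scratch directory.

```shell
#!/bin/sh
# pci_driver ADDRESS [ROOT]
# Print the name of the kernel driver bound to the PCI device ADDRESS
# (e.g. 0000:0f:00.0), or "none" if no driver is bound.
# ROOT defaults to /sys/bus/pci/devices.
# (hypothetical helper, for illustration only)
pci_driver() {
    root=${2:-/sys/bus/pci/devices}
    if [ -L "$root/$1/driver" ]; then
        basename "$(readlink "$root/$1/driver")"
    else
        echo "none"
    fi
}
```

While the virtual machine holds the card, pci_driver 0000:0f:00.0 would be expected to print vfio-pci; after the remove and rescan steps it should print mlx5_core if the host driver has taken the device back.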
Software installation
TODO
root@debian-mlnx-01:~# mount /root/mlnx-en-24.04-0.6.6.0-debian12.1-x86_64.iso /mnt/mlnx
root@debian-mlnx-01:~# find /lib/modules/6.1.0-22-amd64/ -type f -mmin -20 -ls | grep dkms
   661960   4312 -rw-r--r--   1 root     root      4415205 Jun 29 22:07 /lib/modules/6.1.0-22-amd64/updates/dkms/mlx5_core.ko
   661959     28 -rw-r--r--   1 root     root        25237 Jun 29 22:07 /lib/modules/6.1.0-22-amd64/updates/dkms/mlx_compat.ko
   661961      8 -rw-r--r--   1 root     root         5565 Jun 29 22:07 /lib/modules/6.1.0-22-amd64/updates/dkms/mlx5_ib.ko
   661962     52 -rw-r--r--   1 root     root        49445 Jun 29 22:07 /lib/modules/6.1.0-22-amd64/updates/dkms/mlxfw.ko
   661963    208 -rw-r--r--   1 root     root       210797 Jun 29 22:07 /lib/modules/6.1.0-22-amd64/updates/dkms/mlxdevm.ko
Firmware update
root@debian-mlnx-01:~# apt-get install mlnx-fw-updater
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  linux-image-6.1.0-15-amd64
Use 'apt autoremove' to remove it.
The following NEW packages will be installed:
  mlnx-fw-updater
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/50.2 MB of archives.
After this operation, 87.9 MB of additional disk space will be used.
Get:1 file:/mnt/mlnx/DEBS_ETH ./ mlnx-fw-updater 24.04-0.6.6.0 [50.2 MB]
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 63847 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_24.04-0.6.6.0_amd64.deb ...
Unpacking mlnx-fw-updater (24.04-0.6.6.0) ...
Setting up mlnx-fw-updater (24.04-0.6.6.0) ...
Initializing...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6LX
  Part Number:      MCX631102AN-ADA_Ax
  Description:      ConnectX-6 Lx EN adapter card; 25GbE ; Dual-port SFP28; PCIe 4.0 x8; No Crypto
  PSID:             MT_0000000531
  PCI Device Name:  01:00.0
  Base GUID:        e8ebd303000b7874
  Base MAC:         e8ebd30b7874
  Versions:         Current        Available
     FW             26.32.2004     26.41.1000
     PXE            3.6.0502       3.7.0400
     UEFI           14.25.0018     14.34.0012

  Status:           Update required

---------
Found 1 device(s) requiring firmware update...

Device #1: Updating FW ...
FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
Done

Restart needed for updates to take effect.
Log File: /tmp/oaFVUkaJsl
Real log file: /tmp/mlnx_fw_update.log
root@debian-mlnx-01:~# less /tmp/mlnx_fw_update.log
CMD: mlxup -u --log-on-update --ssl-certificate /tmp/OloIGrYWuz/mlxfwmanager_sriov_dis_x86_64_4127-dir/ca-bundle.crt --current-dir /opt/mellanox/mlnx-fw-updater/ -L /tmp/oaFVUkaJsl -y -d 01:00.0
Querying Mellanox devices firmware ...

Device #1:
----------
...
It appears that as a result the card holds two versions of the firmware until the restart requested by the updater
root@debian-mlnx-01:~# devlink dev info
pci/0000:01:00.0:
  driver mlx5_core
  versions:
      fixed:
        fw.psid MT_0000000531
      running:
        fw.version 26.32.2004
        fw 26.32.2004
      stored:
        fw.version 26.41.1000
        fw 26.41.1000
pci/0000:01:00.1:
  driver mlx5_core
  versions:
      fixed:
        fw.psid MT_0000000531
      running:
        fw.version 26.32.2004
        fw 26.32.2004
      stored:
        fw.version 26.41.1000
        fw 26.41.1000
where
- running version - 26.32.2004
- stored version - 26.41.1000
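The comparison of running and stored versions can be automated. The following is a minimal sketch that reads 'devlink dev info' output on standard input and reports every device whose stored firmware differs from the running one. It assumes the usual layout of that output with one field per line; the helper name fw_pending is hypothetical.

```shell
#!/bin/sh
# fw_pending
# Read 'devlink dev info' output on stdin and print one line for each device
# whose stored fw.version differs from its running fw.version.
# (hypothetical helper, for illustration only)
fw_pending() {
    awk '
        /^pci\//           { dev = $1; sub(/:$/, "", dev); section = "" }
        $1 == "fixed:"     { section = "fixed" }
        $1 == "running:"   { section = "running" }
        $1 == "stored:"    { section = "stored" }
        $1 == "fw.version" && section == "running" { running[dev] = $2 }
        $1 == "fw.version" && section == "stored" && running[dev] != $2 {
            print dev, "running", running[dev], "stored", $2
        }
    '
}
```

Usage would be: devlink dev info | fw_pending. No output means the running firmware is already the stored one; a line per device means the new image is waiting for the restart mentioned in the updater output.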
systemd unit mlnx-en.d
It seems that a systemd unit takes care of the mlx drivers
root@debian-mlnx-01:~# dpkg -S /etc/mlnx-en.conf
mlnx-en-utils: /etc/mlnx-en.conf
root@debian-mlnx-01:~# cat /etc/mlnx-en.conf
# Allow calling the service script with the option 'stop' for unloading the driver stack.
# This flag should be disabled when the OS root file system is on remote storage.
ALLOW_STOP=yes

# Run sysctl performance tuning script
RUN_SYSCTL=no

# Run /usr/sbin/mlnx_tune
RUN_MLNX_TUNE=no

# Load MLX4 modules
MLX4_LOAD=no

# Load MLX5 modules
MLX5_LOAD=yes
root@debian-mlnx-01:~# systemctl start mlnx-en.d
root@debian-mlnx-01:~# systemctl status mlnx-en.d
● mlnx-en.d.service - mlnx-en.d - configure Mellanox devices
     Loaded: loaded (/lib/systemd/system/mlnx-en.d.service; enabled; preset: enabled)
     Active: active (exited) since Sun 2024-06-30 00:49:29 EEST; 6s ago
       Docs: file:/etc/mlnx-en.conf
    Process: 1505 ExecStart=/etc/init.d/mlnx-en.d start (code=exited, status=0/SUCCESS)
   Main PID: 1505 (code=exited, status=0/SUCCESS)
        CPU: 385ms

Jun 30 00:49:26 debian-mlnx-01 systemd[1]: Starting mlnx-en.d.service - mlnx-en.d - configure Mellanox devices...
Jun 30 00:49:28 debian-mlnx-01 mlnx-en.d[1505]: [32B blob data]
Jun 30 00:49:29 debian-mlnx-01 systemd[1]: Finished mlnx-en.d.service - mlnx-en.d - configure Mellanox devices.
at the same time the dmesg output shows
# dmesg -T -w
[Sun Jun 30 00:49:26 2024] Compat-mlnx-ofed backport release: 7037b8d
[Sun Jun 30 00:49:26 2024] Backport based on https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git 7037b8d
[Sun Jun 30 00:49:26 2024] compat.git: https://:@git-nbu.nvidia.com/r/a/mlnx_ofed/mlnx-ofa_kernel-4.0.git
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: firmware version: 26.32.2004
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: 126.024 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x8 link)
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: Port module event: module 0, Cable plugged
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: mlx5_pcie_event:304:(pid 1398): PCIe slot advertised sufficient power (75W).
[Sun Jun 30 00:49:26 2024] mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.0 enp1s0f0np0: renamed from eth0
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: firmware version: 26.32.2004
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: 126.024 Gb/s available PCIe bandwidth (16.0 GT/s PCIe x8 link)
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 24414Mbps
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: E-Switch: Total vports 2, per vport: max uc(128) max mc(2048)
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: Port module event: module 1, Cable plugged
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: mlx5_pcie_event:304:(pid 1391): PCIe slot advertised sufficient power (75W).
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[Sun Jun 30 00:49:27 2024] mlx5_core 0000:01:00.1 enp1s0f1np1: renamed from eth0
Note that 'systemctl stop mlnx-en.d' unloads the mlx modules from memory
root@debian-mlnx-01:~# lsmod | grep mlx
mlx5_core            2269184  0
mlxfw                  36864  1 mlx5_core
mlxdevm               180224  1 mlx5_core
mlx_compat             20480  2 mlxdevm,mlx5_core
psample                20480  1 mlx5_core
tls                   135168  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core
root@debian-mlnx-01:~# systemctl stop mlnx-en.d
root@debian-mlnx-01:~# lsmod | grep mlx
root@debian-mlnx-01:~#
Using devlink
root@debian-mlnx-01:~# devlink dev param show pci/0000:01:00.0 name enable_roce
pci/0000:01:00.0:
  name enable_roce type generic
    values:
      cmode driverinit value true
Monitoring
The physical heatsink on the card feels quite hot to the touch (a finger cannot be kept on it), and it seems that in substance this is ok: 'The adapter card incorporates the ConnectX IC, which operates in the range of temperatures between 0°C and 105°C.', https://docs.nvidia.com/networking/display/connectx6lxen/monitoring
root@debian-mlnx-01:~# devlink dev
pci/0000:01:00.0
pci/0000:01:00.1
root@debian-mlnx-01:~# mget_temp -d 0000:01:00.0
82
root@debian-mlnx-01:~# mget_temp -d 0000:01:00.1
83
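The readings above can be compared against the 105°C upper bound quoted from the NVIDIA documentation. The following is a minimal sketch; the helper name temp_status is hypothetical, and treating 105 as an alarm threshold is an assumption made here for illustration, not a vendor recommendation.

```shell
#!/bin/sh
# temp_status TEMP [LIMIT]
# Compare a temperature reading in degrees C with LIMIT (default 105) and
# print either "over" or the remaining headroom.
# (hypothetical helper, for illustration only)
temp_status() {
    temp=$1
    limit=${2:-105}
    if [ "$temp" -ge "$limit" ]; then
        echo "over"
    else
        echo "ok, $(( limit - temp )) C below limit"
    fi
}

temp_status 82    # prints: ok, 23 C below limit
temp_status 83    # prints: ok, 22 C below limit
```

On a live system the reading could be fed in directly, for example: temp_status "$(mget_temp -d 0000:01:00.0)".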
Useful additional materials
Misc
- https://www.youtube.com/watch?v=XLPgDEbUMgk - 'How to set Mellanox ConnectX VPI to Ethernet or Infiniband in Linux'