Self-built computer - ASUS WRX90E + AMD Threadripper PRO 7965WX

Source: Imre kasutab arvutit

Introduction

A large part of the motivation for building this computer is the 'because it would be interesting' component:

  • what capabilities are typical of modern hardware
  • how that hardware can be used under the Linux operating system in 2025
  • how that hardware can be used under the Proxmox virtualization platform in 2025

Hardware

Components of the computer

  • motherboard - ASUS WRX90E SAGE
  • processor - AMD Threadripper PRO 7965WX
  • memory - 2 x 96 GB
  • power supply - Seasonic 850 W
  • video card -
  • NVMe device -

Compared with an ordinary PC platform

  • not 2 memory channels, but 8
  • not 24 PCIe lanes, but 128
  • not 2-3 M.2 NVMe devices, but 4
  • not 3-4 PCIe slots, but 7
  • not 1 CPU power input, but 2 (with the option of using two physical power supplies)
  • proper IOMMU isolation (e.g. for PCIe passthrough)
  • proper PCIe bifurcation (e.g. x16 -> 4 x x4, for using NVMe storage devices on an adapter card)
  • a remote management interface
  • a physical COM (serial) port
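How well the IOMMU separates devices can be checked on a running Linux system, because the kernel exposes the groups under /sys/kernel/iommu_groups. The following is a minimal sketch, not something taken from this machine's setup: the function name list_iommu_groups and its optional argument are made up for illustration. A device is a good passthrough candidate when it shares its group with nothing but its own functions.

```shell
#!/bin/sh
# List every IOMMU group together with the PCI devices it contains.
# The sysfs root is an optional argument (hypothetical convenience) so the
# function can also be pointed at a copy of the tree.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU off or unsupported
        group="${dev%/devices/*}"      # drop the trailing "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -k2,2n                 # order by group number
}

# Usage on the host: list_iommu_groups
```

Each output line has the form 'group N: PCI address', so the result can be compared directly with the 'IOMMU group' lines printed by lspci -vvv.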

Compared with an ordinary server platform

  • not 2 physical CPU sockets, but 1
  • in principle there is no need to feel very sorry for users of single-socket computers: given the same compute resources (core count, speed, memory, cache, etc.) in one processor versus spread over two, the single processor has the advantage of lower multi-processor overhead (coordinating activity between the sockets, etc.)

The overall PCI topology, so to speak, from 'lstopo --of txt'

20250215-wrx90e-lstopo-01.jpeg

where

  • nothing especially interesting is visible, because this is a single-socket computer, i.e. there is no situation where some PCI devices are attached to one CPU and some to the other

Processor

root@pve-wrx90e:~# lscpu 
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          52 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   48
  On-line CPU(s) list:    0-47
Vendor ID:                AuthenticAMD
  BIOS Vendor ID:         Advanced Micro Devices, Inc.
  Model name:             AMD Ryzen Threadripper PRO 7965WX 24-Cores
    BIOS Model name:      AMD Ryzen Threadripper PRO 7965WX 24-Cores      Unknown CPU @ 4.2GHz
    BIOS CPU family:      107
    CPU family:           25
    Model:                24
    Thread(s) per core:   2
    Core(s) per socket:   24
    Socket(s):            1
    Stepping:             1
    CPU(s) scaling MHz:   18%
    CPU max MHz:          5362.0000
    CPU min MHz:          545.0000
    BogoMIPS:             8387.22
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid ex
                          td_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetc
                          h osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms i
                          nvpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk a
                          vx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi a
                          vx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
Virtualization features:  
  Virtualization:         AMD-V
Caches (sum of all):      
  L1d:                    768 KiB (24 instances)
  L1i:                    768 KiB (24 instances)
  L2:                     24 MiB (24 instances)
  L3:                     128 MiB (4 instances)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-47
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Mitigation; Safe RET
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

where

  • memory addressing has 'Address sizes: 52 bits physical, 57 bits virtual' at its disposal
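To put those address sizes into perspective, the bit counts can be converted into address-space sizes. This is plain arithmetic, sketched with a made-up helper name (addr_space_pib); it says nothing about how much memory the board or the kernel can actually use. The 57-bit virtual size goes together with the la57 flag in the lscpu output above.

```shell
#!/bin/sh
# Size of an address space with the given number of address bits, in PiB.
# 1 PiB is 2^50 bytes, so the result is 2^(bits - 50); valid for bits >= 50.
addr_space_pib() {
    echo $(( 1 << ($1 - 50) ))
}

addr_space_pib 52   # physical: prints 4   (PiB)
addr_space_pib 57   # virtual:  prints 128 (PiB)
```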

Network card Intel X710

root@pve-wrx90e:~# lspci | grep -i ether
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)

root@pve-wrx90e:~# driverctl list-devices network
0000:01:00.0 i40e
0000:01:00.1 i40e

root@pve-wrx90e:~# devlink dev info
pci/0000:01:00.0:
  driver i40e
  serial_number 35-56-82-ff-ff-84-cf-60
  versions:
      fixed:
        board.id 000000-000
      running:
        fw.mgmt 9.140
        fw.mgmt.build 76856
        fw.mgmt.api 1.15
        fw.psid.api 9.40
        fw.bundle_id 0x8000efef
        fw.undi 1.3534.0
pci/0000:01:00.1:
  driver i40e
  serial_number 35-56-82-ff-ff-84-cf-60
  versions:
      fixed:
        board.id 000000-000
      running:
        fw.mgmt 9.140
        fw.mgmt.build 76856
        fw.mgmt.api 1.15
        fw.psid.api 9.40
        fw.bundle_id 0x8000efef
        fw.undi 1.3534.0

and a more detailed view using lspci

root@pve-wrx90e:~# lspci -vvv | less -N
..
    319 01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
    320         DeviceName: X710 DUAL 10G LAN1
    321         Subsystem: Intel Corporation Ethernet Network Adapter X710-TL
    322         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    323         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    324         Latency: 0, Cache Line Size: 64 bytes
    325         Interrupt: pin A routed to IRQ 65
    326         IOMMU group: 28
    327         Region 0: Memory at 100a1000000 (64-bit, prefetchable) [size=16M]
    328         Region 3: Memory at 100a3800000 (64-bit, prefetchable) [size=32K]
    329         Expansion ROM at f5780000 [disabled] [size=512K]
    330         Capabilities: [40] Power Management version 3
    331                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
    332                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    333         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    334                 Address: 0000000000000000  Data: 0000
    335                 Masking: 00000000  Pending: 00000000
    336         Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
    337                 Vector table: BAR=3 offset=00000000
    338                 PBA: BAR=3 offset=00001000
    339         Capabilities: [a0] Express (v2) Endpoint, MSI 00
    340                 DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
    341                         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
    342                 DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
    343                         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
    344                         MaxPayload 512 bytes, MaxReadReq 512 bytes
    345                 DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
    346                 LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
    347                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
    348                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
    349                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
    350                 LnkSta: Speed 8GT/s, Width x4
    351                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    352                 DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
    353                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
    354                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
    355                          FRS- TPHComp- ExtTPHComp-
    356                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
    357                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
    358                          AtomicOpsCtl: ReqEn-
    359                 LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
    360                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
    361                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
    362                          Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
    363                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
    364                          EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
    365                          Retimer- 2Retimers- CrosslinkRes: unsupported
    366         Capabilities: [e0] Vital Product Data
    367                 Product Name: Example VPD
    368                 Read-only fields:
    369                         [V0] Vendor specific: 
    370                         [RV] Reserved: checksum good, 0 byte(s) reserved
    371                 End
    372         Capabilities: [100 v2] Advanced Error Reporting
    373                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    374                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
    375                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
    376                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
    377                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
    378                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
    379                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
    380                 HeaderLog: 00000000 00000000 00000000 00000000
    381         Capabilities: [140 v1] Device Serial Number 35-56-82-ff-ff-84-cf-60
    382         Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
    383                 ARICap: MFVC- ACS-, Next Function: 1
    384                 ARICtl: MFVC- ACS-, Function Group: 0
    385         Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
    386                 IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
    387                 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
    388                 IOVSta: Migration-
    389                 Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
    390                 VF offset: 16, stride: 1, Device ID: 154c
    391                 Supported Page Size: 00000553, System Page Size: 00000001
    392                 Region 0: Memory at 00000100a3000000 (64-bit, prefetchable)
    393                 Region 3: Memory at 00000100a3810000 (64-bit, prefetchable)
    394                 VF Migration: offset: 00000000, BIR: 0
    395         Capabilities: [1a0 v1] Transaction Processing Hints
    396                 Device specific mode supported
    397                 No steering table available
    398         Capabilities: [1b0 v1] Access Control Services
    399                 ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    400                 ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    401         Capabilities: [1d0 v1] Secondary PCI Express
    402                 LnkCtl3: LnkEquIntrruptEn- PerformEqu-
    403                 LaneErrStat: 0
    404         Kernel driver in use: i40e
    405         Kernel modules: i40e

Remote management

The remote management web GUI looks like this

20250705-asus-wrx90e-kvm-01.png

where

  • the inventory section is open

Updating the firmware of the IPMI, i.e. the remote management component, looks like this

20250705-asus-wrx90e-kvm-02.png

where

  • TODO

In a sense, the computer's BIOS/UEFI setup can also be reached (while the computer is running)

20250705-asus-wrx90e-kvm-03.png

where

  • TODO

Using PCIe

Among others, the following PCIe devices are available

root@pve-wrx90e:~# lspci | egrep "e1:00|e0:03"
e0:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo Dummy Host Bridge (rev 01)
e0:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo GPP Bridge (rev 01)

e1:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
e1:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)

and the following appears in the log

root@pve-wrx90e:~# dmesg -T

[Sun Jul  6 15:58:11 2025] vfio-pci 0000:e1:00.0: AER:   Error of this Agent is reported first
[Sun Jul  6 15:58:11 2025] vfio-pci 0000:e1:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul  6 15:58:11 2025] vfio-pci 0000:e1:00.1:   device [10de:10f1] error status/mask=00001000/00000000
[Sun Jul  6 15:58:11 2025] vfio-pci 0000:e1:00.1:    [12] Timeout               

[Sun Jul  6 15:58:12 2025] pcieport 0000:e0:03.1: AER: Correctable error message received from 0000:e1:00.0
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0:   device [10de:1c30] error status/mask=00001000/00000000
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0:    [12] Timeout               

[Sun Jul  6 15:58:12 2025] pcieport 0000:e0:03.1: AER: Correctable error message received from 0000:e1:00.0
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0:   device [10de:1c30] error status/mask=00001000/00000000
[Sun Jul  6 15:58:12 2025] vfio-pci 0000:e1:00.0:    [12] Timeout     

This does not seem to do anything directly harmful, but there is still a suspicion that, indirectly, it is not good for system stability. Using 'pcie_aspm=off' cures it (ASPM is Active State Power Management, and it seems that switching the devices' link power states back and forth is not good in this case)

root@pve-wrx90e:~# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.8.12-11-pve root=ZFS=/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt video=efifb:off,simplefb:off pcie_aspm=off
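Whether a parameter really reached the running kernel can be checked from a script rather than by eye. The sketch below uses a hypothetical helper, cmdline_has, that matches whole parameters only, so that 'iommu=on' is not mistaken for 'amd_iommu=on'. How the parameter is made persistent depends on the boot loader and is not shown in the source: on Proxmox it is usually either /etc/default/grub followed by update-grub, or /etc/kernel/cmdline followed by proxmox-boot-tool refresh.

```shell
#!/bin/sh
# Succeed if the kernel command line contains exactly the given parameter.
#   $1 = parameter to look for, e.g. pcie_aspm=off
#   $2 = command line text (optional; defaults to the running kernel's)
cmdline_has() {
    line="${2:-$(cat /proc/cmdline)}"
    case " $line " in
        *" $1 "*) return 0 ;;   # found as a whole, space-delimited word
        *)        return 1 ;;
    esac
}

# Usage on the host:
#   cmdline_has pcie_aspm=off && echo "ASPM is disabled by kernel parameter"
```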

Perhaps 'pcie_aspm=off' is overkill, and it would be more appropriate to use the setpci utility to switch ASPM off on the specific devices concerned.
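A per-device approach could look roughly like the sketch below, which has not been tried on this machine. It relies on the ASPM Control field being bits 1:0 of the PCIe Link Control register, at offset 0x10 inside the PCI Express capability, and on setpci accepting a value:mask pair. The helper name aspm_off_cmd is made up, and the script only prints the commands so they can be reviewed first. It is an assumption that the bridge e0:03.1 needs the same treatment as the two GPU functions; a setting made this way does not survive a reboot, and the kernel may change it again.

```shell
#!/bin/sh
# Print (do not run) the setpci command that clears the ASPM Control bits
# of one device. setpci reads all numbers as hexadecimal:
#   CAP_EXP+10.w = 16-bit Link Control register, 0:3 = write value 0 under mask 3.
aspm_off_cmd() {
    printf 'setpci -s %s CAP_EXP+10.w=0:3\n' "$1"
}

# The bridge and the two GPU functions named in the AER messages above.
for dev in e0:03.1 e1:00.0 e1:00.1; do
    aspm_off_cmd "$dev"
done
```

After running the printed commands as root, the 'LnkCtl:' line in lspci -vvv for each device should read 'ASPM Disabled'.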

It seems that another phenomenon seen on this computer may also be related to ASPM: the four physical SATA SSDs attached to the computer sometimes disappear, and then become visible again a little later

# dmesg -T
..
2025-07-04T23:34:03.301753+03:00 pve-wrx90e kernel: [466693.887599] ata4: found unknown device (class 0)
2025-07-04T23:34:03.301765+03:00 pve-wrx90e kernel: [466693.887616] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:07.348752+03:00 pve-wrx90e kernel: [466697.934519] ata1: found unknown device (class 0)
2025-07-04T23:34:07.348764+03:00 pve-wrx90e kernel: [466697.934535] ata1: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:08.363760+03:00 pve-wrx90e kernel: [466698.949513] ata4: found unknown device (class 0)
2025-07-04T23:34:08.363771+03:00 pve-wrx90e kernel: [466698.949528] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:08.363772+03:00 pve-wrx90e kernel: [466698.949537] ata4: limiting SATA link speed to <unknown>
2025-07-04T23:34:09.976757+03:00 pve-wrx90e kernel: [466700.562481] ata4: found unknown device (class 0)
2025-07-04T23:34:09.976769+03:00 pve-wrx90e kernel: [466700.562498] ata4: SATA link down (SStatus 0 SControl 3F0)
2025-07-04T23:34:09.976769+03:00 pve-wrx90e kernel: [466700.562505] ata4.00: disable device
2025-07-04T23:34:10.902748+03:00 pve-wrx90e kernel: [466701.488466] ata4: found unknown device (class 0)
2025-07-04T23:34:10.902758+03:00 pve-wrx90e kernel: [466701.488481] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:10.902760+03:00 pve-wrx90e kernel: [466701.488501] ata4.00: detaching (SCSI 3:0:0:0)
2025-07-04T23:34:10.924739+03:00 pve-wrx90e kernel: [466701.510468] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
2025-07-04T23:34:10.924744+03:00 pve-wrx90e kernel: [466701.510491] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2025-07-04T23:34:11.105750+03:00 pve-wrx90e kernel: [466701.691733] ata4: limiting SATA link speed to 1.5 Gbps
2025-07-04T23:34:11.961755+03:00 pve-wrx90e kernel: [466702.547447] ata1: found unknown device (class 0)
2025-07-04T23:34:11.961766+03:00 pve-wrx90e kernel: [466702.547468] ata1: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:11.961767+03:00 pve-wrx90e kernel: [466702.547475] ata1: limiting SATA link speed to <unknown>
2025-07-04T23:34:15.536741+03:00 pve-wrx90e kernel: [466706.123180] md/raid:md127: Disk failure on sdd, disabling device.
2025-07-04T23:34:15.536751+03:00 pve-wrx90e kernel: [466706.123189] md/raid:md127: Operation continuing on 3 devices.
2025-07-04T23:34:17.275741+03:00 pve-wrx90e kernel: [466707.861373] ata4: link is slow to respond, please be patient (ready=0)
2025-07-04T23:34:18.475742+03:00 pve-wrx90e kernel: [466709.061336] ata4: found unknown device (class 0)
2025-07-04T23:34:18.475753+03:00 pve-wrx90e kernel: [466709.061353] ata4: SATA link down (SStatus 0 SControl 310)
2025-07-04T23:34:18.662736+03:00 pve-wrx90e kernel: [466709.248903] ata4: limiting SATA link speed to 1.5 Gbps
2025-07-04T23:34:19.516745+03:00 pve-wrx90e kernel: [466710.102306] ata1: found unknown device (class 0)
2025-07-04T23:34:19.516755+03:00 pve-wrx90e kernel: [466710.102321] ata1: SATA link down (SStatus 0 SControl 3F0)
2025-07-04T23:34:19.516756+03:00 pve-wrx90e kernel: [466710.102327] ata1.00: disable device
2025-07-04T23:34:19.520143+03:00 pve-wrx90e kernel: [466710.105316] sd 0:0:0:0: rejecting I/O to offline device
2025-07-04T23:34:19.520148+03:00 pve-wrx90e kernel: [466710.105327] I/O error, dev sda, sector 16 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
2025-07-04T23:34:19.520149+03:00 pve-wrx90e kernel: [466710.105333] md: super_written gets error=-5
2025-07-04T23:34:19.520149+03:00 pve-wrx90e kernel: [466710.105336] md/raid:md127: Disk failure on sda, disabling device.
2025-07-04T23:34:19.520151+03:00 pve-wrx90e kernel: [466710.105340] md/raid:md127: Cannot continue operation (2/4 failed).
...
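To see which ports are affected and how often, the link-down events in a log like the one above can be tallied per ATA port. This is a small sketch with a made-up function name, count_sata_link_down. It only counts lines and does not show whether ASPM is the cause.

```shell
#!/bin/sh
# Count "SATA link down" events per ATA port in kernel log text read from stdin.
count_sata_link_down() {
    awk '/SATA link down/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9]+:$/) {   # the port field, e.g. "ata4:"
                     port = $i
                     sub(/:$/, "", port)      # drop the trailing colon
                     n[port]++
                 }
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Usage on the host:
#   journalctl -k | count_sata_link_down
```

Applied to the excerpt shown above, this reports 'ata1 3' and 'ata4 5'.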

Useful additional materials