Builder's computer - ASUS WRX90E + AMD Threadripper PRO 7965WX
Introduction
A large part of the motivation for building this computer is the 'because it would be interesting' factor:
- what capabilities are characteristic of modern hardware
- how the hardware can be used with the Linux operating system in 2025
- how the hardware can be used with the Proxmox virtualization platform in 2025
Hardware
Components of the computer
- motherboard - ASUS WRX90E SAGE
- processor - AMD Threadripper PRO 7965WX
- memory - 2 x 96 GB
- power supply - Seasonic 850 W
- graphics card -
- NVMe device -
Compared with an ordinary PC platform
- not 2 but 8 memory channels
- not 24 but 128 PCIe lanes
- not 2-3 M.2 NVMe devices but 4
- not 3-4 PCIe slots but 7
- not 1 CPU power input but 2 (making it possible to use two physical power supplies)
- proper IOMMU isolation (e.g. for PCIe passthrough)
- proper PCIe bifurcation (e.g. x16 -> 4 x x4, for using NVMe storage devices on an adapter card)
- remote management interface
- physical COM (serial) port
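To put the memory channel difference into numbers, here is a minimal sketch of the theoretical peak memory bandwidth. The DDR5-5200 transfer rate is an assumption (the source does not state the memory speed), and the function name is made up for illustration. With the two modules listed above, at most two of the eight channels can be populated.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak: channels x transfers per second x bytes per transfer.

    bus_bytes=8 models a 64-bit data path per memory channel.
    """
    return channels * mt_per_s * 1_000_000 * bus_bytes / 1e9


# ASSUMPTION: DDR5-5200 modules; the real speed of this build is not given.
ASSUMED_MT_PER_S = 5200

for channels in (2, 8):
    gb_s = peak_bandwidth_gb_s(channels, ASSUMED_MT_PER_S)
    print(f"{channels} channels: {gb_s:.1f} GB/s theoretical peak")
```

These are upper bounds only; measured bandwidth is lower and depends on how many channels are actually populated.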
Compared with an ordinary server platform
- not 2 but 1 physical CPU socket
- in principle there is no need to feel very sorry for users of single-socket computers: for the same compute capacity (core count, speed, memory, cache, etc.) delivered by one processor rather than two, the single processor has the advantage of lower so-called multi-processor overhead (coordinating activity between sockets, etc.)
General PCI topology, so to speak, from 'lstopo --of txt'
where
- nothing extremely interesting is visible, because this is a single-socket computer, i.e. there is no situation where some PCI devices are attached to one CPU and some to the other
Processor
root@pve-wrx90e:~# lscpu
Architecture:             x86_64
CPU op-mode(s):           32-bit, 64-bit
Address sizes:            52 bits physical, 57 bits virtual
Byte Order:               Little Endian
CPU(s):                   48
On-line CPU(s) list:      0-47
Vendor ID:                AuthenticAMD
BIOS Vendor ID:           Advanced Micro Devices, Inc.
Model name:               AMD Ryzen Threadripper PRO 7965WX 24-Cores
BIOS Model name:          AMD Ryzen Threadripper PRO 7965WX 24-Cores Unknown CPU @ 4.2GHz
BIOS CPU family:          107
CPU family:               25
Model:                    24
Thread(s) per core:       2
Core(s) per socket:       24
Socket(s):                1
Stepping:                 1
CPU(s) scaling MHz:       18%
CPU max MHz:              5362.0000
CPU min MHz:              545.0000
BogoMIPS:                 8387.22
Flags:                    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d debug_swap
Virtualization features:
Virtualization:           AMD-V
Caches (sum of all):
L1d:                      768 KiB (24 instances)
L1i:                      768 KiB (24 instances)
L2:                       24 MiB (24 instances)
L3:                       128 MiB (4 instances)
NUMA:
NUMA node(s):             1
NUMA node0 CPU(s):        0-47
Vulnerabilities:
Gather data sampling:     Not affected
Itlb multihit:            Not affected
L1tf:                     Not affected
Mds:                      Not affected
Meltdown:                 Not affected
Mmio stale data:          Not affected
Reg file data sampling:   Not affected
Retbleed:                 Not affected
Spec rstack overflow:     Mitigation; Safe RET
Spec store bypass:        Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1:               Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2:               Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds:                    Not affected
Tsx async abort:          Not affected
where
- memory addressing has 'Address sizes: 52 bits physical, 57 bits virtual' available
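As a quick illustration of what these address widths mean, the sketch below converts the bit counts reported by lscpu into address space sizes. The helper name is hypothetical; the only inputs are the two numbers from the output above.

```python
PIB = 1 << 50  # one pebibyte in bytes


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << bits


physical = address_space_bytes(52)
virtual = address_space_bytes(57)

print(f"physical: {physical // PIB} PiB")  # 2**52 bytes = 4 PiB
print(f"virtual:  {virtual // PIB} PiB")   # 2**57 bytes = 128 PiB
```

The 57-bit virtual width goes together with the la57 flag in the lscpu flag list, which indicates five-level paging support.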
Network card Intel X710
root@pve-wrx90e:~# lspci | grep -i ether
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
root@pve-wrx90e:~# driverctl list-devices network
0000:01:00.0 i40e
0000:01:00.1 i40e
root@pve-wrx90e:~# devlink dev info
pci/0000:01:00.0:
  driver i40e
  serial_number 35-56-82-ff-ff-84-cf-60
  versions:
      fixed:
        board.id 000000-000
      running:
        fw.mgmt 9.140
        fw.mgmt.build 76856
        fw.mgmt.api 1.15
        fw.psid.api 9.40
        fw.bundle_id 0x8000efef
        fw.undi 1.3534.0
pci/0000:01:00.1:
  driver i40e
  serial_number 35-56-82-ff-ff-84-cf-60
  versions:
      fixed:
        board.id 000000-000
      running:
        fw.mgmt 9.140
        fw.mgmt.build 76856
        fw.mgmt.api 1.15
        fw.psid.api 9.40
        fw.bundle_id 0x8000efef
        fw.undi 1.3534.0
and a more detailed view using lspci
root@pve-wrx90e:~# lspci -vvv | less -N
..
01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GBASE-T (rev 02)
    DeviceName: X710 DUAL 10G LAN1
    Subsystem: Intel Corporation Ethernet Network Adapter X710-TL
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 65
    IOMMU group: 28
    Region 0: Memory at 100a1000000 (64-bit, prefetchable) [size=16M]
    Region 3: Memory at 100a3800000 (64-bit, prefetchable) [size=32K]
    Expansion ROM at f5780000 [disabled] [size=512K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Address: 0000000000000000 Data: 0000
        Masking: 00000000 Pending: 00000000
    Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00001000
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 512 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x4
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [e0] Vital Product Data
        Product Name: Example VPD
        Read-only fields:
            [V0] Vendor specific:
            [RV] Reserved: checksum good, 0 byte(s) reserved
        End
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Device Serial Number 35-56-82-ff-ff-84-cf-60
    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 1
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+ 10BitTagReq-
        IOVSta: Migration-
        Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 16, stride: 1, Device ID: 154c
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 00000100a3000000 (64-bit, prefetchable)
        Region 3: Memory at 00000100a3810000 (64-bit, prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [1a0 v1] Transaction Processing Hints
        Device specific mode supported
        No steering table available
    Capabilities: [1b0 v1] Access Control Services
        ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Capabilities: [1d0 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Kernel driver in use: i40e
    Kernel modules: i40e
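The 'LnkSta: Speed 8GT/s, Width x4' line in the lspci output can be turned into a rough usable link rate. The sketch below applies only the line encoding overhead (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s) and ignores packet and protocol overhead, so it is an upper bound rather than a measurement. The function and table names are made up for illustration.

```python
# Line encoding efficiency per PCIe transfer rate (GT/s per lane).
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,     # 8b/10b
    5.0: 8 / 10,     # 8b/10b
    8.0: 128 / 130,  # 128b/130b
}


def link_gbit_s(gt_per_s: float, width: int) -> float:
    """Upper bound on data rate per direction in Gbit/s, encoding overhead only."""
    return gt_per_s * ENCODING_EFFICIENCY[gt_per_s] * width


nic_link = link_gbit_s(8.0, 4)   # values taken from the LnkSta line
two_ports = 2 * 10.0             # two 10GBASE-T ports, Gbit/s per direction

print(f"PCIe link: {nic_link:.1f} Gbit/s, Ethernet ports: {two_ports:.1f} Gbit/s")
```

Under these simplifications the x4 link at 8 GT/s has headroom over the combined line rate of both 10 Gbit/s ports.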
Remote management
The remote management web GUI looks like this
where
- the inventory section is open
Updating the firmware of the IPMI (remote management) component looks like this
where
- TODO
In a sense, the computer's BIOS/UEFI setup can be reached (while the computer is running)
where
- TODO
Using PCIe
Among others, the following PCIe devices are available
root@pve-wrx90e:~# lspci | egrep "e1:00|e0:03"
e0:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo Dummy Host Bridge (rev 01)
e0:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Genoa/Bergamo GPP Bridge (rev 01)
e1:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
e1:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
and the following appears in the log
root@pve-wrx90e:~# dmesg -T
[Sun Jul 6 15:58:11 2025] vfio-pci 0000:e1:00.0: AER: Error of this Agent is reported first
[Sun Jul 6 15:58:11 2025] vfio-pci 0000:e1:00.1: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul 6 15:58:11 2025] vfio-pci 0000:e1:00.1: device [10de:10f1] error status/mask=00001000/00000000
[Sun Jul 6 15:58:11 2025] vfio-pci 0000:e1:00.1: [12] Timeout
[Sun Jul 6 15:58:12 2025] pcieport 0000:e0:03.1: AER: Correctable error message received from 0000:e1:00.0
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: device [10de:1c30] error status/mask=00001000/00000000
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: [12] Timeout
[Sun Jul 6 15:58:12 2025] pcieport 0000:e0:03.1: AER: Correctable error message received from 0000:e1:00.0
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: device [10de:1c30] error status/mask=00001000/00000000
[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: [12] Timeout
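The 'error status/mask=00001000/00000000' lines can be decoded bit by bit. The sketch below is a small helper, not part of any existing tool. The bit names follow the AER Correctable Error Status register layout as I understand it from the PCIe specification, and the function names are hypothetical.

```python
import re

# Correctable Error Status bits (bit number -> meaning).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def parse_status_line(line: str):
    """Return (status, mask) as integers, or None if the line has no such field."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def decode_correctable(status: int) -> list:
    """List the names of all known correctable-error bits set in status."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if status & (1 << bit)]


line = ("[Sun Jul 6 15:58:12 2025] vfio-pci 0000:e1:00.0: "
        "device [10de:1c30] error status/mask=00001000/00000000")
status, mask = parse_status_line(line)
print(decode_correctable(status))
```

For the value in the log only bit 12 is set, which matches the '[12] Timeout' line the kernel prints right after it.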
This does not seem to do anything directly harmful, but there is still a suspicion that it is indirectly bad for system stability. Using 'pcie_aspm=off' cures it (ASPM is Active-State Power Management, and it seems that shuffling the devices' link power states back and forth does no good in this case)
root@pve-wrx90e:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-6.8.12-11-pve root=ZFS=/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt video=efifb:off,simplefb:off pcie_aspm=off
'pcie_aspm=off' may be overkill, and it might be more appropriate to disable ASPM only on specific devices with the setpci utility.
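A minimal sketch of what per-device disabling would involve. It assumes that the ASPM control field is bits 1:0 of the Link Control register at offset 0x10 of the PCI Express capability, and that setpci accepts the 'value:mask' form with the CAP_EXP capability name. Verify both against setpci(8) and the device before running anything. The helper names are hypothetical, and the code only computes values and builds a command string; it does not touch hardware.

```python
ASPM_MASK = 0x3  # Link Control bits 1:0 hold the ASPM control field

ASPM_STATES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s+L1"}


def aspm_state(lnkctl: int) -> str:
    """Name the ASPM setting encoded in a 16-bit Link Control value."""
    return ASPM_STATES[lnkctl & ASPM_MASK]


def without_aspm(lnkctl: int) -> int:
    """Clear only the ASPM bits, leaving the other Link Control bits as is."""
    return lnkctl & ~ASPM_MASK & 0xFFFF


def setpci_command(bdf: str) -> str:
    """Build a setpci call that writes 0 under mask 3 to Link Control."""
    return f"setpci -s {bdf} CAP_EXP+10.w=0:3"


example = 0x0042  # hypothetical register value with L1 enabled
print(aspm_state(example), "->", aspm_state(without_aspm(example)))
print(setpci_command("e1:00.0"))
```

A change made this way does not survive a reboot, and both ends of a link (here presumably e0:03.1 and e1:00.0) carry their own Link Control register.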
It seems that another phenomenon occurring in this computer may also be related to ASPM: the four physical SATA SSD disks attached to the computer sometimes disappear and then reappear a little later
# dmesg -T
..
2025-07-04T23:34:03.301753+03:00 pve-wrx90e kernel: [466693.887599] ata4: found unknown device (class 0)
2025-07-04T23:34:03.301765+03:00 pve-wrx90e kernel: [466693.887616] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:07.348752+03:00 pve-wrx90e kernel: [466697.934519] ata1: found unknown device (class 0)
2025-07-04T23:34:07.348764+03:00 pve-wrx90e kernel: [466697.934535] ata1: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:08.363760+03:00 pve-wrx90e kernel: [466698.949513] ata4: found unknown device (class 0)
2025-07-04T23:34:08.363771+03:00 pve-wrx90e kernel: [466698.949528] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:08.363772+03:00 pve-wrx90e kernel: [466698.949537] ata4: limiting SATA link speed to <unknown>
2025-07-04T23:34:09.976757+03:00 pve-wrx90e kernel: [466700.562481] ata4: found unknown device (class 0)
2025-07-04T23:34:09.976769+03:00 pve-wrx90e kernel: [466700.562498] ata4: SATA link down (SStatus 0 SControl 3F0)
2025-07-04T23:34:09.976769+03:00 pve-wrx90e kernel: [466700.562505] ata4.00: disable device
2025-07-04T23:34:10.902748+03:00 pve-wrx90e kernel: [466701.488466] ata4: found unknown device (class 0)
2025-07-04T23:34:10.902758+03:00 pve-wrx90e kernel: [466701.488481] ata4: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:10.902760+03:00 pve-wrx90e kernel: [466701.488501] ata4.00: detaching (SCSI 3:0:0:0)
2025-07-04T23:34:10.924739+03:00 pve-wrx90e kernel: [466701.510468] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
2025-07-04T23:34:10.924744+03:00 pve-wrx90e kernel: [466701.510491] sd 3:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
2025-07-04T23:34:11.105750+03:00 pve-wrx90e kernel: [466701.691733] ata4: limiting SATA link speed to 1.5 Gbps
2025-07-04T23:34:11.961755+03:00 pve-wrx90e kernel: [466702.547447] ata1: found unknown device (class 0)
2025-07-04T23:34:11.961766+03:00 pve-wrx90e kernel: [466702.547468] ata1: SATA link down (SStatus 0 SControl 300)
2025-07-04T23:34:11.961767+03:00 pve-wrx90e kernel: [466702.547475] ata1: limiting SATA link speed to <unknown>
2025-07-04T23:34:15.536741+03:00 pve-wrx90e kernel: [466706.123180] md/raid:md127: Disk failure on sdd, disabling device.
2025-07-04T23:34:15.536751+03:00 pve-wrx90e kernel: [466706.123189] md/raid:md127: Operation continuing on 3 devices.
2025-07-04T23:34:17.275741+03:00 pve-wrx90e kernel: [466707.861373] ata4: link is slow to respond, please be patient (ready=0)
2025-07-04T23:34:18.475742+03:00 pve-wrx90e kernel: [466709.061336] ata4: found unknown device (class 0)
2025-07-04T23:34:18.475753+03:00 pve-wrx90e kernel: [466709.061353] ata4: SATA link down (SStatus 0 SControl 310)
2025-07-04T23:34:18.662736+03:00 pve-wrx90e kernel: [466709.248903] ata4: limiting SATA link speed to 1.5 Gbps
2025-07-04T23:34:19.516745+03:00 pve-wrx90e kernel: [466710.102306] ata1: found unknown device (class 0)
2025-07-04T23:34:19.516755+03:00 pve-wrx90e kernel: [466710.102321] ata1: SATA link down (SStatus 0 SControl 3F0)
2025-07-04T23:34:19.516756+03:00 pve-wrx90e kernel: [466710.102327] ata1.00: disable device
2025-07-04T23:34:19.520143+03:00 pve-wrx90e kernel: [466710.105316] sd 0:0:0:0: rejecting I/O to offline device
2025-07-04T23:34:19.520148+03:00 pve-wrx90e kernel: [466710.105327] I/O error, dev sda, sector 16 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
2025-07-04T23:34:19.520149+03:00 pve-wrx90e kernel: [466710.105333] md: super_written gets error=-5
2025-07-04T23:34:19.520149+03:00 pve-wrx90e kernel: [466710.105336] md/raid:md127: Disk failure on sda, disabling device.
2025-07-04T23:34:19.520151+03:00 pve-wrx90e kernel: [466710.105340] md/raid:md127: Cannot continue operation (2/4 failed).
...
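The SControl values in the log (300, 310, 3F0) can be split into their fields to see what the kernel asked of the link at each step. The sketch below assumes the usual SATA SControl layout: DET in bits 3:0, SPD in bits 7:4 and IPM in bits 11:8. The function name and the description table are made up for illustration.

```python
# Meaning of the SPD (speed limit) field; other values are treated as unknown.
SPD_LIMIT = {
    0: "no speed limit",
    1: "limit to 1.5 Gbps",
    2: "limit to 3.0 Gbps",
    3: "limit to 6.0 Gbps",
}


def decode_scontrol(hex_text: str) -> dict:
    """Split an SControl value, as printed in hex by the kernel, into fields."""
    value = int(hex_text, 16)
    spd = (value >> 4) & 0xF
    return {
        "det": value & 0xF,         # device detection / interface init control
        "spd": spd,                 # highest allowed link speed
        "ipm": (value >> 8) & 0xF,  # which link power states are disallowed
        "spd_text": SPD_LIMIT.get(spd, "unknown"),
    }


for raw in ("300", "310", "3F0"):
    print(raw, decode_scontrol(raw))
```

Under that layout, 310 lines up with the 'limiting SATA link speed to 1.5 Gbps' messages and 3F0 with the '<unknown>' ones. IPM is 3 in all three values, which would mean that transitions to the SATA Partial and Slumber states were already disallowed. That is a SATA-level setting and separate from PCIe ASPM.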