1.
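I did quite a few things trying to install Mellanox's VMA. I do not work by reading the documentation carefully, organizing it, and mapping out the order of operations in my head before I start. I work by trial and error: I try something, hit a wall, look it up, and try again. Working in this problem-driven way, I did wonder whether I had wasted time. Still, I console myself that it was unavoidable, since this was a task I was meeting for the first time and I had no idea which documents to read.
While installing VMA I kept running into RDMA-related libraries. When I looked into why, it turned out that VMA has a completely different structure from the TCP bypass I knew. It builds on the underlying technology of InfiniBand. I had not known that.
VMA was originally developed and sold commercially by Voltaire. Mellanox acquired Voltaire in 2011 and released VMA as open source in 2013. It looks as though Mellanox made the acquisition to strengthen its position at a time when Chelsio and Solarflare were competing for the 10G market. The following is material from 2011 comparing Onload, SDP (Sockets Direct Protocol), VMA, and InfiniBand. The difference between OpenOnload and VMA turned out to be Verbs.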
Verbs An abstract description of the functionality of a network adapter. Using the verbs, any application can create / manage objects that are needed in order to use RDMA for data transfer.
Verbs is an abstract description of the functionality that is provided for applications for using RDMA.
And known, that both SDP and VMA use RDMA-Verbs and can be used for already compiled binary of program as (libpreload) LD_PRELOAD: As with Openonload, SDP and Mellanox’s VMA all preload to accelerate an existing TCP/IP socket program. Openonload retains the TCP/IP protocol so can be used single ended. SDP and VMA both map to VERBS so must be deployed on both ends of the wire.
From "What is the difference between SDP and VMA?"
Next is the VMA architecture as Mellanox presents it. It is identical to Voltaire's material.
Top-Level
The VMA library is a dynamically linked user-space library. Use of the VMA library does not require any code changes or recompiling of user applications. Instead, it is dynamically loaded via the Linux OS environment variable, LD_PRELOAD. However, it is possible to load VMA library dynamically without using the LD_PRELOAD parameter, which requires minor application modifications.
When a user application transmits TCP and UDP, unicast and multicast IPv4 data, or listens for such network traffic data, the VMA library:
Intercepts the socket receive and send calls made to the stream socket or datagram socket address families.
Implements the underlying work in user space (instead of allowing the buffers to pass on to the usual OS network kernel libraries).
VMA implements native RDMA verbs API. The native RDMA verbs have been extended into the Ethernet RDMA-capable NICs, enabling the packets to pass directly between the user application and the InfiniBand HCA or Ethernet NIC, bypassing the kernel and its TCP/UDP handling network stack.
You can implement the code in native RDMA verbs API, without making any changes to your applications. The VMA library does all the heavy lifting under the hood, while transparently presenting the same standard socket API to the application, thus redirecting the data flow.
The VMA library operates in a standard networking stack fashion to serve multiple network interfaces.
The VMA library behaves according to the way the application calls the bind, connect, and setsockopt directives and the administrator sets the route lookup to determine the interface to be used for the socket traffic. The library knows whether data is passing to or from an InfiniBand HCA or Ethernet NIC. If the data is passing to/from a supported HCA or Ethernet NIC, the VMA library intercepts the call and does the bypass work. If the data is passing to/from an unsupported HCA or Ethernet NIC, the VMA library passes the call to the usual kernel libraries responsible for handling network traffic. Thus, the same application can listen in on multiple HCAs or Ethernet NICs, without requiring any configuration changes for the hybrid environment.
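The LD_PRELOAD loading described above can be sketched as a small shell helper. This is only an illustration under assumptions: `run_preloaded` is a hypothetical helper name and not part of VMA. It simply sets LD_PRELOAD for one process so the rest of the shell session stays untouched.

```shell
#!/bin/sh
# run_preloaded LIB CMD [ARGS...]
# Hypothetical helper (not part of VMA): runs CMD with LD_PRELOAD=LIB set
# for that single process only. LIB would typically be libvma.so.
run_preloaded() {
    lib=$1
    shift
    env LD_PRELOAD="$lib" "$@"
}

# Example, assuming VMA is installed and a supported NIC is present:
#   run_preloaded libvma.so sockperf server -i 127.0.0.1
```

Because the variable is scoped to one command, the same binary can be started with and without the library to compare the accelerated and the kernel path.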
2.
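That is the end of the overview; now for the main part. Naturally, you need an adapter that supports VMA, and you need to download the adapter driver for it. The reason I rambled through the overview above is that there are two kinds of driver.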
Mellanox EN Driver for Linux
Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED)
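In fact, either one will do. At first I installed MLNX_OFED and tried to use it with the port type set to Ethernet. That did not work the way I wanted, so I installed MLNX_EN instead. Download the package that matches your OS, unpack it, and run install, the installation shell script. Depending on the kernel version, you will often see a message like the one below. Read it carefully and do what it says. (^^)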
zeroaos@am0n:~/mlnx-en-5.3-1.0.0.1-rhel8.3-x86_64$ sudo ./install
Logs dir: /tmp/mlnx-en.977526.logs
General log file: /tmp/mlnx-en.977526.logs/general.log
Verifying KMP rpms compatibility with target kernel...
The kernel KMP rpms coming with mlnx-en are not compatible with kernel: 4.18.0-240.22.1.el8_3.x86_64
See log at /tmp/mlnx-en.977526.logs/is_kmp_compat_check.log
The 4.18.0-240.22.1.el8_3.x86_64 kernel is installed, mlnx-en does not have drivers available for this kernel.
You can run mlnx_add_kernel_support.sh in order to to generate an mlnx-en package with drivers for this kernel.
Or, you can provide '--add-kernel-support' flag to generate an mlnx-en package and automatically start the installation.
Running mlnx_add_kernel_support.sh did not behave as I expected, so I passed the option to the installer instead. If you intend to use VMA, also add --vma. This is the result of the install command.
zeroaos@am0n:~/mlnx-en-5.3-1.0.0.1-rhel8.3-x86_64$ sudo ./install --add-kernel-support --vma
Note: This program will create mlnx-en TGZ for rhel8.3 under /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64 directory.
See log file /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64/mlnx_iso.92995_logs/mlnx_ofed_iso.92995.log
Checking if all needed packages are installed...
Building mlnx-en RPMS . Please wait...
^CFailed to build mlnx-en for 4.18.0-240.22.1.el8_3.x86_64
zeroaos@am0n:~/mlnx-en-5.3-1.0.0.1-rhel8.3-x86_64$ sudo ./install --add-kernel-support --vma
Note: This program will create mlnx-en TGZ for rhel8.3 under /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64 directory.
See log file /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64/mlnx_iso.96575_logs/mlnx_ofed_iso.96575.log
Checking if all needed packages are installed...
Building mlnx-en RPMS . Please wait...
^[[6~Creating metadata-rpms for 4.18.0-240.22.1.el8_3.x86_64 ...
WARNING: If you are going to configure this package as a repository, then please note
WARNING: that it contains unsigned rpms, therefore, you need to disable the gpgcheck
WARNING: by setting 'gpgcheck=0' in the repository conf file.
Created /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64/mlnx-en-5.3-1.0.0.1-rhel8.3-ext.tgz
Uninstalling the previous version of mlnx-en
Installing /tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64/mlnx-en-5.3-1.0.0.1-rhel8.3-ext
/tmp/mlnx-en-5.3-1.0.0.1-4.18.0-240.22.1.el8_3.x86_64/mlnx-en-5.3-1.0.0.1-rhel8.3-ext/install --force --vma
Logs dir: /tmp/mlnx-en.954555.logs
General log file: /tmp/mlnx-en.954555.logs/general.log
This program will install the mlnx-en package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with mlnx-en, do not reinstall them.
Starting mlnx-en-5.3-1.0.0.1 installation ...
Installing mlnx-ofa_kernel 5.3 RPM
Verifying... ########################################
Preparing... ########################################
Updating / installing...
mlnx-ofa_kernel-5.3-OFED.5.3.1.0.0.1.r########################################
Installing mlnx-ofa_kernel-modules 5.3 RPM
Verifying... ########################################
Preparing... ########################################
Updating / installing...
mlnx-ofa_kernel-modules-5.3-OFED.5.3.1########################################
Installing mlnx-ofa_kernel-devel 5.3 RPM
Verifying... ########################################
Preparing... ########################################
Updating / installing...
mlnx-ofa_kernel-devel-5.3-OFED.5.3.1.0########################################
Installing user level RPMs:
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Verifying... ########################################
Preparing... ########################################
Device (04:00.0):
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Link Width: x4 ( WARNING - device supports x8 )
PCI Link Speed: 8GT/s
Device (04:00.1):
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Link Width: x4 ( WARNING - device supports x8 )
PCI Link Speed: 8GT/s
Device (0c:00.0):
0c:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Link Width: x8
PCI Link Speed: 8GT/s
Device (0c:00.1):
0c:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Link Width: x8
PCI Link Speed: 8GT/s
Installation finished successfully.
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:mlnx-fw-updater-5.3-1.0.0.1 ################################# [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Initializing...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX512A-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
PSID: MT_0000000080
PCI Device Name: 04:00.0
Base GUID: 1c34da030065609c
Base MAC: 1c34da65609c
Versions: Current Available
FW 16.30.1004 16.30.1004
PXE 3.6.0301 3.6.0301
UEFI 14.23.0017 14.23.0017
Status: Up to date
Log File: /tmp/YVc09P1RQD
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX5
Part Number: MCX512A-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
PSID: MT_0000000080
PCI Device Name: 0c:00.0
Base GUID: 1c34da0300656094
Base MAC: 1c34da656094
Versions: Current Available
FW 16.30.1004 16.30.1004
PXE 3.6.0301 3.6.0301
UEFI 14.23.0017 14.23.0017
Status: Up to date
Log File: /tmp/gcrwnFasH_
Real log file: /tmp/mlnx-en.954555.logs/fw_update.log
WARNING: Original /etc/infiniband/openib.conf saved as /etc/infiniband/openib.conf.rpmsave
You may need to update your initramfs before next boot. To do that, run:
dracut -f
To load the new driver, run:
/etc/init.d/openibd restart
I checked which drivers had been installed.
[root@am0n ~]# lsmod | grep mlx
mlx5_ib 421888 0
mlx5_core 1642496 1 mlx5_ib
ib_uverbs 155648 2 rdma_ucm,mlx5_ib
ib_core 425984 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx_compat 16384 10 rdma_cm,ib_ipoib,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
tls 102400 1 mlx5_core
mlxfw 28672 1 mlx5_core
psample 20480 1 mlx5_core
pci_hyperv_intf 16384 1 mlx5_core
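The module check above can be scripted. The sketch below is an assumption-laden illustration: `has_module` and `check_vma_modules` are hypothetical helper names, and the module list is simply the set that appeared on this machine, not an official requirement list from Mellanox.

```shell
#!/bin/sh
# has_module NAME
# Reads `lsmod` output on stdin; succeeds if NAME appears in the first column.
has_module() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# check_vma_modules
# Reads `lsmod` output on stdin and reports each module seen in this post.
# The list is taken from the output above; treat it as an assumption.
check_vma_modules() {
    listing=$(cat)
    for mod in mlx5_core mlx5_ib ib_uverbs ib_core; do
        if printf '%s\n' "$listing" | has_module "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: MISSING"
        fi
    done
}

# Usage:
#   lsmod | check_vma_modules
```

Matching on the first column only avoids false positives from the dependency list, where names such as mlx5_ib also appear.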
Next, I checked how the adapter's ports were configured.
[root@am0n ~]# lspci | grep Mellanox
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0c:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0c:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
[root@am0n ~]# mlxconfig -d 04:00.0 query
Device #1:
----------
Device type: ConnectX5
Name: MCX512A-ACA_Ax_Bx
Description: ConnectX-5 EN network interface card; 10/25GbE dual-port SFP28; PCIe3.0 x8; tall bracket; ROHS R6
Device: 04:00.0
Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_CACHE_DISABLE False(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 0
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
PER_PF_NUM_SF False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_PF_MSIX_VALID True(1)
NUM_OF_VFS 8
PF_BAR2_ENABLE False(0)
SRIOV_EN True(1)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 0
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PCIE_CREDIT_TOKEN_TIMEOUT 0
ACCURATE_TX_SCHEDULER False(0)
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
ADVANCED_POWER_SETTINGS False(0)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
ESWITCH_IPV4_TTL_MODIFY_ENABLE False(0)
PRIO_TAG_REQUIRED_EN False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_TX_PSN_WINDOW 7
LOG_MAX_OUTSTANDING_WQE 7
ICM_CACHE_MODE DEVICE_DEFAULT(0)
TX_SCHEDULER_BURST 0
ZERO_TOUCH_TUNING_ENABLE False(0)
LOG_DCR_HASH_TABLE_SIZE 11
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_PRIO_MASK_P2 255
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 False(0)
KEEP_LINK_UP_ON_BOOT_P1 True(1)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P1 False(0)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 True(1)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
AUTO_POWER_SAVE_LINK_DOWN_P2 False(0)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
VL15_BUFFER_SIZE_P1 0
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
VL15_BUFFER_SIZE_P2 0
DUP_MAC_ACTION_P1 LAST_CFG(0)
UNKNOWN_UPLINK_MAC_FLOOD_P1 False(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
UNKNOWN_UPLINK_MAC_FLOOD_P2 False(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PF_TOTAL_SF 0
PF_SF_BAR_SIZE 0
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_RETRY_CNT NONE(0)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
P2P_ORDERING_MODE DEVICE_DEFAULT(0)
ATS_ENABLED False(0)
DYNAMIC_VF_MSIX_TABLE False(0)
EXP_ROM_UEFI_x86_ENABLE False(0)
EXP_ROM_PXE_ENABLE True(1)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
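With this many settings, it helps to pull out a single value rather than read the whole dump. The following is a minimal sketch, assuming the two-column "NAME VALUE" layout shown in the query output above; `mlxcfg_value` is a hypothetical helper name, not an mlxconfig feature.

```shell
#!/bin/sh
# mlxcfg_value KEY
# Reads `mlxconfig -d <device> query` output on stdin and prints the value
# column for KEY, for example "True(1)". Prints nothing if KEY is absent.
mlxcfg_value() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# Usage, with the device address taken from the lspci output above:
#   mlxconfig -d 04:00.0 query | mlxcfg_value KEEP_ETH_LINK_UP_P1
```

The exact match on the first field keeps KEEP_ETH_LINK_UP_P1 from also matching KEEP_ETH_LINK_UP_P2 or longer names that share a prefix.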
3.
Now it is time to use libvma.so. Run as root, sockperf started normally.
[root@am0n ~]# LD_PRELOAD=libvma.so VMA_SPEC=latency VMA_THREAD_MODE=0 VMA_INTERNAL_THREAD_AFFINITY=2 VMA_RX_POLL=-1 sockperf server -i 127.0.0.1
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 9.2.2-1 Release built on Mar 29 2021 12:41:56
VMA INFO: Cmd Line: sockperf server -i 127.0.0.1
VMA INFO: OFED Version: mlnx-en-5.3-1.0.0.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA Spec Latency [VMA_SPEC]
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Ring On Device Memory TX 16384 [VMA_RING_DEV_MEM_TX]
VMA INFO: Tx QP WRE 256 [VMA_TX_WRE]
VMA INFO: Tx QP WRE Batching 4 [VMA_TX_WRE_BATCHING]
VMA INFO: Rx QP WRE 256 [VMA_RX_WRE]
VMA INFO: Rx QP WRE Batching 4 [VMA_RX_WRE_BATCHING]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: Rx Prefetch Bytes Before Poll 256 [VMA_RX_PREFETCH_BYTES_BEFORE_POLL]
VMA INFO: GRO max streams 0 [VMA_GRO_STREAMS_MAX]
VMA INFO: Select Poll (usec) -1 [VMA_SELECT_POLL]
VMA INFO: Select Poll OS Force Enabled [VMA_SELECT_POLL_OS_FORCE]
VMA INFO: Select Poll OS Ratio 1 [VMA_SELECT_POLL_OS_RATIO]
VMA INFO: Select Skip OS 1 [VMA_SELECT_SKIP_OS]
VMA INFO: CQ Drain Interval (msec) 100 [VMA_PROGRESS_ENGINE_INTERVAL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ AIM Max Count 128 [VMA_CQ_AIM_MAX_COUNT]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: CQ Keeps QP Full Disabled [VMA_CQ_KEEP_QP_FULL]
VMA INFO: TCP nodelay 1 [VMA_TCP_NODELAY]
VMA INFO: Avoid sys-calls on tcp fd Enabled [VMA_AVOID_SYS_CALLS_ON_TCP_FD]
VMA INFO: Internal Thread Affinity 2 [VMA_INTERNAL_THREAD_AFFINITY]
VMA INFO: Thread mode Single [VMA_THREAD_MODE]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: * (VMA_MEM_ALLOC_TYPE!= 2)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: * hugepages resources in the system:
VMA INFO: * "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: * "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: **************************************************************
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 127.0.0.1 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 978664] using recvfrom() to block on socket(s)
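The hugepage warning in the output above is harmless, but it can be checked before starting the process. This is a rough sketch: `hugepages_total` and `hugepages_ok` are hypothetical helper names, and the figure 800 is simply the value the VMA message itself suggests, not a tuned recommendation.

```shell
#!/bin/sh
# hugepages_total
# Reads /proc/meminfo-style text on stdin and prints the HugePages_Total count.
hugepages_total() {
    awk '$1 == "HugePages_Total:" { print $2; exit }'
}

# hugepages_ok WANT
# Reads /proc/meminfo-style text on stdin; succeeds if at least WANT
# hugepages are configured. A missing field is treated as 0.
hugepages_ok() {
    have=$(hugepages_total)
    [ "${have:-0}" -ge "$1" ]
}

# Usage (800 is the number quoted in the VMA warning):
#   hugepages_ok 800 < /proc/meminfo || echo "fewer than 800 hugepages configured"
```

If the check fails, the two echo commands quoted in the warning are the remedy VMA proposes; they need root and do not persist across reboots unless added to the system configuration.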
The program I actually want to run will run as zeroaos, so I ran the same command from that regular account. The result was not what I expected.
[zeroaos@am0n ~]# LD_PRELOAD=libvma.so sockperf server -i 127.0.0.1
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 9.2.2-1 Release built on Mar 29 2021 12:41:56
VMA INFO: Cmd Line: sockperf server -i 127.0.0.1
VMA INFO: OFED Version: mlnx-en-5.3-1.0.0.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Interface enp4s0f0 will not be offloaded.
VMA WARNING: * Offloaded resources are restricted to root or user with CAP_NET_RAW privileges
VMA WARNING: * Read the CAP_NET_RAW and root access section in the VMA's User Manual for more information
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Interface enp12s0f0 will not be offloaded.
VMA WARNING: * Offloaded resources are restricted to root or user with CAP_NET_RAW privileges
VMA WARNING: * Read the CAP_NET_RAW and root access section in the VMA's User Manual for more information
VMA WARNING: *******************************************************************************************************
VMA WARNING: *******************************************************************************************************
VMA WARNING: * Interface enp12s0f1 will not be offloaded.
VMA WARNING: * Offloaded resources are restricted to root or user with CAP_NET_RAW privileges
VMA WARNING: * Read the CAP_NET_RAW and root access section in the VMA's User Manual for more information
VMA WARNING: *******************************************************************************************************
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: * (VMA_MEM_ALLOC_TYPE!= 2)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: * hugepages resources in the system:
VMA INFO: * "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: * "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: **************************************************************
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 127.0.0.1 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 974709] using recvfrom() to block on socket(s)
It did not work properly because of the following line.
Offloaded resources are restricted to root or user with CAP_NET_RAW privileges
Mellanox's documentation contains the following statement.
Option disable_raw_qp_enforcement is not supported in MLNX_OFED v5.1 and later, thus, libvma should have CAP_NET_RAW privileges to be used
Searching a little further, I found a fix. The following is an excerpt from "VMA over RHEL 7.x with inbox driver".
Load libvma and run the app (as root): LD_PRELOAD=libvma.so
For running as user (as non root user): Set cap_net_raw for the executable. For example to use sockperf:
setcap cap_net_raw=ep /usr/bin/sockperf
Set special permission for library (SET_UID) and place in standard location (which libvma is now already). These steps are required when using LD_PRELOAD with capabilities set on executable.
chmod u+s /usr/lib64/libvma.so.8*
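After applying the two commands quoted above, it is worth confirming that they took effect. The sketch below assumes the `getcap` tool from libcap is available; `has_cap_net_raw` and `is_setuid` are hypothetical helper names introduced only for illustration.

```shell
#!/bin/sh
# has_cap_net_raw
# Reads `getcap <file>` output on stdin; succeeds if cap_net_raw is listed.
# A plain substring match is used because the separator in getcap output
# differs between libcap versions.
has_cap_net_raw() {
    grep -q 'cap_net_raw'
}

# is_setuid FILE
# Succeeds if FILE exists and has the set-user-ID bit set.
is_setuid() {
    [ -u "$1" ]
}

# Usage, with the paths from the quoted instructions:
#   getcap /usr/bin/sockperf | has_cap_net_raw && echo "capability set"
#   for f in /usr/lib64/libvma.so.8*; do
#       is_setuid "$f" && echo "$f: setuid bit set"
#   done
```

Both checks are read-only, so they can be run from the regular account before retrying the application.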
After running the commands above, running from the regular account gave the result I wanted.
zeroaos@am0n:/usr/bin$ LD_PRELOAD=libvma.so sockperf server -i 127.0.0.1
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 9.2.2-1 Release built on Mar 29 2021 12:41:56
VMA INFO: Cmd Line: sockperf server -i 127.0.0.1
VMA INFO: OFED Version: mlnx-en-5.3-1.0.0.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: ---------------------------------------------------------------------------
VMA WARNING: **************************************************************
VMA WARNING: * NO IMMEDIATE ACTION NEEDED!
VMA WARNING: * Not enough hugepage resources for VMA memory allocation.
VMA WARNING: * VMA will continue working with regular memory allocation.
VMA INFO: * Optional:
VMA INFO: * 1. Switch to a different memory allocation type
VMA INFO: * (VMA_MEM_ALLOC_TYPE!= 2)
VMA INFO: * 2. Restart process after increasing the number of
VMA INFO: * hugepages resources in the system:
VMA INFO: * "echo 1000000000 > /proc/sys/kernel/shmmax"
VMA INFO: * "echo 800 > /proc/sys/vm/nr_hugepages"
VMA WARNING: * Please refer to the memory allocation section in the VMA's
VMA WARNING: * User Manual for more information
VMA WARNING: **************************************************************
sockperf: == version #3.7-no.git ==
sockperf: [SERVER] listen on:
[ 0] IP = 127.0.0.1 PORT = 11111 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: [tid 974880] using recvfrom() to block on socket(s)
Besides the CAP_NET_RAW issue above, there was one more problem.
zeroaos@am0n:~$ VMA_RX_POLL=-1 LD_PRELOAD=libvma.so VMA_SPEC=latenc sockperf server -i 127.0.0.1
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: VMA_VERSION: 9.2.2-1 Release built on Mar 29 2021 12:41:56
VMA INFO: Cmd Line: taskset -c 10 zts
VMA INFO: OFED Version: mlnx-en-5.3-1.0.0.1:
VMA INFO: ---------------------------------------------------------------------------
VMA INFO: Log Level INFO [VMA_TRACELEVEL]
VMA INFO: Rx Poll Loops -1 [VMA_RX_POLL]
VMA INFO: CQ Interrupts Moderation Disabled [VMA_CQ_MODERATION_ENABLE]
VMA INFO: CQ Adaptive Moderation Disabled [VMA_CQ_AIM_INTERVAL_MSEC]
VMA INFO: ---------------------------------------------------------------------------
ERROR: ld.so: object 'libvma.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
This was actually the hardest problem. I tried the fixes that searching for LD_PRELOAD turned up, but none of them gave the result I wanted. Going back to the manual, I found the following.
Issue # 2: On running an application with VMA, the following error is reported:
ERROR: ld.so: object ‘libvma.so’ from LD_PRELOAD cannot be preloaded: ignored.
Solution: Check that libvma is properly installed, and that libvma.so is located in /usr/lib (or in /usr/lib64, for 64-bit machines)
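The manual's suggestion can be turned into a quick check. This is a sketch only: `find_lib` is a hypothetical helper, and the default directories are the two named in the manual excerpt. One possibly related detail from the ld.so manual page: in secure-execution mode, which applies to executables carrying file capabilities, preload names containing slashes are ignored and libraries are preloaded only from the standard search directories, and only when they have the set-user-ID bit. Whether that was the cause in this case is not established here.

```shell
#!/bin/sh
# find_lib NAME [DIR...]
# Prints the first file whose path starts with DIR/NAME and returns 0,
# or returns 1 if none is found. With no DIR arguments it searches the two
# directories named in the VMA manual: /usr/lib64 and /usr/lib.
find_lib() {
    name=$1
    shift
    [ "$#" -gt 0 ] || set -- /usr/lib64 /usr/lib
    for dir in "$@"; do
        for candidate in "$dir"/"$name"*; do
            if [ -e "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Usage:
#   find_lib libvma.so || echo "libvma.so not found in /usr/lib64 or /usr/lib"
```

If the library is found on disk but the loader still cannot open it, the permissions and set-user-ID bit on the file are the next thing to inspect.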
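I had installed it properly, yet the program insisted I had not, so on the off chance I installed it again under the zeroaos account. I do not know what the difference is, but the symptom no longer appeared. Those are the trial-and-error steps I went through while installing and running VMA. I hope they help.
Finally, just in case, I am leaving some notes related to InfiniBand programming.
Basic concepts needed for InfiniBand programming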
InfiniBand: An Introduction + Simple IB verbs program with RDMA Write
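Trying this on a cheap ConnectX-3 is proving hard. I am giving it a go because it costs a tenth of the Solarflare card I used with onload, but we will see..
It has been a while.. It will work out…^^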