[Broadcom] VMware ESXTOP (1): Overview

밍사원 2024. 8. 2. 12:30

What is ESXTOP?

A performance monitoring tool that collects CPU, memory, disk, and network utilization and traffic information from an ESXi host.

Changing the refresh interval
s <time>
Default refresh interval: 5 seconds

Switching displays
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device
v = disk VM
p = power mgmt
x = vsan

2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world
V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limit the number of entities shown, for instance the top 5

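Adding/removing fields
f
<type appropriate character>

Changing field order
o
<move field by typing appropriate character: uppercase = left, lowercase = right>

Saving all changed settings
W
Default configuration file: esxtop4rc
If you do not change the file name, the settings are saved to the default file and used as the default settings.

Help
?

Exporting ESXTOP capture results
For Excel (CSV): esxtop -b -d 2 -n 100 > esxtopcapture.csv
Compressed file: esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz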

Option descriptions
-b : batch mode
-n 100 : number of iterations (here, 100 samples are captured)
-a : record all metrics
-d 2 : delay between samples (here, one sample is recorded every 2 seconds)
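
The -d and -n options together determine roughly how long a capture runs, and the resulting CSV can be post-processed with any CSV reader. The following is a minimal Python sketch, not an official tool. It assumes the batch output is a regular CSV whose first column is a timestamp and whose remaining column headers end with a counter name such as `% Ready`. The helper names (`capture_seconds`, `column_stats`), the host name `esx01`, and the sample rows are made up for illustration.

```python
import csv
import io


def capture_seconds(delay: int, iterations: int) -> int:
    """Approximate capture length for `esxtop -b -d <delay> -n <iterations>`."""
    return delay * iterations


def column_stats(csv_text: str, counter_suffix: str) -> dict:
    """Return {header: (average, maximum)} for columns ending with counter_suffix.

    Assumption: row 0 is the header, column 0 is a timestamp, and every
    other cell in a matching column is numeric.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    stats = {}
    for idx, name in enumerate(header):
        if idx == 0 or not name.endswith(counter_suffix):
            continue
        values = [float(row[idx]) for row in data]
        if values:  # skip columns when the capture has no data rows
            stats[name] = (sum(values) / len(values), max(values))
    return stats


# Hypothetical two-sample capture; a real file has far more columns.
SAMPLE = (
    '"Time","\\\\esx01\\Group Cpu(1:vm-a)\\% Ready","\\\\esx01\\Group Cpu(1:vm-a)\\% Used"\n'
    '"08/02/2024 12:30:00","4.0","50.0"\n'
    '"08/02/2024 12:30:02","12.0","70.0"\n'
)

print(capture_seconds(2, 100))  # 200
print(column_stats(SAMPLE, "% Ready"))
```

With `-d 2 -n 100` the capture therefore spans about 200 seconds. Filtering by header suffix keeps the sketch independent of host and VM names.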

Metric descriptions
Key metrics: %RDY, %CSTP, GAVG, %DRPTX, %DRPRX

Display	Metric	Threshold	Explanation
CPU	%RDY	10	Overprovisioning of vCPUs, excessive usage of vSMP, or a limit has been set (check %MLMTD). Expand the VM group to see how ready time is distributed across vCPUs. The 10% threshold is per world, so with many vCPUs the per-vCPU value may be low and a high group total may not be an issue.
CPU	%CSTP	3	Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM, which should lead to increased scheduling opportunities.
CPU	%MLMTD	0	The percentage of time the vCPU was ready to run but deliberately was not scheduled because doing so would violate the "CPU limit" setting. If larger than 0, the world is being throttled by the CPU limit.
CPU	%SWPWT	5	VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
MEM	MCTLSZ	1	If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory because the host is overcommitted.
MEM	SWCUR	1	If larger than 0, the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM	SWR/s	1	If larger than 0, the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM	SWW/s	1	If larger than 0, the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM	CACHEUSD	0	If larger than 0, the host has compressed memory. Possible cause: memory overcommitment.
MEM	ZIP/s	0	If larger than 0, the host is actively compressing memory. Possible cause: memory overcommitment.
MEM	UNZIP/s	0	If larger than 0, the host is accessing compressed memory. Possible cause: the host was previously overcommitted on memory.
MEM	N%L	80	If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and uses memory "remotely" via the interconnect. Check "GST_ND(X)" to find out which NUMA nodes are used.
NETWORK	%DRPTX	1	Dropped transmit packets; hardware overworked. Possible cause: very high network utilization.
NETWORK	%DRPRX	1	Dropped receive packets; hardware overworked. Possible cause: very high network utilization.
DISK	GAVG	25	Look at "DAVG" and "KAVG", as GAVG is the sum of both.
DISK	DAVG	25	Disk latency most likely caused by the array.
DISK	KAVG	2	Disk latency caused by the VMkernel (the ESXi storage stack, the vSCSI layer, and the VMM). High KAVG usually means queuing. Check "QUED".
DISK	QUED	1	Queue maxed out. Possibly the queue depth is set too low, or the controller is overloaded. Check with the array vendor for the optimal queue depth value. (Enable this field via option "F", aka QSTATS.)
DISK	ABRTS/s	1	Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, by failed paths or by an array that is not accepting any I/O.
DISK	RESETS/s	1	The number of command resets per second.
DISK	ATSF	1	The number of failed ATS commands; this value should be 0. (ATS: a hardware-assisted locking algorithm used by VMFS on storage devices that support hardware acceleration. Unlike SCSI reservations, it supports discrete locking per disk sector.)
DISK	ATS	1	The number of successful ATS commands; this value should go up over time when the array supports ATS.
DISK	DELETE	1	The number of successful UNMAP commands; this value should go up over time when the array supports UNMAP. (UNMAP: an unmap request sent from the guest operating system to a VMFS5 or VMFS6 datastore, used when reclaiming space from virtual machines.)
DISK	DELETE_F	1	The number of failed UNMAP commands; this value should be 0.
DISK	CONS/s	20	SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.
VSAN	SDLAT	5	Standard deviation of latency. When latency is above 10 ms, contact support to analyze the vSAN Observer details and find out what is causing the delay.
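
As a worked example of reading the table, the thresholds can be encoded as data and compared against sampled values. This is a hedged sketch rather than a monitoring tool: only a subset of the table is included, the function name `exceeded` and the sample readings are hypothetical, and "crossing" a threshold is assumed to mean strictly greater than the listed value. N%L is handled separately because its threshold is a lower bound.

```python
# Thresholds copied from the table above (subset only).
# % metrics are percentages; GAVG/DAVG/KAVG are latencies in milliseconds.
UPPER_LIMITS = {
    "%RDY": 10,
    "%CSTP": 3,
    "%MLMTD": 0,
    "%SWPWT": 5,
    "%DRPTX": 1,
    "%DRPRX": 1,
    "GAVG": 25,
    "DAVG": 25,
    "KAVG": 2,
}
# N%L is the exception: values *below* 80 indicate poor NUMA locality.
LOWER_LIMITS = {"N%L": 80}


def exceeded(sample: dict) -> list:
    """Return the metric names in `sample` that cross their threshold."""
    flagged = []
    for metric, value in sample.items():
        if metric in UPPER_LIMITS and value > UPPER_LIMITS[metric]:
            flagged.append(metric)
        elif metric in LOWER_LIMITS and value < LOWER_LIMITS[metric]:
            flagged.append(metric)
    return flagged


# Hypothetical per-world readings for one VM.
print(exceeded({"%RDY": 12.5, "%CSTP": 0.4, "KAVG": 0.1, "N%L": 65}))
```

Here %RDY and N%L would be flagged, while %CSTP and KAVG stay within their limits.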

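- CMDS/s: total number of commands per second (includes IOPS and other SCSI commands)

- DAVG/cmd: latency between the HBA and the disk

- KAVG/cmd: latency introduced by the VMkernel

- GAVG/cmd: response time as perceived by the guest operating system

* For DAVG/cmd, KAVG/cmd, and GAVG/cmd, 0 to 10 ms is the best state; performance degradation can occur above 20 ms.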
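
The relationship between the three latency counters and the rule of thumb above can be expressed in a few lines. This is a minimal sketch, assuming all values are in milliseconds. The function names are hypothetical, and the 10 to 20 ms band, which the text does not name, is labelled "watch" here as an assumption.

```python
def gavg(davg_ms: float, kavg_ms: float) -> float:
    """GAVG is the sum of device latency (DAVG) and kernel latency (KAVG)."""
    return davg_ms + kavg_ms


def classify_latency(latency_ms: float) -> str:
    """Apply the rule of thumb: 0-10 ms is best, above 20 ms may degrade."""
    if latency_ms <= 10:
        return "best"
    if latency_ms <= 20:
        return "watch"  # assumption: the text does not label this band
    return "possible degradation"


# Hypothetical readings: 18 ms at the device plus 4 ms in the VMkernel.
total = gavg(davg_ms=18.0, kavg_ms=4.0)
print(total, classify_latency(total))  # 22.0 possible degradation
```

Splitting GAVG into its two parts shows where to look first: a high DAVG points at the array, a high KAVG at queuing in the VMkernel.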

References

KB: https://kb.vmware.com/s/article/1008205

Community: https://communities.vmware.com/thread/203910
