Comprehensive monitoring of your IT infrastructure usually means monitoring not only your physical infrastructure but also your virtual environments. Because virtualization adds another layer on top of the layers that represent your physical hardware, you need to plan beforehand how you want to logically set up your monitoring infrastructure.
Monitoring All Layers of Your IT Infrastructure
With virtualization added, you generally need to monitor a total of four layers in your IT infrastructure.
Layer
Description
Hardware (Server Racks)
Usually, you monitor most of the hardware components in your network with SNMP sensors. With this monitoring technology, you can gather monitoring data such as CPU load, memory, and disk space. You can also get information about the network traffic and bandwidth usage of your routers and switches.
Alerts can tell you if there is an issue with a hardware component or if hardware resources are running out. In addition, you can identify potential bottlenecks that might affect your virtualized infrastructure.
Host Server Hardware
We recommend that you explicitly monitor the host hardware of your virtualization solution. If you have issues with your virtual machines (VMs), the origin might be a host hardware failure. You should closely monitor your VM host servers so that you are alerted if the hardware status changes in any significant way.
Besides the standard hardware sensors, PRTG provides specific sensors for various virtualization host servers. The following monitoring data of your host servers can prevent issues in virtualized environments:
VMware: current reading and health status (via Web-based Enterprise Management (WBEM)), a general status as shown in vSphere (via Simple Object Access Protocol (SOAP)), and disk space of a VMware data store (via SOAP)
Hyper-V: host health-critical values; deposited pages; network traffic; CPU usage of guests, hypervisor, and in total
Citrix XenServer: CPU, memory, and network usage; the number of running VMs on the host server; and load average
Resource Usage of VMs
VMs run on their particular host servers. PRTG can show you the status of single VMs and several of their performance counters. You might want to know which resources a single VM uses and needs, but we do not recommend that you monitor every single VM because this noticeably affects overall performance. Often, it is sufficient to monitor only the VMs that are critical for your network. If a VM reaches its capacity limits, PRTG can alert you so that you can take appropriate steps to solve the issue.
Indicators for a healthy VM that you can monitor with PRTG are:
VMware: CPU and memory usage, disk read and write speed, read and write latency, and network usage
Hyper-V: CPU usage, disk read and write speed
Citrix XenServer: CPU usage and free memory
Operating Systems of VMs
You can monitor, for example, the Windows operating system of a single VM with the standard WMI sensors. With this technology, you can access data of various Windows parameters. Other operating systems such as Linux or macOS can make data available via SSH and SNMP.
The status of the operating systems on your VMs can indicate potential issues. You can monitor them, but keep performance in mind: sensors that use WMI have a high impact on system performance, so you should only monitor operating systems that are critical for your infrastructure. Furthermore, you do not need to monitor every item multiple times. For example, it might be sufficient to monitor free disk space only as a needed resource of the actual VM, not for the VM's operating system itself.
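For the hardware layer, bandwidth usage is typically derived from two successive readings of an SNMP interface octet counter. The following is a minimal sketch of that arithmetic, not a description of how PRTG implements it. The function name and parameters are hypothetical. It assumes a 32-bit counter such as ifInOctets that can wrap around once between two readings.

```python
def bandwidth_bits_per_second(prev_octets, curr_octets, interval_seconds, counter_bits=32):
    """Average throughput between two readings of an SNMP octet counter.

    Assumption: the counter wrapped at most once between the two readings.
    Use counter_bits=64 for high-capacity counters such as ifHCInOctets.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 1 << counter_bits
    # The modulo handles a counter that wrapped around to zero.
    delta_octets = (curr_octets - prev_octets) % modulus
    # One octet is eight bits.
    return delta_octets * 8 / interval_seconds
```

Dividing the result by the interface speed gives the utilization that an alert threshold can be compared against.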
Monitoring the Virtual Infrastructure
To monitor your IT infrastructure, best practice is to first set up the monitoring of your data center's hardware layer in PRTG. This way, you can detect potential bottlenecks that might have an impact on your virtual servers. Then, you can prepare to start monitoring your virtual environment. If you use several solutions for virtual hosting, it is also a good idea to group related host servers, their VMs, and the operating systems. The screenshot below shows a possible structure of monitoring a virtual environment with PRTG.
Grouped Virtual Components
At the top level, you can see the Virtual Hosting group. This group contains several subgroups for the virtualization solutions Citrix XenServer, Microsoft Hyper-V, and VMware vSphere. The vSphere group, for example, has three subgroups: we monitor the vCenter VMs and the vCenter Windows operating system (vCenter group), the performance of the host server (Host Performance group), and the storage system of the host (Host Storage group).
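The group hierarchy described above can be written down as a nested structure, which can help when you plan the layout before creating it in PRTG. This is only an illustrative sketch: the dictionary and the render_tree helper are hypothetical and not part of PRTG. The subgroups of the XenServer and Hyper-V groups are left empty because they are not named here.

```python
# Group names taken from the example structure; empty dicts are placeholders.
GROUPS = {
    "Virtual Hosting": {
        "Citrix XenServer": {},
        "Microsoft Hyper-V": {},
        "VMware vSphere": {
            "vCenter": {},
            "Host Performance": {},
            "Host Storage": {},
        },
    }
}


def render_tree(tree, depth=0):
    """Return the group hierarchy as a list of indented lines."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render_tree(children, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(GROUPS)))
```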
Devices for Physical Hosts
In PRTG, set up devices that represent the physical hosts of your VMs. For example, for your VMware hosts, add devices that represent the ESXi servers. For Hyper-V, add devices that represent your Hyper-V host servers. For Citrix, add devices that represent your XenServers.
Then you can add suitable sensors to the host server devices. If you run the auto-discovery, many sensors are automatically created. Several preconfigured host hardware sensors are available:
These sensors monitor hardware-specific counters to ensure that no hardware issues affect your actual VMs. Additional sensors can monitor the host hardware via the Simple Network Management Protocol (SNMP) (for example, traffic and custom requests), and the data storage on ESXi servers via SOAP. There are also sensors for network adapters and storage devices that are connected to a Hyper-V host server.
Devices for Virtual Machines
To monitor your actual VMs, add them to your host servers in PRTG. For a better overview, you might want to add a device to PRTG that represents your host server and add the sensors for your VMs there. The respective sensors for VMs show you the performance of single VMs as well as their resource usage. This lets you identify VMs with low performance so that you can react before there is an issue with one or more of your VMs. As mentioned before, you can additionally monitor the operating systems of your VMs, if necessary. See the following sections for details about particular virtualization solutions.
VMware Virtual Machine
The VMware Virtual Machine (SOAP) sensor monitors VMs on a VMware host server via SOAP. The general idea is to add a vCenter server as a device to your vCenter group and use it as a parent device to which you add the sensors for your VMs. This way, in the case of vMotion, when your VMs change their host server, PRTG can follow these movements and does not lose the monitored VMs.
For this sensor, .NET 4.7.2 or later must be installed on the probe system. If you use many VMware sensors, we also recommend that you adjust the settings on your VMware host server to accept more incoming connections.
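To verify the .NET requirement on a probe system, you can read the Release value that the .NET Framework 4.x installer writes to the Windows registry. The sketch below is an assumption-laden helper, not a PRTG tool. The function names are hypothetical, and the threshold 461808 is the minimum Release value that Microsoft documents for .NET Framework 4.7.2.

```python
NET_472_MIN_RELEASE = 461808  # documented minimum Release value for .NET Framework 4.7.2


def has_dotnet_472_or_later(release):
    """Return True if the given Release value indicates .NET 4.7.2 or later."""
    return release is not None and release >= NET_472_MIN_RELEASE


def read_release_from_registry():
    """Read the Release value on Windows; return None if it is not present."""
    import winreg  # Windows-only standard library module

    key_path = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
            return int(value)
    except OSError:
        return None
```

On the probe system, has_dotnet_472_or_later(read_release_from_registry()) then tells you whether the requirement is met.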
vSphere Group
This screenshot shows an example of a vSphere group. As recommended, the sensors for the VMware virtual machines are added to the vCenter 1 device. There is also a dedicated vCenter 2 device for the vCenter Windows operating system with common WMI sensors for CPU, memory, disk, and network usage. The ESXi host servers are organized in their own groups regarding performance and storage. In this example, PRTG monitors the hosts with the standard SNMP hardware sensors as well as with the specific VMware ESXi host sensors.
Microsoft Hyper-V Virtual Machine
The Hyper-V Virtual Machine sensor monitors VMs via Windows Management Instrumentation (WMI) or Windows performance counters, as configured in the Windows Compatibility Options of the parent device. With the hybrid approach, the sensor first tries to query data via performance counters and uses WMI as a fallback if no performance counters are available. Performance counters generally need fewer system resources than WMI. The parent device of this sensor must be a Windows server running Hyper-V. You should also disable User Account Control (UAC) in the Windows operating system of the VM. Otherwise, the sensor might change to the Down status with the error message "The virtual machine is not running or is powered off". Also, this sensor does not support Live Migration.
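The hybrid approach follows a common try-then-fall-back pattern. The sketch below illustrates only the pattern and is not how the sensor is implemented. The exception class and the two reader callables are hypothetical stand-ins for a performance counter query and a WMI query.

```python
class CounterUnavailable(Exception):
    """Raised by the hypothetical performance counter reader when no counter exists."""


def read_vm_metric(vm_name, read_perf_counter, read_wmi):
    """Try the cheaper performance counter query first, then fall back to WMI.

    Returns a tuple of (source, value) so the caller can see which path was used.
    """
    try:
        return ("performance counters", read_perf_counter(vm_name))
    except CounterUnavailable:
        # No performance counter is available for this VM, so use the costlier query.
        return ("WMI", read_wmi(vm_name))
```

Returning the source along with the value makes it easy to see how many VMs end up on the more expensive WMI path.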
Hyper-V Group
This screenshot shows an example of a Hyper-V group. There is a dedicated group for failover clusters where two cluster nodes are monitored with several SNMP and WMI sensors, as well as Hyper-V Host Server sensors and sensors for the Hyper-V virtual machines. This ensures that Hyper-V and failover clusters work without any issues. The Hyper-V hosts are monitored the same way, organized in a dedicated group for hosts.
Citrix XenServer Virtual Machine
The Citrix XenServer Virtual Machine sensor monitors VMs via HTTP. For this sensor, you must add a device that represents a Citrix XenServer running at least version 5.0. The sensor also requires the Microsoft .NET Framework: you must install .NET 4.7.2 or later on the probe system.
In a XenServer pool, each host knows each running VM. Because of this, there is no central instance that provides all available data, so it does not matter on which host you query your VMs. All queries on any host are automatically forwarded to the pool master that manages the XenServer pool. Therefore, it is sufficient to create the desired sensors for your XenServer VMs on a device that represents one host server of your pool. The XenServer sensors find out which host is running and retrieve the respective data.
XenServer Group
This screenshot shows an example of a XenServer group. There are two devices for XenServer hosts, Xen 1 and Xen 2, that each have a Citrix XenServer Host sensor and several Citrix XenServer Virtual Machine sensors for the particular VMs on this host. Furthermore, the Windows operating system is represented as a dedicated virtualcontrol device that PRTG monitors with several WMI sensors regarding CPU, disk, memory, and network usage.
Performance Considerations
For best performance when monitoring virtual environments, we strongly recommend that the probe system runs Windows Server 2012 R2 or later. For example, you can run up to 300 VMware sensors with a 60-second scanning interval on Windows Server 2012 R2 or later, while you can only run 30 VMware sensors with the same scanning interval on Windows Server 2008 R2.
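The two limits above can be turned into a quick planning check. The following sketch uses only the figures stated for a 60-second scanning interval. The table and function names are hypothetical, and the scan rate is a plain average that should not be extrapolated to other intervals.

```python
# Maximum VMware sensors at a 60-second scanning interval, per probe operating system.
VMWARE_SENSOR_LIMITS_AT_60S = {
    "Windows Server 2012 R2": 300,
    "Windows Server 2008 R2": 30,
}


def vmware_sensor_headroom(probe_os, planned_sensors):
    """Return how many sensors remain below the limit (negative means over the limit)."""
    return VMWARE_SENSOR_LIMITS_AT_60S[probe_os] - planned_sensors


def average_scans_per_second(sensor_count, interval_seconds=60):
    """Average number of sensor scans the probe starts per second."""
    return sensor_count / interval_seconds
```

For example, 300 sensors at a 60-second interval correspond to an average of 5 scans per second, compared with 0.5 scans per second for 30 sensors.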