Monday, November 7, 2011

Monitoring VMware ESXi and vSphere with Nagios

Requirements
  • Perl 5.8
  • Several supporting Perl modules:
    • Crypt-SSLeay (0.51) [Crypt::SSLeay]
    • Data-Dumper (2.102) [Data::Dumper]
    • MethodMaker (2.0.8) [Class::MethodMaker]
    • XML-LibXML (1.60) [XML::LibXML]
    • libwww-perl (5.805) [LWP]
This article describes how to monitor a VMware ESXi or vSphere host with Nagios, using the OP5 Check ESX plugin, which is written in Perl. The plugin can monitor either a single ESXi/vSphere server or a VirtualCenter/vCenter Server, as well as individual virtual machines. Here we'll monitor an ESXi 4 host.

This tutorial was written on a CentOS server; you may have to adapt some paths on other distributions.

Install all the dependencies of the VMware Perl SDK.

perl -MCPAN -e shell

This command opens the interactive CPAN shell, from which you can install the Perl modules listed in the requirements.

Example:

cpan> install Crypt::SSLeay
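
Before running the SDK installer, it can help to see which of the required modules are already present. The following is a minimal sketch, not part of the plugin or the SDK: the helper name have_perl_module is made up for this example, and it only checks that each module loads, not that it meets the minimum version listed in the requirements.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the named Perl module can be loaded,
# non-zero otherwise (including when perl itself is not installed).
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Module names taken from the requirements list in this article.
for mod in Crypt::SSLeay Data::Dumper Class::MethodMaker XML::LibXML LWP; do
    if have_perl_module "$mod"; then
        echo "ok       $mod"
    else
        echo "MISSING  $mod"
    fi
done
```

Any module reported as MISSING can then be installed from the CPAN shell as shown above.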

Installation

The plugin requires the VMware Perl SDK, available from the VMware website.
Download the file to your server (for example into /root), untar it, and run the installer:

# cd /root
# tar xvzf VMware-vSphere-Perl-SDK-4.1.0-254719.i386.tar.gz
# cd vmware-vsphere-cli-distrib/
# ./vmware-install.pl


"Creating a new vSphere CLI installer database using the tar4 format.

Installing vSphere CLI.

You must read and accept the vSphere CLI End User License Agreement to continue.
Press enter to display it."

"Read through the License Agreement"

"Do you accept? (yes/no)"

yes


"In which directory do you want to install the executable files?
[/usr/bin]"


"The following Perl modules were found on the system but may be too old to work
with vSphere CLI:

Crypt::SSLeay
Compress::Zlib

The installation of vSphere CLI 4.0.0 build-161974 for Linux
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command:
"/usr/bin/vmware-uninstall-vSphere-CLI.pl".

Enjoy,

--the VMware team"

If the SDK installer fails with a complaint about http_proxy, run the following commands before vmware-install.pl:

export http_proxy=

export ftp_proxy=
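
If you would rather not change the proxy variables for your whole shell session, the same two exports can be confined to a subshell. This is a small sketch under that assumption; the function name run_without_proxy is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a command with http_proxy and ftp_proxy set to
# empty values, exactly like the two export lines above, but inside a
# subshell so the calling shell keeps its original settings.
run_without_proxy() {
    (
        export http_proxy= ftp_proxy=
        "$@"
    )
}

# Example usage (run from the vmware-vsphere-cli-distrib/ directory):
#   run_without_proxy ./vmware-install.pl
```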

Download the Nagios check plugin check_esx3.pl from OP5:

http://www.op5.org/community/plugin-inventory/op5-projects/op5-plugins

Follow the instructions given by the SDK installer. Depending on your setup, some Perl dependencies must be installed beforehand for the SDK to work correctly. Once the SDK is installed, copy the downloaded plugin to /usr/lib/nagios/plugins/ and make it executable:

# cd /usr/lib/nagios/plugins/
# chmod a+x check_esx

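Note: if the file you downloaded is named check_esx3.pl, use check_esx3.pl instead of check_esx in this command and in the command definitions in the Configuration section.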
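
It is worth running the plugin once by hand before wiring it into Nagios. The sketch below uses the same options as the command definitions in this article (-H, -u, -p, -l, -s, -w, -c) and reports the result using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The variable names, the default host address, and the placeholder credentials are assumptions for illustration only.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Assumed defaults; override them from the environment for your setup.
PLUGIN="${PLUGIN:-/usr/lib/nagios/plugins/check_esx3.pl}"
ESX_HOST="${ESX_HOST:-192.168.1.100}"
ESX_USER="${ESX_USER:-username}"
ESX_PASS="${ESX_PASS:-password}"

if [ -x "$PLUGIN" ]; then
    # CPU usage check with warning at 80 and critical at 90.
    "$PLUGIN" -H "$ESX_HOST" -u "$ESX_USER" -p "$ESX_PASS" \
        -l cpu -s usage -w 80 -c 90
    rc=$?
    echo "Plugin state: $(nagios_state "$rc")"
else
    echo "Plugin not found or not executable: $PLUGIN"
fi
```

If the manual run reports UNKNOWN, check the SDK installation and the credentials before continuing with the Nagios configuration.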

Configuration

Now we can start the actual Nagios configuration. We need a username and password to access the ESXi host. Define them as Nagios user macros in /etc/nagios/resource.cfg, so that this information is hidden from the CGIs:

$USER11$=username
$USER12$=password

In this tutorial, we'll monitor these resources: CPU usage, memory usage, network usage, runtime status, and I/O read and write. More checks are available; see the plugin's reference documentation. Below are the ESXi-related commands to add to the /etc/nagios/objects/command.cfg file. This is NOT the full command.cfg, only the new commands; you can append them at the end of the file:

# check vmware esxi machine

# check cpu
define command{
        command_name check_esx_cpu
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l cpu -s usage -w $ARG1$ -c $ARG2$
        }

# check memory usage
define command{
        command_name check_esx_mem
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l mem -s usage -w $ARG1$ -c $ARG2$
        }

# check net usage
define command{
        command_name check_esx_net
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l net -s usage -w $ARG1$ -c $ARG2$
        }

# check runtime status
define command{
        command_name check_esx_runtime
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l runtime -s status
        }

# check io read
define command{
        command_name check_esx_ioread
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s read -w $ARG1$ -c $ARG2$
        }

# check io write
define command{
        command_name check_esx_iowrite
        command_line $USER1$/check_esx -H $HOSTADDRESS$ -u $USER11$ -p $USER12$ -l io -s write -w $ARG1$ -c $ARG2$
        }

Here is an example configuration for a Nagios host called esxi01, in /etc/nagios/hosts/esxi01.cfg:

# Host esxi01
define host{
        use                     linux-server
        host_name               esxi01
        alias                   VMWare ESXi 01
        address                 192.168.1.100
        }

# Define a service to "ping" the host
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

# VMWare
# check cpu
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi CPU Load
        check_command                   check_esx_cpu!80!90
        }

# check memory usage
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Memory usage
        check_command                   check_esx_mem!80!90
        }

# check net
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Network usage
        check_command                   check_esx_net!102400!204800
        }

# check runtime status
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi Runtime status
        check_command                   check_esx_runtime
        }

# check io read
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO read
        check_command                   check_esx_ioread!40!90
        }

# check io write
define service{
        use                             generic-service
        host_name                       esxi01
        service_description             ESXi IO write
        check_command                   check_esx_iowrite!40!90
        }

That's it. Restart Nagios and wait a while (or reschedule the checks) for the new services to be monitored.
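
Before restarting, it is a good habit to let Nagios verify the configuration, so that a typo in the new definitions does not take the monitoring down. A minimal sketch follows, assuming the main configuration file is /etc/nagios/nagios.cfg (consistent with the paths used in this article, but check your own installation) and that the service is managed with the service command, as on CentOS. The function name restart_if_valid is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: verify the Nagios configuration with "nagios -v"
# and restart the service only if the verification succeeds.
restart_if_valid() {
    cfg="$1"
    if [ ! -f "$cfg" ]; then
        echo "Configuration file not found: $cfg"
        return 1
    fi
    if nagios -v "$cfg"; then
        service nagios restart
    else
        echo "Configuration check failed; Nagios was not restarted."
        return 1
    fi
}

# Assumed default path; override with NAGIOS_CFG if yours differs.
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
restart_if_valid "$NAGIOS_CFG" || echo "Fix the errors reported above and run this again."
```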