September 3, 2015

OMD (Check_MK) Alert Notification Integration with PagerDuty Done Right

Yes, shit just hit the fan. What are you gonna do?

If you are thinking about integrating OMD or Check_MK alert notification with PagerDuty.com, you are in the right place. The official documentation from PagerDuty does not take advantage of the Flexible Notification feature provided with OMD (Open Monitoring Distribution) or Check_MK.

If you don’t know what Flexible Notification is or what OMD is about, I recommend checking out my other blog post - The Best Open Source Monitoring Solution 2015.

Create Notification Service on PagerDuty

Step 1. Log in to PagerDuty as an admin user. Click Service under the Configuration menu option.
enter image description here

Step 2. Click the Add New Service button.
enter image description here

Step 3. Fill out the form as indicated by the arrows in the following image.
enter image description here

Step 4. Congratulations, we are done with the PagerDuty part. Grab the Service API Key; you are going to need it later.
enter image description here

Install PagerDuty Notification Script

Step 1. SSH into the OMD server

Step 2. Install Perl dependencies

For RHEL, Fedora, CentOS, and other Red Hat-based distributions:

yum install perl-libwww-perl perl-Crypt-SSLeay perl-Sys-Syslog

For Debian, Ubuntu, and other Debian-based distributions:

apt-get install libwww-perl libcrypt-ssleay-perl libsys-syslog-perl

Step 3. Download pagerduty_nagios.pl from GitHub, copy it to /usr/local/bin and make it executable:

wget https://raw.github.com/PagerDuty/pagerduty-nagios-pl/master/pagerduty_nagios.pl 
cp pagerduty_nagios.pl /usr/local/bin
chmod +x /usr/local/bin/pagerduty_nagios.pl

Step 4. Create a cron job to flush the notification queue
First become the OMD/Check_MK site user in the shell, then create a cron.d file at /omd/sites/<<Site Name>>/etc/cron.d/pagerduty with the following content:

#
# Flush PagerDuty notification queue
#

* * * * * /usr/local/bin/pagerduty_nagios.pl flush

Now enable the cron job as the OMD site user:

omd reload crontab
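If you prefer to script Step 4 instead of editing the file by hand, something like the following could work. It is only a sketch: write_pagerduty_cron is a hypothetical helper name, and the directory you pass in is assumed to be your site's etc/cron.d directory.

```shell
# write_pagerduty_cron: hypothetical helper that writes the cron.d file from Step 4.
# $1 is the site's cron.d directory, e.g. /omd/sites/<<Site Name>>/etc/cron.d
write_pagerduty_cron() {
    cron_dir=$1
    # Quoted heredoc delimiter: the content is written literally, nothing is expanded
    cat > "$cron_dir/pagerduty" <<'EOF'
#
# Flush PagerDuty notification queue
#

* * * * * /usr/local/bin/pagerduty_nagios.pl flush
EOF
}
```

Run it as the site user so the file ends up with the right owner, then reload the crontab as described above.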

Since the cron job runs every minute, you can change back to the root user and check to see if the cron job has been triggered as expected.

[root@omd.server.com ~]# tail -f /var/log/cron
Sep  4 08:25:01 omd.server.com CROND[24090]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)
Sep  4 08:26:01 omd.server.com CROND[27175]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)
Sep  4 08:27:01 omd.server.com CROND[30195]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)

Integrate with OMD Flexible Notification

This assumes you are still connected to the OMD server over SSH.

Step 1. Add a custom Flexible Notification script
Copy and save the following script to /omd/sites/{YOUR-SITE}/local/share/check_mk/notifications/pagerduty.sh

#!/bin/bash
# PagerDuty
#
# Check_MK passes the notification context in NOTIFY_* environment variables.
# NOTIFY_PARAMETER_1 is the first plugin argument: the PagerDuty Service API Key.

PAGERDUTY="/usr/local/bin/pagerduty_nagios.pl"

# For Service notification
if [ "$NOTIFY_WHAT" = "SERVICE" ]; then
    # Echo the command for debugging before running it
    echo "$PAGERDUTY enqueue -f pd_nagios_object=service -f CONTACTPAGER=\"$NOTIFY_PARAMETER_1\" -f NOTIFICATIONTYPE=\"$NOTIFY_NOTIFICATIONTYPE\" -f HOSTNAME=\"$NOTIFY_HOSTNAME\" -f SERVICEDESC=\"$NOTIFY_SERVICEDESC\" -f SERVICESTATE=\"$NOTIFY_SERVICESTATE\""
    "$PAGERDUTY" enqueue -f pd_nagios_object=service -f CONTACTPAGER="$NOTIFY_PARAMETER_1" -f NOTIFICATIONTYPE="$NOTIFY_NOTIFICATIONTYPE" -f HOSTNAME="$NOTIFY_HOSTNAME" -f SERVICEDESC="$NOTIFY_SERVICEDESC" -f SERVICESTATE="$NOTIFY_SERVICESTATE"

# For Host notification
else
    "$PAGERDUTY" enqueue -f pd_nagios_object=host -f CONTACTPAGER="$NOTIFY_PARAMETER_1" -f NOTIFICATIONTYPE="$NOTIFY_NOTIFICATIONTYPE" -f HOSTNAME="$NOTIFY_HOSTNAME" -f HOSTSTATE="$NOTIFY_HOSTSTATE"
fi
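The script above gets all of its input from environment variables, so you can exercise it by hand without waiting for a real alert. Here is a rough sketch of that idea. simulate_service_notification is a hypothetical helper, and the values it sets are illustrative, not an exact copy of what Check_MK exports.

```shell
# simulate_service_notification: hypothetical helper that runs a command with
# the NOTIFY_* variables a service notification would carry.
# $1 = API key, $2 = host name, remaining arguments = the command to run
simulate_service_notification() {
    api_key=$1
    host=$2
    shift 2
    # Variable assignments placed before a command apply to that command only
    NOTIFY_WHAT=SERVICE \
    NOTIFY_PARAMETER_1=$api_key \
    NOTIFY_NOTIFICATIONTYPE=PROBLEM \
    NOTIFY_HOSTNAME=$host \
    NOTIFY_SERVICEDESC="this is just a test" \
    NOTIFY_SERVICESTATE=CRIT \
    "$@"
}
```

Passing the path to pagerduty.sh as the command would enqueue a real event for PagerDuty, so do it as the site user and only with a service you are happy to trigger.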

Step 2. Make it executable

chmod +x /omd/sites/{YOUR-SITE}/local/share/check_mk/notifications/pagerduty.sh

Step 3. Log in to OMD or Check_MK web interface and configure Flexible Notification to use the NEW PagerDuty notification script.

Assuming you already know how to operate Flexible Notification, select PagerDuty as the Notification Plugin and put the API key you acquired earlier, when setting up the PagerDuty service, into the Plugin Arguments field as shown in the image.
enter image description here

Since pagerduty_nagios.pl was designed to work with Nagios, it doesn’t handle flapping notifications. Make sure you uncheck the flapping boxes.
enter image description here
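If you want a safety net in case someone re-checks the flapping boxes later, the plugin script itself could skip notification types it should not forward. The sketch below assumes that PROBLEM, ACKNOWLEDGEMENT and RECOVERY are the types worth passing on. That list is my assumption, so check it against the Perl script. should_forward is a hypothetical helper name.

```shell
# should_forward: hypothetical guard for the notification plugin.
# Returns 0 only for the notification types this sketch assumes are safe to
# pass on; everything else (for example FLAPPINGSTART) returns 1.
should_forward() {
    case "$1" in
        PROBLEM|ACKNOWLEDGEMENT|RECOVERY) return 0 ;;
        *) return 1 ;;
    esac
}
```

One way to use it would be a line such as should_forward "$NOTIFY_NOTIFICATIONTYPE" || exit 0 near the top of pagerduty.sh, after the function definition.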
You can create multiple PagerDuty services and pair them up with OMD/Check_MK Flexible Notification. Happy ending.

Testing and Troubleshooting

Test pagerduty_nagios.pl

To send a test notification directly with pagerduty_nagios.pl, use the following example and swap out <<API Key>> and <<HOST Name>> with your own values.

Make sure you become the OMD site user first, because the first time you run the command it’s going to create a directory at /tmp/pagerduty_nagios. If you run the command as root now, you will have permission issues later when OMD tries to send notifications to PagerDuty as a different user.

/usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=service -f CONTACTPAGER="<<API Key>>" -f NOTIFICATIONTYPE="PROBLEM" -f HOSTNAME="<<HOST Name>>" -f SERVICEDESC="this is just a test" -f SERVICESTATE="CRIT"
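If you suspect the queue directory was already created by the wrong user, a quick ownership check can confirm it. This is a small sketch, assuming GNU stat as shipped with the Linux distributions mentioned earlier. queue_owner_ok is a hypothetical helper name.

```shell
# queue_owner_ok: hypothetical helper. $1 = queue directory, $2 = expected numeric UID.
# Succeeds if the directory does not exist yet or is owned by the expected UID.
# Relies on GNU stat (-c); BSD stat uses different flags.
queue_owner_ok() {
    [ -e "$1" ] || return 0
    [ "$(stat -c '%u' "$1")" = "$2" ]
}
```

As the site user you could run queue_owner_ok /tmp/pagerduty_nagios "$(id -u)" and check the exit status. A non-zero status means the directory belongs to somebody else, most likely root.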

You will find the output in syslog; its location varies depending on the OS you use.
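On Red Hat-style systems syslog usually ends up in /var/log/messages, and on Debian-style systems in /var/log/syslog. Treat both paths as common defaults, not guarantees. A tiny sketch for picking whichever one exists (first_existing_log is a hypothetical helper name):

```shell
# first_existing_log: hypothetical helper that prints the first argument
# that is an existing regular file, or returns 1 if none of them exist.
first_existing_log() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

For example, grep pagerduty "$(first_existing_log /var/log/messages /var/log/syslog)" run as root would show what the script logged, assuming one of the two files exists on your system.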

If you get the following error message like I did:

perl: symbol lookup error: /omd/sites/<<site Name>>/lib/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Encode/Encode.so: undefined symbol: Perl_Istack_sp_ptr

Add the following lines to the beginning of the PagerDuty Perl script located at /usr/local/bin/pagerduty_nagios.pl, replacing monitor in the second line with your own site name:

use lib '/usr/lib64/perl5/';
no lib '/omd/sites/monitor/lib/perl5/lib/perl5/x86_64-linux-thread-multi';

Test with Flexible Notification

Step 1.
You first need to enable notification debugging from the web UI: turn on Global Settings -> Notifications -> Debug notifications. The resulting log file is in the notify directory below Check_MK’s var directory; OMD users will find it at ~/var/check_mk/notify/notify.log. Remember to switch it back after you are done debugging.

Step 2.
Now pick a Host or Service whose notifications you’ve configured to use the PagerDuty plugin. Click the Hammer icon at the top, then click the Critical button in the Various Commands section.
enter image description here

Step 3.
Now log in to your PagerDuty account and select Dashboard from the top menu. In a minute or two, you should see something like what is shown in the image. If not, you need to go back to the log files and figure out why.
enter image description here

If you do see your fake incident appear on the PagerDuty dashboard, CONGRATULATIONS.

Share with us in the comments: what notification mechanism do you use? Do you build it in-house or use a popular 3rd party service?