January 6, 2016

DevOps: How do I Get Here?


Today’s loop of code from build to testing to production has evolved into a never-ending cycle. Instead of pushing code to production once a quarter, companies now expect code to be tested and deployed every few minutes with little to no failure. How is this possible? What has changed?

The competitive landscape, coupled with user expectations, has made the waterfall development model obsolete. Companies can’t wait three months to push new features to production, because their competitors won’t. Companies are forced to try new things and iterate fast to meet user expectations and differentiate themselves from their competitors. The continuous loop of code requires an industry-wide change in culture. The role of each player in the organization has shifted dramatically.

Given the high frequency of infrastructure and code changes, operations is now on the front lines, partnered with developers out of necessity. Developers are also increasingly responsible for the reliability of their code, and must instrument health checks on an ongoing basis. To ensure the marriage between developers and operations does not end in a crash-and-burn state, the culture of DevOps was born.

We have seen that companies that are truly able to achieve a DevOps culture and outrun the competition incorporate the following:

  1. Modular architectures
    • They split monolithic apps into microservices to avoid a single point of failure and to accelerate root cause analysis when monitoring changes and fixing bugs. It’s much quicker and less risky to the business to take a single microservice offline for a fix than to take the entire app down.
  2. Agile development pipelines
    • Containers, led by Docker, simplify management of microservices at scale. Unlike traditional virtual machines, containers share a common OS kernel, which makes it easy to deploy them in large quantities. Companies such as Google, Twitter, and LinkedIn may provision hundreds of thousands of containers per day, each of which may live for only seconds. This type of agility allows for a scalable development pipeline.
  3. Continuous integration and delivery
    • Microservices and containers are most effective when code is continuously integrated as changes are committed, tests are automated, and builds are compiled and released continuously. With services such as Jenkins and Chef, companies are able to make the continuous loop of code from build to testing to production run smoothly and effectively.
  4. Service-aware monitoring
    • To avoid downtime and user-impacting issues, DevOps must be able to monitor changes efficiently. While monitoring tools such as Nagios can be a useful blinking light to tell you when something is going wrong, they can also be very noisy and allow critical issues to slip through the cracks. A monitoring stack that helps DevOps easily map infrastructure to services, and services to business impact, is therefore critical to the health of the business. Platforms such as BigPanda, which correlate massive volumes of alerts from Nagios, allow DevOps to reduce noise and accelerate root cause analysis.

In an age where speed, agility and systems health directly impact business competitiveness, service reputation, and ultimately the bottom line, it is imperative to achieve a DevOps culture. Today, the DevOps marriage is necessary to stay competitive and keep the business above water.


September 3, 2015

OMD (Check_MK) Alert Notification Integration with PagerDuty Done Right

Yes, shit just hit the fan. What are you gonna do?

If you are thinking about integrating OMD or Check_MK alert notification with PagerDuty.com, you are in the right place. The official documentation from PagerDuty does not do right by the Flexible Notification feature provided with OMD (Open Monitoring Distribution) and Check_MK.

If you don’t know what Flexible Notification is or what OMD is about, I recommend checking out my other blog post - The Best Open Source Monitoring Solution 2015.

Create Notification Service on PagerDuty

Step 1. Log in to PagerDuty as an admin user. Click Service under the Configuration menu option.

Step 2. Click the Add New Service button.

Step 3. Fill out the form as the arrows indicate in the following image.

Step 4. Congratulations, we are done with the PagerDuty part. Grab the Service API Key; you are going to need it later.

Install PagerDuty Notification Script

Step 1. SSH into the OMD server

Step 2. Install Perl dependencies

FOR RHEL, Fedora, CentOS, and other Redhat-based distributions:

yum install perl-libwww-perl perl-Crypt-SSLeay perl-Sys-Syslog

For Debian, Ubuntu, and other Debian-based distributions:

apt-get install libwww-perl libcrypt-ssleay-perl libsys-syslog-perl

Step 3. Download pagerduty_nagios.pl from GitHub, copy it to /usr/local/bin, and make it executable:

wget https://raw.github.com/PagerDuty/pagerduty-nagios-pl/master/pagerduty_nagios.pl 
cp pagerduty_nagios.pl /usr/local/bin
chmod +x /usr/local/bin/pagerduty_nagios.pl

Step 4. Create cron job to flush notification queue
First become the OMD/Check_MK site user in the shell, then create a cron.d file at /omd/sites/<<Site Name>>/etc/cron.d/pagerduty with the following content:

# Flush PagerDuty notification queue

* * * * * /usr/local/bin/pagerduty_nagios.pl flush

Now enable the cron job as the OMD site user:

omd reload crontab

Since the cron job runs every minute, you can change back to the root user and check to see if the cron job has been triggered as expected.

[root@omd.server.com ~]# tail -f /var/log/cron
Sep  4 08:25:01 omd.server.com CROND[24090]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)
Sep  4 08:26:01 omd.server.com CROND[27175]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)
Sep  4 08:27:01 omd.server.com CROND[30195]: (omd_user) CMD (/usr/local/bin/pagerduty_nagios.pl flush)

Integrate with OMD Flexible Notification

This assumes you are still connected to the OMD server over SSH.

Step 1. Add the custom Flexible Notification script
Copy and save the following script to /omd/sites/{YOUR-SITE}/local/share/check_mk/notifications/pagerduty.sh

#!/bin/bash
# PagerDuty notification plugin for Check_MK Flexible Notifications.
# The PagerDuty service API key arrives as the first plugin argument,
# which Check_MK exposes as $NOTIFY_PARAMETER_1.

# For Service notification
if [ "$NOTIFY_WHAT" = "SERVICE" ]; then
    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=service \
        -f CONTACTPAGER="$NOTIFY_PARAMETER_1" -f NOTIFICATIONTYPE="$NOTIFY_NOTIFICATIONTYPE" \
        -f HOSTNAME="$NOTIFY_HOSTNAME" -f SERVICEDESC="$NOTIFY_SERVICEDESC" \
        -f SERVICESTATE="$NOTIFY_SERVICESTATE"
else
    # For Host notification
    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=host \
        -f CONTACTPAGER="$NOTIFY_PARAMETER_1" -f NOTIFICATIONTYPE="$NOTIFY_NOTIFICATIONTYPE" \
        -f HOSTNAME="$NOTIFY_HOSTNAME" -f HOSTSTATE="$NOTIFY_HOSTSTATE"
fi

Step 2. Make it executable

chmod +x pagerduty.sh

Step 3. Log in to OMD or Check_MK web interface and configure Flexible Notification to use the NEW PagerDuty notification script.

Assuming you already know how to operate Flexible Notification, select PagerDuty as the Notification Plugin and put the API key you acquired earlier, when setting up the PagerDuty service, into the Plugin Arguments field as shown in the image.

Since pagerduty_nagios.pl was designed to work with Nagios, it doesn’t handle flapping notifications. Make sure you uncheck those boxes.
You can create multiple PagerDuty services and pair them up with OMD/Check_MK Flexible Notification. Happy ending.

Testing and Troubleshooting

Test pagerduty_nagios.pl

To send a test notification directly with pagerduty_nagios.pl, use the following example and swap out <<API Key>> and <<HOST Name>> with your own values.

Make sure you become the OMD site user first, because the first time you run the command it creates a directory in /tmp/pagerduty_nagios. If you run the command as root now, you will have permission issues later when OMD tries to send notifications to PagerDuty as a different user.

/usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=service -f CONTACTPAGER="<<API Key>>" -f NOTIFICATIONTYPE="PROBLEM" -f HOSTNAME="<<HOST Name>>" -f SERVICEDESC="this is just a test" -f SERVICESTATE="CRIT"

You will be able to find output in syslog; depending on the OS you use, the location may vary.

If you get the following error message like I did:

perl: symbol lookup error: /omd/sites/<<site Name>>/lib/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Encode/Encode.so: undefined symbol: Perl_Istack_sp_ptr

Add the following lines to the beginning of the PagerDuty Perl script located at /usr/local/bin/pagerduty_nagios.pl:

use lib '/usr/lib64/perl5/';
no lib '/omd/sites/monitor/lib/perl5/lib/perl5/x86_64-linux-thread-multi';

Test with Flexible Notification

Step 1.
You first need to enable notification debugging from the web UI: Global Settings -> Notifications -> Debug notifications. The resulting log file is in the notify directory below Check_MK’s var directory; OMD users will find it at ~/var/check_mk/notify/notify.log. Remember to switch it back after you are done debugging.

Step 2.
Now pick a host or service whose notifications you’ve configured to use the PagerDuty plugin, click the hammer icon on the top, and click the Critical button in the Various Commands section.

Step 3.
Now log in to the PagerDuty account and select Dashboard from the top menu. In a minute or two, you should see something appear, as shown in the image. If not, go back to the log files and figure out why.

If you do see your fake incident appear on the PagerDuty dashboard, CONGRATULATIONS.

Share with us in the comments: what notification mechanism do you use? Did you build it in-house, or do you use a popular 3rd-party service?


August 9, 2015





P.S. Mainlanders are born able to read traditional characters, at least everyone around me can. Don’t ask me why~


  • More translated books are published in simplified Chinese than in traditional Chinese
  • They are cheaper - book lovers can save quite a bit of money









  • CSDN - Chinese IT community
  • InfoQ - coverage of leading-edge web technology conferences
  • 36Kr - tech media
  • 優設 - learning platform for web designers
  • 設計達人 - collects and translates the best of web design
  • Tencent design team - Tencent social user experience design, ISUX (Internet Social User Experience) for short
  • CDC - Tencent user research and experience design center


MOOC (Massive Open Online Course) is free online education. That’s right, free!

MOOCs are free online services run jointly or independently by well-known American universities, with no shortage of courses from Harvard, MIT, and Princeton. China now has Chinese-language MOOC sites that not only add simplified Chinese subtitles to courses from those famous American universities, but also carry teaching content from top schools in China and Taiwan (NTU, NCTU, Tsinghua). In this age of information explosion, stop complaining about unequal starting points; at least you have internet access.







  • 優酷 (Youku) - like a mainland version of
  • 樂視網 (LeTV) - anime, movies, and TV series from many countries, all with Chinese subtitles






Just switch your search engine from Google or Yahoo to Baidu (http://www.baidu.com/), and all of your search results will be in simplified Chinese. Trust me: topics you are interested in will accelerate your learning, and before you know it you will be able to read simplified characters.





July 17, 2015

Scale Selenium Grid in 5 Seconds with Zero Docker Experience

usability testing

The reason to use CoreOS as a Docker server is that CoreOS is an extremely lightweight, stripped-down Linux distribution containing none of the extras that come with Ubuntu and the like. It was designed to run Docker and Docker clusters, so by using it we are buying into the future.

Many cloud service providers already offer a CoreOS image to start with. If you are not going to run the CoreOS server on bare metal, or if you already have another Docker server installed, you can skip this step.

Install CoreOS on Bare Metal (Hardware)

Download the stable CoreOS ISO from here: Download Link

Then you can burn the ISO into a CD/DVD or a bootable USB disk and use it as the boot source to boot up your physical server.

Once the command line becomes available, create a cloud-config.yml file containing the ssh key you will later use to connect to the server remotely. The content of the file should look like this:


#cloud-config

ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0g+ZTxC7weoIJLUafOgrm+h...

CoreOS allows you to declaratively customize various OS-level items, such as network configuration, user accounts, and systemd units. The official documentation describes the full list of items you can configure. The coreos-cloudinit program uses these files as it configures the OS after startup or during runtime.
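For instance, a cloud-config that also declares a user and a systemd unit could look like the sketch below. The user name, group, and unit are purely illustrative and not part of this setup; the keys themselves (users, coreos.units) come from the CoreOS cloud-config documentation:

```yaml
#cloud-config

users:
  - name: deploy          # illustrative user name
    groups:
      - docker            # allow this user to talk to the Docker daemon

coreos:
  units:
    - name: hello.service # illustrative oneshot unit
      command: start
      content: |
        [Unit]
        Description=Example unit written by cloud-config

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/echo "configured at boot"
```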

Unlike on AWS, the cloud-config file is run during EACH system boot. While it is inspired by the cloud-init project, cloud-init itself isn’t used by CoreOS; only a relevant subset of its configuration items is implemented in the CoreOS cloud-config file. Please refer to the official CoreOS documentation for details on the subject.
cloud-config Documentation Link

Now run the install command with the cloud-config file name as an argument:

coreos-install -d /dev/sda -c cloud-config.yml

Once you complete the above steps, reboot the server, making sure you remove your boot disk or CD/DVD first. The server will boot up in command-line mode and display its IP address obtained from the DHCP server. From this point on, you can only ssh into the server via the ssh key you provided earlier.

Install and Setup Docker Compose

Compose is a tool for defining and running multi-container applications with Docker. With Compose, you define a multi-container application in a single file, then spin your application up in a single command which does everything that needs to be done to get it running.

Docker Compose is the key component here that makes spinning up or tearing down an entire Selenium Grid farm (1 hub + multiple browsers) a single command. Yes, you read that right: just ONE command. Installing Compose is a bit tricky though, because of how CoreOS is built.

The video that showed me how to install docker-compose on CoreOS covers it toward the end. You can skip the video, because it’s 46 minutes long, and I will show you the exact steps next.

Install Docker Compose

docker-compose is just a precompiled binary we can download from its GitHub page. Check out the link and substitute the latest version number into the following code (run as the default core user):

mkdir ~/bin
curl -L https://github.com/docker/compose/releases/download/1.3.3/docker-compose-`uname -s`-`uname -m` > ~/bin/docker-compose
chmod +x ~/bin/docker-compose
echo 'export PATH="$PATH:$HOME/bin"' >> ~/.bashrc
source ~/.bashrc
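A note on quoting when appending to .bashrc: single quotes keep $PATH and $HOME from being expanded at write time, so the PATH entry resolves freshly at each login. A quick sketch of what ends up in the file (writing to a temp file standing in for ~/.bashrc):

```shell
# Single quotes write the variables literally instead of expanding them now
RCFILE=$(mktemp)   # temp file standing in for ~/.bashrc
echo 'export PATH="$PATH:$HOME/bin"' >> "$RCFILE"
cat "$RCFILE"      # shows: export PATH="$PATH:$HOME/bin"
```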

Again if you are not using CoreOS, please follow the official installation instruction and skip the above code block.

Now the docker-compose command is ready for use. Simply run docker-compose in the shell, and you should see the following:

core@localhost ~ $ docker-compose
Define and run multi-container applications with Docker.

Usage:
  docker-compose [options] [COMMAND] [ARGS...]
  docker-compose -h|--help

Options:
  -f, --file FILE           Specify an alternate compose file (default: docker-compose.yml)
  -p, --project-name NAME   Specify an alternate project name (default: directory name)
  --verbose                 Show more output
  -v, --version             Print version and exit

Commands:
  build              Build or rebuild services
  help               Get help on a command

Configure Docker Compose Application File

We’ll be using Docker images built by the official Selenium repository on Docker Hub, so you don’t need to build your own.

Under /home/core/, create a file named docker-compose.yml with the following content:

hub:
  image: selenium/hub
  ports:
    - "4444:4444"

firefox:
  image: selenium/node-firefox
  links:
    - hub

chrome:
  image: selenium/node-chrome
  links:
    - hub

You can use other browser images in the Selenium repository in the docker-compose.yml file, and they should just work.
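For example, the same repository also publishes debug variants of the node images that run a VNC server, which is handy for watching tests live. A hypothetical extra service (the service name and host port here are my choice) would look like:

```yaml
chromedebug:
  image: selenium/node-chrome-debug   # debug variant with a built-in VNC server
  ports:
    - "5900:5900"                     # expose VNC on the host
  links:
    - hub
```

After bringing the grid up, you could point a VNC client at your CoreOS host on port 5900 to watch the Chrome node in action.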

Manage Selenium Grid with Docker Compose

If you are running this for the first time, expect Docker to download all the necessary images at the start. Once all the images are in the local repository, starting the Selenium Grid will take just seconds.

Start Selenium Grid

As user core, run:

docker-compose -f ~/docker-compose.yml up -d

By appending the -d argument to the end of the command, you are telling Compose to run the application in the background (as a daemon).

Sample output:

core@localhost /usr $ docker-compose -f ~/docker-compose.yml up -d
Recreating core_hub_1...
Recreating core_firefox_1...
Recreating core_chrome_1...

And you can verify it by running docker ps:

core@localhost /usr $ docker ps
CONTAINER ID        IMAGE                          COMMAND                CREATED              STATUS              PORTS                                                                         NAMES
a09d3ab302fa        selenium/node-chrome:latest    "/opt/bin/entry_poin   About a minute ago   Up About a minute                                                                                 core_chrome_1
298a612a387a        selenium/node-firefox:latest   "/opt/bin/entry_poin   About a minute ago   Up About a minute                                                                                 core_firefox_1
62136ab8fdb0        selenium/hub:latest            "/opt/bin/entry_poin   About a minute ago   Up About a minute   0.0.0.0:4444->4444/tcp                                        core_hub_1

To see the real action in a browser, open the URL http://your-coreos-IP:4444/grid/console


Scale Selenium Grid

Let’s say you need 5 Firefox browsers and 5 Chrome browsers in your Grid. Simply run:

docker-compose scale firefox=5 chrome=5

Sample output:

core@localhost ~ $ docker-compose scale firefox=5 chrome=5
Creating core_firefox_2...
Creating core_firefox_3...
Creating core_firefox_4...
Creating core_firefox_5...
Starting core_firefox_2...
Starting core_firefox_3...
Starting core_firefox_4...
Starting core_firefox_5...
Creating core_chrome_2...
Creating core_chrome_3...
Creating core_chrome_4...
Creating core_chrome_5...
Starting core_chrome_2...
Starting core_chrome_3...
Starting core_chrome_4...
Starting core_chrome_5...

And now your Selenium Hub management web page looks like this:

selenium grid with multiple browsers

Tearing Down Selenium Grid

Demolishing the entire Selenium Grid is just as easy as starting it. Since it only takes seconds to start, a clean slate for all your Selenium tests is no longer a dream. You want all the browsers to start without previous cookies or settings, and all of that is now possible with Docker containers.

docker-compose -f ~/docker-compose.yml stop && docker-compose -f ~/docker-compose.yml rm -f

Sample output:

core@localhost ~ $ docker-compose -f ~/docker-compose.yml stop && docker-compose -f ~/docker-compose.yml rm -f
Stopping core_chrome_1...
Stopping core_firefox_1...
Stopping core_hub_1...
Going to remove core_chrome_1, core_firefox_1, core_hub_1
Removing core_hub_1...
Removing core_firefox_1...
Removing core_chrome_1...

Integrate with Jenkins

Now to integrate it with Jenkins for automated testing. I will update this post when I have that set up in my environment.


May 8, 2015

How to Clone a Live Production Linux Server with This Cool Technique

Do you find yourself stuck with an old Ubuntu server that runs some critical application on physical hardware which is running out of resources? You don’t want to waste rack space by cloning it to another physical server. You are a smart guy; you know cloning it to a VM gives you all the benefits of modern-day server management.

live clone

What if the original hard drive has a couple hundred gigs of disk space but the actual data is only a few to a few dozen gigs? You don’t want all that unused disk space cloned into the VM via the dd command. Is there a way to clone the server to a smaller disk that lives on a KVM guest? What if you can’t risk shutting down the physical server because it is in production and is a single point of failure? Look no further; here’s how you can live-clone an old running Linux server to a KVM virtual machine.

Prepare New KVM Guest Image

Step 1. - Log in to your KVM host server and create a qcow2 image file with enough disk space to clone the actual data from the physical server:

qemu-img create -f qcow2 new-vm.qcow2 100G

Step 2. - Let’s attach the qcow2 file to a local device on the KVM host so you can partition and format it.

mkdir /mnt/new-vm
modprobe nbd max_part=8
qemu-nbd -c /dev/nbd0 new-vm.qcow2
cfdisk /dev/nbd0
mkfs.ext3 -v /dev/nbd0p1

cfdisk - Create the partition. Tailor this to your environment; instructions HERE.
mkfs.ext3 - Format the disk with the ext3 filesystem. You can also use mkfs.ext4 or another filesystem type depending on your situation.

Step 3. - Mount the drive

partprobe /dev/nbd0
mount /dev/nbd0p1 /mnt/new-vm

Clone Data from Physical Server

Step 4. - Rsync (Copy) all the files from the physical server
Do a dry run first.

rsync -e 'ssh -p 22' -avxn root@physical.server.com:/ /mnt/new-vm/

Now do it for real if you don’t see any problem with the dry run.

rsync -e 'ssh -p 22' -avx root@physical.server.com:/ /mnt/new-vm/

Step 5. - Sync data for the last time
Make sure all cached data is written to the hard drive on the physical server by running:

sync

Then go back to the KVM host and do the final rsync:

rsync -e 'ssh -p 22' -avx root@physical.server.com:/ /mnt/new-vm/

Clean up

Step 6. - Update the disk UUID in the GRUB bootloader
On the KVM host:


You will get a list of mounted drives UUID. Copy the one that follows /dev/nbd0. Use that value and replace all the disk UUID in /mnt/new-vm/boot/grub/grub.cfg

Step 7. - Clean up /etc/fstab
Do the same for /mnt/new-vm/etc/fstab: replace with the new UUID, or just remove all UUIDs from the fstab file.
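If you’d rather not hand-edit the files, the substitution can be sketched with sed. The UUIDs below are made up for illustration, and the demo runs against a temp file standing in for the real grub.cfg; swap in the values from blkid and the real paths under /mnt/new-vm on your system:

```shell
# Hypothetical UUIDs -- take the real values from the blkid output
OLD_UUID="0a1b2c3d-1111-2222-3333-444455556666"
NEW_UUID="9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff"

# Demo file standing in for /mnt/new-vm/boot/grub/grub.cfg
GRUB_CFG=$(mktemp)
echo "search --no-floppy --fs-uuid --set=root $OLD_UUID" > "$GRUB_CFG"

# Replace every occurrence of the old UUID in place
sed -i "s/$OLD_UUID/$NEW_UUID/g" "$GRUB_CFG"
cat "$GRUB_CFG"
```

The same one-liner works for the fstab file; just point it at /mnt/new-vm/etc/fstab as well.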

Step 8. - Clean up network settings

  • Change IP address by editing /mnt/new-vm/etc/network/interfaces
  • Remove cached mac address from udev. /mnt/new-vm/etc/udev/rules.d/70-persistent-net.rules
  • Edit any application settings that might conflict with your physical server once you bring the new VM up.

Step 9. - Un-mount the drive

umount /mnt/new-vm
qemu-nbd -d /dev/nbd0

Stand Up the New VM

virt-install \
--connect qemu:///system \
-n vm-name \
--os-type linux \
--vcpus=2 \
--ram 2048 \
--disk path=/path/to/new-vm.qcow2 \
--vnc \
--vnclisten= \
--noautoconsole

Change the vm-name, vcpus, ram, and disk location to your liking. If everything goes smoothly, you should see these success messages on screen:

Starting install...
Creating domain...                                                                             |    0 B     00:01
Domain creation completed. You can restart your domain by running:
  virsh --connect qemu:///system start new-vm

You can now try to ssh into the new VM using the new IP and see if things are working. To troubleshoot any issues, you will need to use VNC to connect to your VM and debug accordingly. Congratulations! You’ve now saved your company from a dying legacy server with a new, shiny, scalable, replicable, and testable VM. Now this is money ~~~