Machine Virtualization as a Development Tool: Part 2

Prologue

This is the second of a two-article series about using machine virtualization as a development tool. It focuses on the question of how. The first article focuses on the question of what.

Using VMs

It is now time to get to the specifics of using a VM solution, in this case VirtualBox.

Guest Additions

The Guest Additions are VirtualBox-specific drivers and utilities, installed in the guest OS, that enable some useful VirtualBox features. The Guest Additions should be installed into the guest OS when creating a base VM. See the VirtualBox documentation for details.
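As a rough sketch of one way to do this from the host (the VirtualBox manual covers the alternatives), the Guest Additions ISO that ships with VirtualBox can be attached to the VM's virtual DVD drive using VBoxManage storageattach with the special medium name additions, and the installer then run inside the guest. The controller name IDE, port 1, device 0, and the guest mount point /mnt are assumptions; check VBoxManage showvminfo for your VM's actual storage layout, and run the attach step while the VM is powered off.

```bash
# Attach the Guest Additions ISO bundled with VirtualBox to a VM's DVD drive.
# Assumption: the VM has a storage controller named "IDE" with a free slot at
# port 1, device 0. Adjust these to match your VM's configuration.
attach_guest_additions() {
  local vm="$1"
  VBoxManage storageattach "$vm" --storagectl IDE --port 1 --device 0 \
    --type dvddrive --medium additions
}

# Then, inside a Linux guest (the mount point is an assumption):
#   sudo mount /dev/cdrom /mnt
#   sudo sh /mnt/VBoxLinuxAdditions.run
```

For example, attach_guest_additions centos-base, where centos-base is a hypothetical VM name.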

Shared Folders

One of the important features the Guest Additions enable is shared folders. A shared folder makes a host directory available inside the guest VM. This is a convenient way to deploy code or otherwise share content with the VM. If the content of a shared folder is stateful, such as code, it should be managed by a version control system such as git.
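A minimal sketch of setting up a shared folder from the host command line, assuming a Linux guest with the Guest Additions installed. The share name site, the host path, and the guest mount point /var/www/site are hypothetical. Adding the share while the VM is powered off makes it permanent; VBoxManage also offers a --transient option for running VMs.

```bash
# Share a host directory with a VM. The default share name "site" is
# hypothetical; pass a third argument to use a different name.
add_shared_folder() {
  local vm="$1" hostpath="$2" share="${3:-site}"
  VBoxManage sharedfolder add "$vm" --name "$share" --hostpath "$hostpath" --automount
}

# Inside the guest, mount the share manually if it is not auto-mounted:
#   sudo mkdir -p /var/www/site
#   sudo mount -t vboxsf site /var/www/site
```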

Snapshots

A snapshot saves the state of a VM so that, when the snapshot is restored, the VM continues exactly where it left off. When a VM is restored to a snapshot, remounted network drives and shared folders may have different contents than when the snapshot was taken, because that content lives outside the VM. Virtual disks, by contrast, are restored bit for bit by default.

Snapshots are useful for a developer who needs to test a new release, or to roll back a site to an older state to work on a bug fix. For example, one might be working on a new release of an application when an emergency bug fix takes precedence. It is then a simple task to take a snapshot that saves the state of the new-release work, restore the VM (or a clone of it) to the state that matches the code experiencing the bug, and check out the matching code from one’s SCM, such as git. When the bug is fixed, a similar process is performed to resume work on the new release. This ensures that not only does the code match production, but so do other aspects of the host, such as the database¹ and web server settings.

Taking a snapshot is not a big deal, so when in doubt, take one. Always take a snapshot to capture the current state before restoring a snapshot. Beyond that, it is a good idea to take a snapshot before making a time-consuming change to the VM, and with each code promotion.
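Following that advice, a restore can be wrapped so that the current state is always captured first. This is a sketch: the before-restore-<timestamp> naming scheme is an assumption, and VirtualBox only restores a snapshot when the VM is not running.

```bash
# Take a safety snapshot of the current state, then restore the target
# snapshot. The restore only runs if the safety snapshot succeeded.
safe_restore() {
  local vm="$1" target="$2" stamp
  stamp="$(date +%Y%m%d-%H%M%S)"
  VBoxManage snapshot "$vm" take "before-restore-$stamp" \
    --description "Automatic snapshot taken before restoring '$target'" &&
    VBoxManage snapshot "$vm" restore "$target"
}
```

For example, safe_restore centos-base v1.0.1 (both names hypothetical).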

It is wise to shut down the VM before taking a snapshot, because a snapshot of a running VM also stores the machine’s memory state. That extra data is only needed if the snapshot must be restored to a running state, and it can occupy as much disk space as the memory assigned to the VM.
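A sketch of shutting down cleanly from the host and waiting for the VM to reach the powered-off state before taking the snapshot. It assumes the guest OS responds to the ACPI power button; the default of 60 tries with a 2-second pause is arbitrary.

```bash
# Ask the guest OS to shut down, then poll VirtualBox until it reports the VM
# as powered off. Returns non-zero if that does not happen in time.
shutdown_and_wait() {
  local vm="$1" tries="${2:-60}"
  VBoxManage controlvm "$vm" acpipowerbutton
  while [ "$tries" -gt 0 ]; do
    if VBoxManage showvminfo "$vm" --machinereadable | grep -q '^VMState="poweroff"'; then
      return 0
    fi
    sleep 2
    tries=$((tries - 1))
  done
  echo "VM '$vm' did not power off in time" >&2
  return 1
}
```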

When taking a snapshot, VirtualBox provides a name field to label the snapshot and a description field. Be sure to use the description field to record any information needed to fully restore the state of the snapshot. For example, include the git tag² needed to revert the contents of a shared folder.

In the following example, the snapshot is labeled to show that it was taken before some database maintenance. The description field contains the git data needed to restore a shared folder to the state it was in when the snapshot was taken. The command git log --decorate --oneline -1 is a handy way to get the latest commit information, but ideally one should create an annotated tag to go with the snapshot. The annotated tag can be labeled the same way as the snapshot, which makes the snapshot-to-tag relationship clear from both the code and the list of snapshots.
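Creating the annotated tag and the snapshot together keeps their labels in step. The sketch below assumes it is run from the host working copy of the shared folder with the VM already shut down; the function name and the layout of the description text are hypothetical.

```bash
# Create an annotated git tag and a VM snapshot with the same label, and record
# the tag and latest commit summary in the snapshot description.
tag_and_snapshot() {
  local vm="$1" label="$2" message="$3" commit desc
  git tag -a "$label" -m "$message" || return 1
  commit="$(git log --decorate --oneline -1)"
  desc="$(printf '%s\ngit tag: %s\ngit commit: %s' "$message" "$label" "$commit")"
  VBoxManage snapshot "$vm" take "$label" --description "$desc"
}
```

For example: tag_and_snapshot centos-base pre-db-maintenance "Before database maintenance".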

The power of a snapshot is the ability to capture a point in time. To make full use of that point in time, snapshots can be combined with clones.

Clones

As one would expect, cloning a VM is making a copy of a VM. The source for a clone can be a VM’s current state or a particular snapshot.

Consider the following scenario: one is working on a web development project, and git is being used to manage the stateful content of a shared folder.

A critical bug is found in production. We must roll back to a snapshot to work on the bug, then later resume the current state.

There are two ways to handle this:

Use snapshots

1. Shut down the VM so that its run-time (memory) state is not saved with the snapshot.

2. Stash the uncommitted changes in the shared folder.

 $ git stash save "Working on customer lookup when ticket ID 113245 was received"

3. Capture the full stash information for use in the snapshot description.

 $ git stash list
 stash@{0}: On dev: Working on customer lookup when ticket ID 113245 was received

4. Take a new snapshot of the current state so there is a place to return to when the bug is fixed. Include the stash information in the snapshot description.
5. Restore the VM to the snapshot required to fix the bug. (In this case, the snapshot created for the v1.0.1 code promotion.)
6. Restore the shared folder to the correct place³ for fixing the bug.
```
 $ git checkout -b b1.0.1-113245 tags/v1.0.1
```
7. Fix the bug, committing the fix on the b1.0.1-113245 branch.
8. Restore the VM to the snapshot whose description holds our stash information.

9. Restore the shared folder to the state that corresponds to the snapshot just restored.

 $ git checkout dev
 $ git stash list
 stash@{0}: On dev: renaming serialized files
 stash@{1}: On dev: Working on customer lookup when ticket ID 113245 was received
 $ git stash pop stash@{1}
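Note that the stash of interest moved from stash@{0} to stash@{1} because another stash was created in the meantime. Looking the stash up by its ticket ID, rather than by position, avoids popping the wrong one. A small sketch (the function name is hypothetical):

```bash
# Print the stash reference (for example stash@{1}) of the first stash whose
# message contains the given text, such as a ticket ID. Prints nothing if
# there is no match.
find_stash() {
  git stash list | grep -F -- "$1" | head -n 1 | cut -d: -f1
}

# Usage, with the ticket ID from the example above:
#   ref="$(find_stash 113245)" && [ -n "$ref" ] && git stash pop "$ref"
```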

Use snapshots with a clone

What if a co-worker can take on the bug fix while you continue to work on your current sprint? In that case, the process simplifies to creating a clone of the VM for the co-worker.

1. Select the correct snapshot from the snapshot list. (In this case, the snapshot created for the v1.0.1 code promotion.)
2. Right-click the snapshot and select Clone, or click the Clone button (the sheep icon).
3. Name the clone so that it includes the source VM name, the snapshot name, and the ticket ID 113245.
4. If the clone will run on the same network as your VM, select Reinitialize the MAC address of all network cards so the two VMs do not share MAC addresses. Note that you may need to boot into the clone and update its network settings to reflect the new MAC addresses.
5. Select a full clone rather than a linked clone, so that the clone can operate independently.
6. When prompted for which parts of the snapshot tree to clone, select Current Machine State. This is confusing, but here Current Machine State means the state of the VM at the point in time of the snapshot you selected. This can result in a smaller VM, because no other states, and their differencing disk images, need to be incorporated into the clone.
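The same clone can be made from the command line with VBoxManage clonevm. In this sketch the VM, snapshot, and ticket names are placeholders. Because neither --options link nor --options keepallmacs is given, clonevm produces a full clone and assigns new MAC addresses to the clone's network cards; --mode machine clones only the selected state rather than the whole snapshot tree, roughly the command-line counterpart of Current Machine State.

```bash
# Make a full, independent clone of a VM as it was at a given snapshot, named
# after the source VM, the snapshot, and the ticket ID.
clone_for_coworker() {
  local src="$1" snapshot="$2" ticket="$3"
  VBoxManage clonevm "$src" --snapshot "$snapshot" --mode machine \
    --name "${src}-${snapshot}-${ticket}" --register
}
```

For example, clone_for_coworker dev-vm v1.0.1 113245 registers a new VM named dev-vm-v1.0.1-113245.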

Headless

Once a VM is fully set up with SSH and the other tools I need, I find that the GUI simply gets in the way. There are two easy ways to start a VM without loading its console or graphical user interface (GUI):

Hold down the Shift key when starting a VM from the VirtualBox Manager GUI.
Start a VM from the command line with VBoxHeadless --startvm <uuid|name>.
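VBoxHeadless runs in the foreground; VBoxManage startvm with --type headless starts a headless VM and returns immediately. For SSH access to a VM on NAT networking, a port-forwarding rule is one option. This sketch assumes NAT on the first adapter, sshd listening on guest port 22, and an arbitrary rule name guestssh and host port 2222; add the rule while the VM is powered off.

```bash
# One-time setup: forward host port 2222 to guest port 22 on NAT adapter 1.
add_ssh_forward() {
  local vm="$1"
  VBoxManage modifyvm "$vm" --natpf1 "guestssh,tcp,,2222,,22"
}

# Start the VM without a GUI; the command returns once the VM is launched.
start_headless() {
  local vm="$1"
  VBoxManage startvm "$vm" --type headless
}

# Usage (VM and user names are hypothetical):
#   add_ssh_forward centos-base
#   start_headless centos-base
#   ssh -p 2222 user@127.0.0.1
```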

Maximize return on virtualization investment

As with any tool, VM software is only worthwhile if it saves more time and headaches than it causes. There are a few things one can do to maximize the return on one’s time investment.

Separate source code management from VM management and identify intersections

Keep source code and source code management on the host machine. Share code or deployments with the VM guests by way of shared folders; do not store such things on a VM’s virtual disk. Confining code management to the host machine eliminates unnecessary dependencies on guest VMs. This does not preclude the use of remote git repositories; it simply ensures that the host handles those dependencies.

Identify intersections of state between guest VMs and managed code. This means marking important milestones in code state with SCM tags, such as git annotated tags, and important milestones in VM state with snapshots. Make it a rule that each time a tag is needed for the source code, a snapshot is made in the VM, and vice versa. Keep snapshot descriptions updated with the tags that correspond to these intersections of state between the VM and the code. See Snapshots above.
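To check that rule, a sketch like the following lists the git tags that have no VM snapshot with the same name. It assumes tags and snapshots share labels, as suggested in the Snapshots section, and parses the SnapshotName lines printed by VBoxManage snapshot <vm> list --machinereadable. The same idea works in reverse to find snapshots without tags.

```bash
# Print each git tag (run from the host working copy) that has no snapshot of
# the same name in the given VM.
tags_without_snapshots() {
  local vm="$1" snapshots tag
  snapshots="$(VBoxManage snapshot "$vm" list --machinereadable |
    sed -n 's/^SnapshotName[^=]*="\(.*\)"$/\1/p')"
  git tag -l | while IFS= read -r tag; do
    printf '%s\n' "$snapshots" | grep -Fxq -- "$tag" || printf '%s\n' "$tag"
  done
}
```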

Use pre-built base VMs

In my experience, the most time-consuming aspect of using VM software is creating the base VM. That said, creating a base VM from scratch is no more complicated than installing an OS on a real machine, and once you have done it, I find it simpler than dealing with a real machine. A nice shortcut is to use a pre-built base VM and customize it as needed. The ideal source for such a VM is whoever created the VM for your production environment, if a VM is used there. Pre-built VM images are sometimes called virtual appliances. Although there are many VM software technologies, the Open Virtualization Format (OVF) is a packaging format intended to make VM images portable across different VM software providers. This is supposed to make it possible, for example, to use a VM created for VMware on VirtualBox. VirtualBox has an import mechanism for these OVF VMs.
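For the import itself, a sketch using VBoxManage; the file name appliance.ova is a placeholder. The --dry-run pass reports what would be created without importing anything, and VBoxManage export goes the other way, packaging a VM as an OVA file.

```bash
# Preview an OVF/OVA import, then perform it if the preview succeeds.
import_appliance() {
  local file="$1"
  VBoxManage import "$file" --dry-run &&
    VBoxManage import "$file"
}

# Export a VM to a single OVA file (the output name is up to you).
export_appliance() {
  local vm="$1" out="$2"
  VBoxManage export "$vm" --output "$out"
}
```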

The following are two places with useful VMs.

Oracle: developer-vm
VMware: virtual-appliances

The place I find the most useful is vagrantbox.es, but to properly use those images we need another tool called Vagrant.

Use Vagrant

Use Vagrant to simplify and accelerate creating your own base VMs. Once Vagrant is set up, it is possible to create a new VM with a single vagrant up command. The Vagrant build process is automated and controlled by a written configuration (the Vagrantfile), which makes Vagrant VMs reliable, predictable, and consistent.
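A minimal sketch of creating a new VM with Vagrant. The project directory, box name, and box URL are placeholders (vagrantbox.es lists box URLs). The function body runs in a subshell so that its cd does not change the caller's working directory.

```bash
# Create a Vagrant project directory, point it at a base box, and bring the
# VM up. Arguments: directory, box name, box URL (all placeholders).
new_vagrant_vm() (
  dir="$1"; box="$2"; box_url="$3"
  mkdir -p "$dir" && cd "$dir" || exit 1
  # Write a Vagrantfile that refers to the named base box.
  vagrant init "$box" "$box_url" || exit 1
  # Download the box if needed, boot the VM, and run any provisioners.
  vagrant up
)

# Afterwards, from the same directory:
#   vagrant ssh     # log in to the guest
#   vagrant halt    # shut it down
```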

I use Vagrant with Puppet to simplify and unify the provisioning tasks. The following example shows how to use Puppet to ensure that git is installed in the guest VM. Notice that the declaration does not name a particular OS or distribution-specific package manager, such as rpm or yum. Part of the power of Puppet is that it keeps these specifics out of the configuration and out of the way.

 package { 'git':
   ensure => installed,
 }

This lets one focus more on addressing the needs for a VM, and less on how those needs must be achieved for a particular OS/distribution.
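Before adding a resource like this to the provisioning manifests, it can be tried directly in the guest with puppet apply -e, which evaluates a manifest given on the command line; adding --noop reports what would change without changing anything. A sketch, assuming Puppet is installed in the guest and the user can run sudo:

```bash
# Apply the git package resource from the example above. Any extra arguments,
# such as --noop, are passed through to puppet apply.
ensure_git() {
  sudo puppet apply "$@" -e "package { 'git': ensure => installed }"
}

# Usage:
#   ensure_git --noop   # preview
#   ensure_git          # apply
```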

I created a Habari development VM for GrowingLiberty.com using Vagrant. This implementation can be found on GitHub at the following URL.

github.com/mmynsted/vagrant-centos-php

The details are documented in the Readme.md. This implementation uses a base Vagrant box from vagrantbox.es, showing that one can further simplify provisioning a base VM by using a pre-built Vagrant base box.

My preference is to use Vagrant to create a base VM, then clone it to a new VM that is no longer dependent on Vagrant. This way I am free to continue to improve my Vagrant implementation without adversely affecting actively used VMs. The example above shows how one could use Puppet as part of the original provisioning and then use traditional shell commands from the new clone. Puppet is still installed so one can interact with the VM in the way that seems most natural.
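A sketch of that hand-off from Vagrant to a standalone VM. It assumes it is run from the Vagrant project directory and that the machine uses Vagrant's default machine name, default, in which case Vagrant records the VirtualBox VM's UUID in .vagrant/machines/default/virtualbox/id. The clone name is arbitrary.

```bash
# Halt the Vagrant-managed VM and clone it into a new, registered VirtualBox
# VM that Vagrant does not manage.
clone_vagrant_vm() {
  local name="$1" id_file=".vagrant/machines/default/virtualbox/id" uuid
  [ -f "$id_file" ] || { echo "no Vagrant machine id found" >&2; return 1; }
  uuid="$(cat "$id_file")"
  vagrant halt &&
    VBoxManage clonevm "$uuid" --name "$name" --register
}
```

For example, clone_vagrant_vm habari-dev, where habari-dev is a hypothetical name for the new VM.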

The Vagrant and Puppet combination enables one to quickly translate a VM build investment to new and changing needs.

Conclusion

A virtual machine is a software-based abstraction of a physical machine. This abstraction permits reduced maintenance time, improved testing, and better use of both remote and local machine resources. Using a local VM can become a seamless part of one’s development process and can help one be better organized and more productive.

¹ Database state is only saved if the database is served from inside the same VM or, if the database is external, if one manages its state much as one manages the state of source code delivered through a shared folder.
² Be sure to use an annotated git tag so that the full information is captured.
³ This creates a new branch, b1.0.1-113245, from tag v1.0.1 on the dev branch.
