Category Archives: Virtualization

Backing Up Snapshots For VMware

Written by William Roush on April 17, 2016 at 8:01 pm

Sometimes backing up snapshots is useful, but lots of applications don’t do it out of the box… so how are we going to accomplish this?

Why Back Up Snapshots?

While everyone would jump on board the “snapshots aren’t backups” train (which I’m a proud member of), there are reasons you may want to back up snapshots. One of the biggest for me is that certain tools like TeamCity can leverage snapshots as checkpoints to boot your build agents from, and when you go to restore your virtual machines it would be nice to not have to recreate those snapshots.

I’m sure there are many other perfectly valid reasons to need long-lived snapshots (especially on non-production development/testing machines), so why not support that recovery mode?

The Problem

As far as I’ve seen, all backup software squashes your snapshots: restoring a virtual machine leaves you with a single state, and all snapshots are erased. Ouch! We’ll need to get clever.

Doing It With Veeam

This isn’t the best way to do this, but so far it seems to be the easiest: file copy jobs. We want the entire state of the virtual machine: all of its files, its snapshot descriptor files, everything. Now there are a few downsides to this:

Your backups will be larger – Veeam only backs up the current state, whereas we will be backing up all states. You need to store all the snapshot deltas and any memory snapshots.
File lock issues – We’ll need to resolve issues with backing up a powered-on virtual machine.
Storage – We don’t get clean vbk files; we’ll have full copies of whatever exists on the datastore.
Versioning – If we want to keep multiple copies over time, we’ll probably want to automate versioning our backup folders.
Tape – We lose some visibility pushing backup files to tape instead of the vbks (though this only applies to higher licensing tiers that can leverage that).
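
The versioning point could be automated with something as small as the following. This is only a rough sketch and not part of the tested setup: the folder layout (one timestamped folder per run under a per-VM folder) and the function names new_version_dir and prune_old_versions are made up for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List


def new_version_dir(backup_root: Path, vm_name: str, now: datetime) -> Path:
    """Create and return a timestamped folder, e.g. <root>/<vm>/20160417-200100."""
    target = backup_root / vm_name / now.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=False)
    return target


def prune_old_versions(backup_root: Path, vm_name: str, keep: int) -> List[Path]:
    """Delete all but the newest `keep` version folders; return the removed ones."""
    vm_dir = backup_root / vm_name
    # The timestamped names sort chronologically, so a plain sort is enough.
    versions = sorted(p for p in vm_dir.iterdir() if p.is_dir())
    doomed = versions[:-keep] if keep > 0 else versions
    for path in doomed:
        shutil.rmtree(path)
    return doomed
```

The idea would be to call new_version_dir before each file copy run, point the job at the returned folder, and call prune_old_versions afterwards with however many copies you want to keep.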

Doing It With Powered Off Virtual Machines

Easy enough: pick the folder that the virtual machine lives in on the datastore and back it up. Everything should go smoothly and file locks shouldn’t bite us.

Doing It With Powered On Virtual Machines

This becomes extremely tricky. You need to back up only the unlocked files, which also means that the current state will be trashed (if this isn’t OK, we can automate a NEW snap prior to the job running and commit it on completion). Here is a list of what I’m backing up to test this:

[VM]-000001.vmdk – Our VMDK for our current state (this is bad due to locked file)
[VM]-aux.xml
[VM].vmx – Virtual machine configuration file
[VM]-ctk.vmdk
[VM]-Snapshot15.vmsn
[VM]-000001-ctk.vmdk
[VM].vmdk – Our base VMDK metadata, our snapshot
[VM].vmsd
[VM].nvram
[VM]-flat.vmdk – Our base VMDK with our data on it, our snapshot

These files were locked:

[VM]-000001-delta.vmdk – Our delta file for our current state (after our snap)
[VM]*.lck – Anything with a “lck” extension appeared locked.

When restoring, we’ll create an invalid delta file to put things into a somewhat OK state. SSH into your hypervisor, navigate to your virtual machine’s directory and type this (replace “000001” with the number of the delta you’re missing):


touch [VM]-000001-delta.vmdk

This will create an empty VMDK delta file. It’s invalid and your machine will not boot, but from this stage you can revert back to the last snapshot, which puts everything back into a correct state.
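
If several deltas are missing, the same trick can be scripted. The sketch below is only an illustration and assumes the naming seen in the file list above: a snapshot descriptor called [VM]-000001.vmdk whose data lives in [VM]-000001-delta.vmdk. The function name is made up, and it assumes you run it somewhere that can see the restored folder.

```python
import re
from pathlib import Path
from typing import List

# Snapshot descriptors look like "[VM]-000001.vmdk"; "-ctk", "-flat" and
# "-delta" files do not match because the six digits must end the name.
DESCRIPTOR = re.compile(r"^(?P<stem>.+-\d{6})\.vmdk$")


def create_missing_deltas(vm_dir: Path) -> List[Path]:
    """Create an empty placeholder for every descriptor that lacks its delta."""
    created = []
    for path in sorted(vm_dir.iterdir()):
        match = DESCRIPTOR.match(path.name)
        if not match:
            continue
        delta = vm_dir / f"{match.group('stem')}-delta.vmdk"
        if not delta.exists():
            delta.touch()  # same effect as the touch command above
            created.append(delta)
    return created
```

Just like the manual version, the placeholders are invalid on purpose; they only exist so that you can revert to the last snapshot.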

The easiest way to do this would be just to add all files to the file copy job and let those that are locked fail. A script will handle this best, since Veeam’s UI will not let you multi-select files, and selecting the folder fails the entire backup on the first locked file.
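
To show the “copy everything, let the locked files fail” idea outside of Veeam, here is a minimal sketch that copies a folder file by file and records whatever it could not read. It is not a Veeam script; the function name and the injectable copy argument are assumptions made for illustration, and it assumes a locked file surfaces as an OSError when read.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Tuple


def copy_unlocked(src: Path, dst: Path,
                  copy: Callable[[Path, Path], object] = shutil.copy2
                  ) -> Tuple[List[str], List[str]]:
    """Copy every file in src to dst; return (copied, skipped) file names."""
    dst.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue  # subdirectories are ignored in this sketch
        target = dst / item.name
        try:
            copy(item, target)
            copied.append(item.name)
        except OSError:
            # Assumed failure mode for a locked file; drop any partial copy.
            if target.exists():
                target.unlink()
            skipped.append(item.name)
    return copied, skipped
```

The skipped list tells you which delta files you will need to recreate as placeholders after a restore.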

You miss out on a lot of nice-to-haves. Restoring involves copying the files back to the datastore (this can be done with a file copy job in the reverse direction) and adding the machine to your inventory manually. You also cannot restore to a newly named virtual machine; you’ll have to restore as-is and rename after it’s done. Be aware too: transfer speeds seemed to suffer a lot for this kind of backup setup.

Additionally, this has only worked under lab conditions so far, so please, as with any backup, test it first! Let me know if it works for you.

If there is enough interest maybe I’ll write up some PowerShell scripts to automate some of the more tricky stuff and post it.

Unable to revert snapshot: “the vendor of the processors in this machine is not the same”

Written by William Roush on April 6, 2016 at 5:24 pm

Warning: This is not supported by VMware, not recommended and I am not responsible for any data loss related to trying this. Snapshots are not backups and you should not rely completely on them. If you’re willing to risk data loss this may however save you… Have backups of the VM’s current state before attempting to do any of this.

On this Server Fault post a user is confused due to EVC configuration. For most people EVC only has to do with clusters and vMotion; however, if you snapshot a running VM, the VM’s CPU feature flags are set depending on the EVC settings of the VM. So a cold migration may leave you unable to revert the snapshot, with errors like the following:

feature requirements of this virtual machine exceed capabilities of this host’s current evc mode

the vendor of the processors in this machine is not the same

We’re going to go ahead and try to take a live VM snapshot and convince VMware it’s a powered off snap. Sadly in my lab I do not have an EVC enabled cluster up with differing hardware so we’re going to take the best swing we can at it. We’re going to start with a powered on Windows VM, snapshot it while it’s powered on and attempt to remove all traces that the snapshot was taken while it was powered on so hopefully those sticky EVC settings won’t stick.

So we’re going to try to trick VMware into thinking the VM was powered off when the snap happened. There are 3 major differences in these files:

SnapshotTest.vmsd – A ‘snapshot0.type = “1”‘ line that denotes it’s a powered on snapshot
SnapshotTest-Snapshot1.vmsn – Additional binary data in the snapshot config file, may be related to state, likely has CPU flags in here somewhere
SnapshotTest-Snapshot1.vmem – The dump of the RAM onto disk.

The easiest way to attempt this is to open up the .vmsd file, remove the type line, and then remove and re-add the VM to your inventory. This will trick the hypervisor into thinking the snapshot was taken while powered off, and it won’t load the vmem file.
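
If you would rather not hand-edit the file, the type line can be stripped with a few lines of script. This is only a sketch: it assumes the line looks exactly like the snapshot0.type = "1" example above (with any snapshot index), and the function name is made up. Keep a copy of the original .vmsd before writing anything back.

```python
import re

# Matches lines such as: snapshot0.type = "1"
TYPE_LINE = re.compile(r'^\s*snapshot\d+\.type\s*=\s*"1"\s*$')


def strip_powered_on_flags(vmsd_text: str) -> str:
    """Return the .vmsd contents without the powered-on snapshot type lines."""
    kept = [line for line in vmsd_text.splitlines(keepends=True)
            if not TYPE_LINE.match(line)]
    return "".join(kept)
```

Every other line is passed through untouched, so the result can be written straight back over the .vmsd file once you have a backup of it.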

However, I cannot test CPU flag mismatches in my lab. It’s entirely possible that the vmsn file will still conflict, which would require you to do some file surgery with a powered-off snap file as your base file (very risky).

Deleting the snaps will remove the vmem file even if the vmsd file has been updated to declare the VM as “powered off” during the snap, so cleanup should be easy (always check though, we’re doing funny stuff to VMware).

Cannot Consolidate Disks on VMware

Written by William Roush on April 6, 2016 at 1:27 am

If consolidation fails with the error “Unable to access file since it’s locked”, the details may say something like:

An error occurred while consolidating disks: msg.fileio.lock.
Consolidation failed for disk node ‘scsi0:1’: msg.fileio.lock.
Consolidation failed for disk node ‘scsi0:0’: msg.fileio.lock.

Make sure your VM doesn’t have its disks mounted in another VM. In this case our Veeam virtual machine did not release the disks when it was done backing up the VM for some reason; removing the disks from it allowed us to consolidate the VM.

How VMware Can Make The Web Client Awesome

Written by William Roush on August 4, 2014 at 12:39 pm

Some pretty basic design principles that would make the web client on VMware awesome, including the ability to make it redundant and supported on free systems!

I was reading this article by Trevor Pott, which does a fairly good job detailing some major problems with VMware’s vSphere web client, and how absolutely terrible it is. However, I have some major issues with this article. First of all, there are no real concrete suggestions on architecture changes (how do we handle the vCenter single point of failure? What about free clients? What about the Flash plugins?). Here I’m going to offer up some suggestions to reaffirm Trevor’s stance that VMware could and should do this better!

A True Single-Page Application

By far I figure one of the easiest ways to resolve all of our issues is a solid single-page application. This is the concept that the website you visit loads all the resources it needs onto your computer so it can run without refreshing the page. This is generally done using HTML5 and JavaScript; common frameworks include AngularJS and Ember.js. A giant Flash application like the one the vSphere web client has now doesn’t really count.

How to Handle the API

Some suggestions on how to handle API calls to the hosts/vCenter:

Transparent layer – Have the web server host a JSON-based API that gets translated into the API calls to the host/vCenter box. This allows you to have very low overhead calls (as opposed to very noisy SOAP), and allows JavaScript to do what it does best (talking in a native tongue instead of using Apache CXF for JavaScript clients). This incurs minor overhead on the host running the web server to do the translations, and it also effectively creates two web APIs (though arguably you wouldn’t support consumption of the JSON API).
Reverse Proxy – This allows you to remove any difficulty with JavaScript dealing with cross-port requests, but you’re going to be leveraging something like Apache CXF for the web services.
Direct Communication – vCenter and VMware’s APIs already exist over HTTPS for web services, so if you serve up the single-page application from the same domain/port in a hybrid host setup there will be no additional overhead!
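
To make the “transparent layer” option a bit more concrete, here is a toy sketch of the translation step: a small JSON request goes in and a SOAP 1.1 envelope comes out. Everything about the request shape is hypothetical (the method and params keys, the urn:example-api namespace); it is not VMware’s actual API, just an illustration of how thin such a layer could be.

```python
import json
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def json_to_soap(request_json: str, api_ns: str = "urn:example-api") -> str:
    """Wrap {"method": ..., "params": {...}} in a SOAP envelope (toy example)."""
    request = json.loads(request_json)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The method name becomes the single child element of the SOAP body.
    call = ET.SubElement(body, f"{{{api_ns}}}{request['method']}")
    for name, value in request.get("params", {}).items():
        ET.SubElement(call, f"{{{api_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

The browser only ever sees the compact JSON side, while the web server does the noisy SOAP work, which is the trade-off described in the first option.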

In-Browser Remote Console

Now this is the one piece I will admit is pretty experimental, and by all means feel free to fall back to a Flash/Java console, but what I’d really like to see is a true in-browser console. Look at solutions like Guacamole, which runs a full VNC client in-browser. Of course there may be some barriers here (Guacamole requires server-side code, and I’m not sure how much overhead is acceptable on the Busybox management VM on vSphere).

The only feature I can’t think of a way to reproduce in HTML5 is the direct device access required for mounting ISOs/USB devices.

Addressing The Single Point of Failure

These thin API layers (or, in one case, non-existent ones) allow not only vCenter to support these single-page web applications, but also the individual hosts. Now it becomes safe to completely scrap the old vSphere desktop clients.

Browser Security

Trevor Pott does some hand-waving about security issues on browsers, and then goes on to complain that the problem really lies with Flash and Java applets. I’d recommend dropping auto-sign-on, removing all need for plugins, and leaving it at that.

The current desktop client embeds Java applets for some 3rd party tools, so to say it’s more secure is silly.

Speed

“The old Windows client is imperceptible. Click and the info is there. Expanding a tree just completes in a time frame so short that a human can’t tell there was a delay.”

Yeah, I’m not going to stand by this stance at all. The desktop client is a massively bloated, slow piece of garbage. It eats a massive amount of memory and is prone to killing consoles, requiring you to play whack-a-mole in your process manager to kill the spawned processes and get it online again.

The web client is slower, but the desktop client isn’t some kind of ideal we’d want to achieve; it was pretty bad to begin with.

Using PowerCLI it seems like most operations are pretty instant, so it just seems to be entirely overhead on the applications themselves, so a well-written single-page application could easily handle this and be lightning fast.

“What’s ultimately the damning element of this is that Internet Explorer is the most common enterprise browser. In many environments, browsers that aren’t Internet Explorer are outright banned.”

This is more of a problem with your work environment than the web application itself. If you’re on IE11, things are pretty decent (JavaScript is fast, support for modern things is pretty up to date). If you’re at a company that keeps you on IE8 and won’t let you install Chrome, that is absolutely no fault of VMware’s.

Other Options and Why I Think They’re Not Good Routes

Native Application

This is going back to the roots of the vSphere desktop client, which generally comes with the same problems (going to be Windows only). I highly doubt VMware will write some GTK+ Windows/Mac/Linux client. So far VMware has still been unwilling to patch a major problem with RVC, so I don’t think they’re giving attention to more “hip” languages like Python and Ruby.

Cross-Platform Application

The next option is planning on a cross-platform application, and I know what they’re going to do: what every other vendor has done.

Java.

I don’t really think I need to say more. I have a love/hate relationship with Java, but most system admins have just the hate side. Mainly it comes down to this: writing cross-platform applications can be more costly in languages that, unlike Java, lack a nice solid platform.

Mono is also an option, but I have a feeling VMware won’t jump on that boat this early.

Freebies

By far, one of the best parts of major infrastructure decisions: freebies. Additional features or supported platforms with reduced, little or no effort. This list is by no means exhaustive.

OSX/Linux Support

This has been a goal off and on for VMware, obviously fully HTML5 will get you 98% of functionality on OSX and Linux, with minor plugins needed for device management.

Mobile Support

Take that single-page web application, wrap it in a delivery method like PhoneGap, stylize it so that it fits better on the device (different CSS files for phone/tablet), and you’re going to have not just a small subset of features (like most current mobile apps available), but the ability to fully manage your VMware cluster from the ground up.

Overall

There is no reason that VMware should have shipped the web client in its current state, nor is it an example of why VMware shouldn’t dedicate resources to writing solid web-based management software. It misses most of the point while throwing all of its resources into a dying framework. A bit of design centered around delivering the things customers have been asking for could lead to a product that will put all competitors to shame, instead of turning customers away from vSphere.