Fault Tolerance

VMware Fault Tolerance (FT) is a component of VMware vSphere that provides continuous availability for applications, preventing downtime and data loss in the event of server failures. Built on VMware vLockstep technology, VMware Fault Tolerance provides operational continuity and high uptime in VMware vSphere environments, simply and at low cost.

Key Features

VMware Fault Tolerance automatically detects server failures and triggers an immediate, seamless, stateful failover, providing continuous availability with no downtime and no data loss.

VMware Fault Tolerance automatically triggers the creation of a new secondary virtual machine after failover, to ensure continuous protection for the application.

VMware Fault Tolerance works with all types of shared storage, including Fibre Channel, NAS, and iSCSI. It also works with all operating systems supported by VMware ESX.

VMware Fault Tolerance works with existing VMware DRS and VMware HA clusters and requires only an additional dedicated Gigabit Ethernet network.

For users who require even greater levels of high availability than VMware HA can provide, VMware vSphere introduces a new feature known as VMware Fault Tolerance (FT). VMware HA protects against unplanned physical server failure by automatically restarting virtual machines when a physical host fails. Because the virtual machine must be restarted, some downtime (generally less than three minutes) is incurred. VMware FT goes further and eliminates any downtime in the event of a physical host failure. Using vLockstep technology, VMware FT maintains a mirrored secondary VM on a separate physical host that is kept in lockstep with the primary VM.

Everything that occurs on the primary (protected) VM also occurs simultaneously on the secondary (mirrored) VM, so that if the physical host on which the primary VM is running fails, the secondary VM can immediately step in and take over without any loss of connectivity. VMware FT will also automatically re-create the secondary (mirrored) VM on another host if the physical host on which the secondary VM is running fails. This ensures protection for the primary VM at all times.

VMware Fault Tolerance works as follows:

VMware Fault Tolerance, when enabled for a virtual machine, creates a live shadow instance of the primary, running on another physical server.

The two instances are kept in virtual lockstep with each other using VMware vLockstep technology, which logs the non-deterministic events executed by the primary and transmits them over a Gigabit Ethernet network to be replayed by the secondary virtual machine.

The two virtual machines execute exactly the same sequence of events, because they receive exactly the same inputs at any given time.

The two virtual machines access a common disk and appear as a single entity, with a single IP address and a single MAC address to other applications. Only the primary is allowed to perform writes.

The two virtual machines continuously exchange heartbeats, and if either virtual machine instance loses the other's heartbeat, it takes over immediately. The heartbeats are very frequent, sent at millisecond intervals, making the failover instantaneous with no loss of data or state.

VMware Fault Tolerance requires a dedicated network connection, separate from the VMware VMotion network, between the two physical servers.
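
The record-and-replay idea behind the lockstep description above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not VMware's implementation: ToyVM, run_primary, and replay are hypothetical names, a seeded random generator stands in for the non-deterministic events, and the log is passed in memory rather than over a network.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM whose state depends only on its inputs."""
    state: int = 0
    applied: int = 0

    def apply(self, event: int) -> None:
        # Deterministic transition: the same events in the same order
        # always produce the same state.
        self.state = (self.state * 31 + event) % 1_000_003
        self.applied += 1


def run_primary(vm: ToyVM, steps: int, seed: int) -> list:
    """Run the primary and return the log of non-deterministic events."""
    rng = random.Random(seed)  # assumption: stands in for external, non-deterministic inputs
    log = []
    for _ in range(steps):
        event = rng.randrange(256)
        log.append(event)  # record the event so the secondary can replay it
        vm.apply(event)
    return log


def replay(vm: ToyVM, log: list) -> None:
    """Replay a recorded event log on the secondary, in the recorded order."""
    for event in log:
        vm.apply(event)


if __name__ == "__main__":
    primary, secondary = ToyVM(), ToyVM()
    log = run_primary(primary, steps=1000, seed=42)
    replay(secondary, log)
    # Both machines consumed the same inputs, so their states match.
    print(primary.state == secondary.state)
```

The point of the sketch is that only the non-deterministic inputs need to be shipped to the secondary. Everything else follows from deterministic execution, which is why the two instances can stay identical without copying full machine state.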

In the event of multiple host failures (for example, if the hosts running both the primary and the secondary VM fail), VMware HA will reboot the primary VM on another available server, and VMware FT will automatically create a new secondary VM. Again, this ensures protection for the primary VM at all times. VMs protected with VMware FT can be migrated with VMotion, but they cannot be managed by DRS, so DRS must be manually disabled on VMs that are protected with VMware FT.
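
The failure cases described in this section (primary host fails, secondary host fails, both fail) can be summarized as a small decision routine. The following Python sketch is illustrative only: FTPair, handle_failures, the action names, and the "first surviving host by name" placement rule are all assumptions made for the example and do not correspond to any VMware API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FTPair:
    primary_host: str
    secondary_host: Optional[str]  # None means the VM is currently unprotected


def _pick(alive: set, exclude: set) -> Optional[str]:
    # Hypothetical placement rule: first surviving host by name.
    candidates = sorted(alive - exclude)
    return candidates[0] if candidates else None


def handle_failures(pair: FTPair, hosts: Iterable[str],
                    failed: Iterable[str]) -> Tuple[FTPair, List[str]]:
    """Return the new placement and the actions taken after host failures."""
    alive = set(hosts) - set(failed)
    actions: List[str] = []
    primary_ok = pair.primary_host in alive
    secondary_ok = pair.secondary_host in alive

    if primary_ok and secondary_ok:
        return pair, actions  # nothing to do

    if primary_ok:
        primary = pair.primary_host  # only the secondary was lost
    elif secondary_ok:
        primary = pair.secondary_host  # secondary takes over with no restart
        actions.append("failover")
    else:
        # Both hosts are gone: the VM is restarted elsewhere (the HA case).
        primary = _pick(alive, set())
        if primary is None:
            raise RuntimeError("no surviving host")
        actions.append("ha_restart")

    # Protection is restored by creating a new secondary on a different host.
    secondary = _pick(alive, {primary})
    if secondary is not None:
        actions.append("create_secondary")
    return FTPair(primary, secondary), actions
```

For example, with hosts esx1, esx2, and esx3 and a pair placed on esx1 and esx2, a failure of esx1 yields a failover to esx2 followed by creation of a new secondary on esx3. If both esx1 and esx2 fail, the VM is restarted on esx3 and stays unprotected until another host becomes available.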