Description

The Cloud model is built upon a highly redundant infrastructure, but single points of failure still exist. This article describes what happens if the ESXi chassis hosting your Cloud Server fails.

Content / Solution:

The Cloud infrastructure model is designed to maximize redundancy. Bonded NICs, RAID volumes, multiple SAN paths, redundant power supplies, and a Tier III data center infrastructure keep virtual servers available in most failure cases that involve a single component. However, some hardware components in the deployment cannot be made redundant, such as the server motherboard, a physical CPU, or a RAM DIMM. Therefore, the physical server hosting your Cloud Server can still fail.

Each Cloud Server runs on only one physical server (the ESXi "host") in a VMware cluster at any given time. If that chassis fails completely, every Cloud Server running on it fails along with the hardware. VMware's High Availability (HA) feature detects such ESXi host failures and automatically restarts each affected virtual machine on another chassis in the cluster. However, this results in a few minutes of downtime while your server is restarted on the new host.

Although this scenario is rare, keep this possibility in mind when designing a configuration for maximum availability. Deploying redundant virtual servers behind the MCP 1.0 VIP function (see What is a VIP in a MCP 1.0 Data Center Location) or the MCP 2.0 Virtual Listener function (see Introduction to Virtual Listeners / VIPs in MCP 2.0) can minimize the impact of such an event. In addition, the Anti-Affinity feature can keep a pair of Cloud Servers from sharing the same physical host. For more details, see Introduction to Server Anti-Affinity.
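To illustrate why anti-affinity matters for a redundant pair behind a VIP or Virtual Listener, here is a minimal Python sketch. It models only the moment a host fails, before VMware HA restarts anything. The server names, host names, and both functions are hypothetical and are not part of any MCP or VMware API.

```python
from typing import Dict, List

# Hypothetical placement map: Cloud Server name -> ESXi host it runs on.
# All names are illustrative only.


def survivors(placement: Dict[str, str], failed_host: str) -> List[str]:
    """Return the servers still running right after failed_host goes down,
    before HA has restarted the affected servers elsewhere."""
    return [srv for srv, host in placement.items() if host != failed_host]


def tolerates_single_host_failure(placement: Dict[str, str]) -> bool:
    """True if, for every host in use, at least one server in the pool keeps running."""
    hosts = set(placement.values())
    return all(survivors(placement, h) for h in hosts)


# Two web servers load-balanced behind a VIP / Virtual Listener.
same_host = {"web-1": "esxi-a", "web-2": "esxi-a"}
anti_affinity = {"web-1": "esxi-a", "web-2": "esxi-b"}

print(tolerates_single_host_failure(same_host))      # False
print(tolerates_single_host_failure(anti_affinity))  # True
```

When both pool members sit on the same host, a single chassis failure takes down the whole pool until HA restarts them. Placing them on separate hosts leaves one member serving traffic through the VIP while the other is restarted.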