Before anyone proceeds with creating a Failover Cluster, configuring network storage, and all these "technical" and "nice" manipulations, it's a must to understand what a failover cluster is, how it is built, and what its logic is. I'm sure that after reading this post, those of you who didn't already know will say: wow, what a nice and simple idea!! Keep in mind that understanding how a cluster works will make all the configuration stuff so much easier to understand.
1- What is a cluster?
A cluster, from a Windows point of view, is a group of resources that work together to provide a highly available resource. For example, a Hyper-V cluster is a group of Hyper-V servers (hosts) that work together to provide highly available virtual machines. A SQL Server failover cluster is a group of SQL Servers that work together to provide highly available SQL Server instances.
So, the aim of the cluster is to bring one or more resources to a highly available state. Highly available doesn't mean always available; it means the cluster will do its best to keep the resource available (so don't be mad when a VM goes offline or fails to come back online, there's no Copperfield over there).
2- How does the cluster work: magic or logic?
This is the most important part, because if we understand how it works, we understand how to build it, how to maintain it, and how to troubleshoot it. My explanation will not be deeply technical (no protocols, no ports, no packets…) but rather logical. Believe me, understanding how a cluster works at the technical level would need more than a blog post 🙂
The ultimate example is the Hyper-V failover cluster. I will explain, piece by piece, what we have and what we get.
– I have a single Hyper-V server with two virtual machines on it
– For the physical server: the vital components for the operating system (OS) to run are RAM, CPU, hard disk, and network. We can see these components in the picture: RAM, CPU, local storage, NIC
– Now, a virtual machine is like a physical one. It's an operating system that needs to run, and to run it needs the same vital components: RAM, CPU, hard disk, and network. For that, the VM will allocate (borrow) some of the physical resources: it will borrow a piece of RAM, some CPU capacity, some space on the local storage, and it will use the physical NIC to access the network. The same goes for every virtual machine that runs on the host.
——–> For the virtual machine to run, it needs: RAM, CPU, a VHD (virtual hard disk, stored on the local storage as a .vhd file), and access to the network.
——–> If the physical RAM fails (a hardware issue, for example), the virtual machine fails (dependency). If the local storage fails, the VHD is lost and the virtual machine fails (no disk). So, if a dependent physical resource fails, the virtual machine fails.
——–> If Hyper-V fails (service, bug) or the OS fails (bug, crash…), the VM fails (Hyper-V is the VM's engine).
The lesson is: if you want to make your resource highly available, make sure all its vital components are made highly available too.
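The dependency lesson can be sketched in a few lines of plain Python (a logical model only, nothing to do with real Hyper-V code): the VM is "up" only while every vital resource it depends on is healthy.

```python
# Logical model of the dependency rule: if any dependent
# physical resource fails, the virtual machine fails with it.

class Resource:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

class VirtualMachine:
    def __init__(self, name, dependencies):
        self.name = name
        # vital components: RAM, CPU, disk, network
        self.dependencies = dependencies

    def is_running(self):
        # the VM runs only if ALL vital resources are healthy
        return all(r.healthy for r in self.dependencies)

ram, cpu = Resource("RAM"), Resource("CPU")
disk, nic = Resource("Local storage"), Resource("NIC")
vm = VirtualMachine("VM1", [ram, cpu, disk, nic])

print(vm.is_running())  # True: everything is healthy
disk.healthy = False    # the local storage fails, the VHD is lost...
print(vm.is_running())  # False: ...and the VM fails with it
```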
So in our case, the Hyper-V cluster will try to make the vital components highly available (RAM, CPU, disk (data), network). Let's imagine the following scenario.
In the previous picture, we have added a second Hyper-V server (also called a node or host), identical to the first node. We attached remote storage (a SAN, Storage Area Network) to both nodes and shared it, so both nodes can access (read/write) it at the same time. The VM storage is now on this shared storage (VM storage = the VM configuration file + the VHDs + other files that we will discuss later in this blog series). The VM configuration file contains information about the VM configuration at the Hyper-V level (name, memory, CPU count, VHD locations…), and the VHDs are the virtual hard disks that contain the OS and the data.
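To picture what lives on the shared storage, here is an illustrative sketch of the kind of information the VM configuration file holds (the field names and paths are made up for illustration; the real Hyper-V file format is different):

```python
# Illustrative only: the VM's definition sits on the SHARED storage,
# so any node in the cluster can read it and bring the VM online.
vm_config = {
    "name": "VM1",
    "memory_mb": 4096,        # RAM to allocate on whichever host runs it
    "cpu_count": 2,
    # VHD paths point at the shared storage, not at a node's local disk
    "vhd_paths": [r"\\SAN\VMStore\VM1\os.vhd",
                  r"\\SAN\VMStore\VM1\data.vhd"],
}
```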
The verdict 🙂
Imagine that our VM is running on the first node: the memory content is in the first node's RAM, the CPU instructions run on the first node's CPU, the network traffic passes through the first node's NIC, and the VM storage is located on the shared storage.
What happens if the first node fails?!
The VM will crash: no memory, no CPU, no network. It's as if we had a physical server and unplugged the power-supply cable. In the case of a physical server, what could we quickly do? We would bring in a new server with the same configuration, unmount the disk from the faulty server, plug it into the new server, then start it up 🙂
The same logic applies to the VM, but this time the cluster mechanism does it automatically:
1- It detects that the host has failed, or the VM has failed, or any resource marked as highly available has failed; in our case, it's the host.
2- It loads the VM configuration from the VM configuration file located on the shared storage (yes, this is why we need shared storage), then registers the VM on the second server.
3- It starts the VM, and the VM is now up and running. (This step is done for all the HA virtual machines.)
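The three steps above can be sketched as a small Python model (again, a logical sketch of the idea, not the actual cluster service; the data shapes are invented for illustration):

```python
# Logical sketch of failover: detect the failed host, load each VM's
# configuration from the shared storage, then register and start the
# VM on a surviving node.

def failover(nodes, shared_storage):
    for node in nodes:
        if not node["healthy"]:
            # 1- a failed host was detected
            for vm_name in node["vms"]:
                # 2- load the VM configuration from the shared storage
                config = shared_storage[vm_name]
                survivor = next(n for n in nodes if n["healthy"])
                # register the VM on a surviving node...
                survivor["vms"].append(vm_name)
                # 3- ...and start it there
                print(f"{vm_name} restarted on {survivor['name']} "
                      f"with {config['memory_mb']} MB RAM")
            node["vms"] = []

shared_storage = {"VM1": {"memory_mb": 4096}, "VM2": {"memory_mb": 2048}}
nodes = [{"name": "Node1", "healthy": False, "vms": ["VM1", "VM2"]},
         {"name": "Node2", "healthy": True, "vms": []}]
failover(nodes, shared_storage)
print(nodes[1]["vms"])  # both HA VMs now run on Node2
```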
The result is depicted in the following picture:
Yes, it is that simple. The cluster concept is simple; the difficulty is building an efficient, strong, and stable system that can do it.
The Windows Failover Cluster concept has existed in Windows Server since Windows Server 2003. It has evolved continuously, reaching an excellent level today with Windows Server 2012.
The same concept can be applied to SQL Server and to any role that supports clustering. Just don't forget the concept 🙂