A server cluster is a group of independent servers running Cluster service and working collectively as a single system. By grouping multiple servers, server clusters provide high availability, scalability, and manageability for resources and applications.
The purpose of server clusters is to preserve client access to applications and resources during failures and planned outages. If one of the servers in the cluster becomes unavailable because of a failure or maintenance, its resources and applications move to another available cluster node.
For clustered systems, the term high availability is used rather than fault tolerance, because fault-tolerant technology offers a higher level of resilience and recovery. Fault-tolerant servers typically combine a high degree of hardware redundancy with specialized software to provide near-instantaneous recovery from any single hardware or software fault. These solutions cost significantly more than a clustering solution because organizations must pay for redundant hardware that sits idle until a fault occurs. Fault-tolerant servers are used for applications that support high-value, high-rate transactions, such as check clearinghouses, Automated Teller Machines (ATMs), and stock exchanges.
Although Cluster service does not guarantee nonstop operation, it provides availability sufficient for most mission-critical applications. Cluster service can monitor applications and resources, automatically recognizing and recovering from many failure conditions. This automatic monitoring and recovery gives administrators greater flexibility in managing the workload within a cluster and improves the overall availability of the system.
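The monitor-and-recover behavior described above can be pictured as a simple policy: restart a failed resource in place a limited number of times, then move it to another node once that limit is exceeded. The following is a minimal hypothetical sketch, not Cluster service's implementation. The names `Resource`, `restart_threshold`, and `handle_failure`, and the choice of the first other node as the failover target, are assumptions for illustration; a real cluster also moves whole groups of dependent resources together, while this sketch tracks a single resource.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical model of a clustered resource (e.g. an IP address or disk)."""
    name: str
    owner: str               # node currently hosting the resource
    restart_threshold: int   # assumed limit on local restarts before failover
    restarts: int = 0        # local restarts attempted so far


def handle_failure(res: Resource, nodes: list[str]) -> str:
    """Recover a failed resource: restart it locally, then fail it over.

    Returns a short description of the action taken.
    """
    if res.restarts < res.restart_threshold:
        # First response: restart the resource on the node that already owns it.
        res.restarts += 1
        return f"restart {res.name} on {res.owner}"
    # Threshold reached: transfer ownership to another available node.
    candidates = [n for n in nodes if n != res.owner]
    if not candidates:
        return f"{res.name} offline: no other node available"
    res.owner = candidates[0]  # assumption: pick the first other node
    res.restarts = 0           # the new owner starts with a fresh restart count
    return f"fail over {res.name} to {res.owner}"


if __name__ == "__main__":
    ip = Resource("IP Address", owner="NODE1", restart_threshold=2)
    for _ in range(3):
        print(handle_failure(ip, ["NODE1", "NODE2"]))
    # restart IP Address on NODE1
    # restart IP Address on NODE1
    # fail over IP Address to NODE2
```

Restarting in place first avoids the cost of moving a resource for transient faults; failover is reserved for failures that persist.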
Cluster service benefits include:
- High Availability. With Cluster service, ownership of resources such as disk drives and IP addresses is automatically transferred from a failed server to a surviving server. When a system or application in the cluster fails, the cluster software restarts the failed application on a surviving server, or disperses the work from the failed node to the remaining nodes. As a result, users experience only a momentary pause in service.
- Failback. Cluster service can automatically rebalance the workload in a cluster when a failed server comes back online.
- Manageability. You can use Cluster Administrator to manage a cluster as a single system and to manage applications as if they were running on a single server. You can move applications and data to different servers within the cluster by dragging and dropping cluster objects. You can use this to balance server workloads manually and to take servers out of service for planned maintenance. You can also monitor the status of the cluster, all nodes, and all resources from anywhere on the network.
- Scalability. Clusters can grow to meet rising demand. When the overall load for a cluster-aware application exceeds the capabilities of the cluster, you can add nodes.
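The failover and failback behaviors in the list above can be modeled as changes in which node owns each group of resources. In this hypothetical sketch, each group records a preferred owner. When a node fails, its groups are dispersed round-robin across the surviving nodes, and when the node rejoins, groups that prefer it move back. The class and method names, the round-robin placement, and the assumption that failback is enabled for every group are illustrative choices, not Cluster service's actual placement logic.

```python
from itertools import cycle


class Cluster:
    """Hypothetical model of group ownership in a server cluster."""

    def __init__(self, nodes, preferred):
        self.online = list(nodes)
        # preferred maps group name -> preferred owner node
        self.preferred = dict(preferred)
        # Start with every group running on its preferred owner.
        self.owner = dict(preferred)

    def fail_node(self, node):
        """Take a node offline and disperse its groups across the survivors."""
        survivors = [n for n in self.online if n != node]
        if not survivors:
            raise RuntimeError("no surviving node to take over")
        self.online = survivors
        targets = cycle(survivors)  # assumption: simple round-robin placement
        for group in sorted(self.owner):
            if self.owner[group] == node:
                self.owner[group] = next(targets)

    def rejoin_node(self, node):
        """Bring a node back online and fail back the groups that prefer it."""
        if node not in self.online:
            self.online.append(node)
        for group, pref in self.preferred.items():
            if pref == node:
                self.owner[group] = node


if __name__ == "__main__":
    c = Cluster(["A", "B", "C"], {"SQL": "A", "Web": "A", "File": "B"})
    c.fail_node("A")
    print(c.owner)  # {'SQL': 'B', 'Web': 'C', 'File': 'B'}
    c.rejoin_node("A")
    print(c.owner)  # {'SQL': 'A', 'Web': 'A', 'File': 'B'}
```

Spreading a failed node's groups across several survivors, rather than moving them all to one node, keeps any single survivor from absorbing the entire extra load.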