Failover Tests verify redundancy mechanisms while the system is under load. This contrasts with Load Tests, which are conducted under anticipated load with no component failure during the course of a test.
For example, in a web environment, failover testing determines what happens when multiple web servers are handling peak anticipated load and one of them dies.
Does the load balancer react quickly enough?
Can the other web servers handle the sudden dumping of extra load?
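Both questions can be roughed out with simple arithmetic before any test is run. The sketch below is an illustration only: the function names, the health-check model (fixed-interval checks, a server ejected after a set number of consecutive failed checks), and every number in it are assumptions, not properties of any particular load balancer.

```python
def detection_window_s(check_interval_s, failures_to_eject, check_timeout_s):
    """Worst-case seconds before a dead server is taken out of rotation.

    Assumed model: the server dies just after a passing check, so the
    required number of failed checks must each come due, and the last
    one must time out before the server is ejected.
    """
    return check_interval_s * failures_to_eject + check_timeout_s


def load_per_survivor(total_rps, servers, failed=1):
    """Requests per second each remaining server must absorb (even split assumed)."""
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("no servers left to take the load")
    return total_rps / survivors


def survives_failover(total_rps, servers, capacity_rps_per_server, failed=1):
    """True if the surviving servers stay within their assumed capacity."""
    return load_per_survivor(total_rps, servers, failed) <= capacity_rps_per_server


if __name__ == "__main__":
    # Hypothetical figures: 4 web servers sharing 900 requests/second.
    print(detection_window_s(check_interval_s=5, failures_to_eject=3, check_timeout_s=2))
    print(load_per_survivor(total_rps=900, servers=4))
    print(survives_failover(total_rps=900, servers=4, capacity_rps_per_server=250))
```

A calculation like this only sets expectations. The failover test itself is what shows whether the real load balancer and servers behave that way under load.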
Failover testing allows technicians to address problems in advance, in the comfort of a testing situation, rather than in the heat of a production outage. It also provides a baseline of failover capability, so that a 'sick' server can be shut down with confidence, in the knowledge that the remaining infrastructure will cope with the surge of failover load.
The following is a configuration where failover testing would be required.
This is just one of many failover configurations. Some failover configurations can be quite complex, especially when there are redundant sites as well as redundant equipment and communications lines.
In this type of configuration, when one of the application servers goes down, the two web servers that were configured to communicate with the failed application server cannot take load from the load balancer, and all of the load must be passed to the remaining two web servers. See diagram below:
When such a failover event occurs, the remaining web servers are under substantial stress, as they need to quickly accommodate the failed-over load, which will probably double the number of HTTP connections as well as application server connections in a very short amount of time. The remaining application server will also be subjected to a severe increase in load and to the overheads associated with catering for it.
It is crucial to the design of any meaningful failover test that the failover design is understood, so that the implications of a failover event while under load can be scrutinized.
After verifying that a system can sustain a component outage, it is also important to verify that when the component is back up, it is available to take load again and can sustain the influx of activity when it comes back online.
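The doubling described above follows directly from the topology. As a minimal sketch, assume four web servers paired two-to-one with two application servers; the server names and the connection count are made up for illustration, and the load balancer is assumed to split connections evenly across the web servers that can still serve.

```python
def web_load_after_app_failure(web_to_app, total_connections, failed_app):
    """Connections per web server once one application server has failed.

    Web servers tied to the failed application server receive nothing;
    the rest share the full load evenly (an assumed balancing policy).
    """
    healthy = [web for web, app in web_to_app.items() if app != failed_app]
    if not healthy:
        raise ValueError("no web server has a working application server")
    share = total_connections / len(healthy)
    return {web: (share if web in healthy else 0.0) for web in web_to_app}


if __name__ == "__main__":
    # Hypothetical topology: each application server backs two web servers.
    topology = {"web1": "app1", "web2": "app1", "web3": "app2", "web4": "app2"}
    total = 4000  # hypothetical concurrent HTTP connections

    before = total / len(topology)
    after = web_load_after_app_failure(topology, total, failed_app="app1")
    print("per web server before:", before)
    print("per web server after: ", after)
```

With these figures each surviving web server goes from 1000 to 2000 connections, and the surviving application server goes from serving half of the traffic to all of it.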
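One way to check the recovery step is to compare how requests were distributed after the component came back. The sketch below assumes an even (round-robin style) balancing policy and a tolerance chosen purely for illustration; the function name, server names, and request counts are hypothetical.

```python
def failback_ok(requests_by_server, recovered, tolerance=0.25):
    """True if the recovered server handles roughly an even share of requests.

    `tolerance` is the allowed relative deviation from an even share,
    an assumed threshold rather than an established standard.
    """
    total = sum(requests_by_server.values())
    if total == 0:
        return False  # no traffic observed, so nothing is verified
    fair_share = 1 / len(requests_by_server)
    actual_share = requests_by_server.get(recovered, 0) / total
    return abs(actual_share - fair_share) <= tolerance * fair_share


if __name__ == "__main__":
    # Hypothetical request counts sampled after web1 was brought back.
    sample = {"web1": 240, "web2": 260, "web3": 250, "web4": 250}
    print(failback_ok(sample, recovered="web1"))
```

An even share of requests shows only that the component is taking load again. Whether it sustains that load still has to be judged from response times and error rates measured during the same period.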