Case Study: Major Bank Uses Active/Active to Avoid Hurricanes

Source and Target

Headquartered in the Midwest, a major U.S. bank serves much of the eastern United States plus Michigan, Ohio, Kentucky, Indiana, Illinois, and West Virginia. It engages in five main businesses:

  • branch banking
  • consumer lending
  • commercial banking
  • investment advising
  • processing solutions

The bank’s roots go back well over a century. It first opened in 1863 and then grew by acquisitions and mergers to become a major force today in the banking industry.

Card Authorization Services

As part of its processing solutions business, the bank provides credit and debit card processing for its merchant customers. The bank requires these services to be highly reliable and available: they must survive any system failure, whatever the cause, and fail over rapidly. If the services fail, cardholders cannot use their credit or debit cards for the duration of the outage.
The bank therefore chose highly reliable HP NonStop servers in a two-node active/active configuration to provide these services. One node is located in St. Petersburg, Florida, and the other is located in Grand Rapids, Michigan. This geographical separation is intended to ensure that no single environmental disaster, man-made disaster, or system failure will take down both nodes.
Though each node normally handles only one-half of the total processing load, both nodes are configured to handle the entire load so that full transaction processing can continue unimpeded in the event of a node failure.
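
The sizing rule above can be checked with simple arithmetic. The following is a minimal sketch in Python; the load and capacity figures are hypothetical placeholders, not the bank's numbers, and the function name is illustrative only.

```python
def node_utilization(total_load_tps, node_capacity_tps, active_nodes):
    """Fraction of one node's capacity in use when the load is split evenly."""
    if active_nodes < 1:
        raise ValueError("at least one node must be active")
    return (total_load_tps / active_nodes) / node_capacity_tps


# Hypothetical figures: each node is sized to carry the whole load by itself.
TOTAL_LOAD_TPS = 1000.0
NODE_CAPACITY_TPS = 1000.0

normal = node_utilization(TOTAL_LOAD_TPS, NODE_CAPACITY_TPS, active_nodes=2)
after_failure = node_utilization(TOTAL_LOAD_TPS, NODE_CAPACITY_TPS, active_nodes=1)
print(f"normal: {normal:.0%}, after a node failure: {after_failure:.0%}")
```

With both nodes up, each runs at half of its capacity. With one node down, the survivor runs at full capacity but is not overloaded.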

Active/Active with Customer Partitioning

In the bank’s active/active configuration, both nodes are always active so long as they are properly functioning and the network connecting them is operational. Both are providing the same set of debit and credit card transaction services to their merchant customers.
In active/active systems, data collisions can occur if two users at two different nodes attempt to update the same row or record at substantially the same time (within the replication latency period). To prevent data collisions, the bank splits its merchants between the two nodes. Roughly half of the merchants are assigned to the Florida node and half to the Michigan node. Each node has its own set of IP addresses, and each merchant is assigned the IP address of its primary node to use for transaction processing. However, the network is also configured so that each merchant can switch to the IP address of the alternate node, which then serves as a backup if the merchant’s primary node fails.
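
The merchant partitioning described above could be modeled as a simple routing table. This is a minimal sketch, not the bank's implementation: the node names, the IP addresses (taken from the reserved documentation ranges), the merchant IDs, and the alternating assignment rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical node addresses from the reserved documentation ranges.
NODE_ADDRESSES = {"florida": "192.0.2.10", "michigan": "198.51.100.10"}


@dataclass
class MerchantRoute:
    merchant_id: str
    primary: str    # node that normally processes this merchant's transactions
    alternate: str  # node the merchant switches to if the primary fails

    def primary_address(self) -> str:
        return NODE_ADDRESSES[self.primary]

    def alternate_address(self) -> str:
        return NODE_ADDRESSES[self.alternate]


def assign_merchants(merchant_ids):
    """Split merchants roughly in half by alternating between the two nodes."""
    nodes = list(NODE_ADDRESSES)
    routes = {}
    for index, merchant_id in enumerate(sorted(merchant_ids)):
        primary = nodes[index % 2]
        alternate = nodes[(index + 1) % 2]
        routes[merchant_id] = MerchantRoute(merchant_id, primary, alternate)
    return routes


routes = assign_merchants(["m001", "m002", "m003", "m004"])
for route in routes.values():
    print(route.merchant_id, route.primary, route.primary_address())
```

Any rule that gives each merchant exactly one primary node would serve the same purpose. Alternating assignment is used here only because it yields an even split.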

As transaction processing occurs, database changes at each node are replicated to the other node via the Shadowbase asynchronous bidirectional replication engine. Thus, each node contains the entire database for the application network.

Since Shadowbase replicates on a transaction basis, the results of each transaction are committed at the target database by Shadowbase shortly after that transaction commits on the source database, delayed only by the replication latency. This keeps the nodal databases in transaction synchronization.
Because the work of each customer merchant is being done on only one node, no data collisions can occur as a result of the asynchronous data replication. That is, in no case will one node be making a change to a row that is also being changed by the other node during the replication latency interval.
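
To illustrate why partitioning rules out collisions, the sketch below flags any pair of updates that touch the same row from different nodes within the replication latency. It is a simplified model rather than a description of how Shadowbase works: it assumes each row belongs to exactly one merchant, and the latency value, timestamps, and merchant IDs are invented for the example.

```python
from itertools import combinations


def find_collisions(updates, latency_seconds):
    """Return pairs of updates to the same row made at different nodes
    less than latency_seconds apart.

    Each update is a tuple: (timestamp_seconds, node, row_key).
    """
    collisions = []
    for first, second in combinations(updates, 2):
        t1, node1, row1 = first
        t2, node2, row2 = second
        if row1 == row2 and node1 != node2 and abs(t1 - t2) < latency_seconds:
            collisions.append((first, second))
    return collisions


LATENCY_SECONDS = 0.5  # assumed replication latency

# Without partitioning: both nodes update merchant m001's row 0.2 s apart.
unpartitioned = [(0.0, "florida", "m001"), (0.2, "michigan", "m001")]

# With partitioning: m001 is updated only in Florida, m002 only in Michigan.
partitioned = [
    (0.0, "florida", "m001"),
    (0.2, "michigan", "m002"),
    (0.3, "florida", "m001"),
]

print(len(find_collisions(unpartitioned, LATENCY_SECONDS)))
print(len(find_collisions(partitioned, LATENCY_SECONDS)))
```

In the partitioned case, two updates to the same row always originate at the same node, so the different-node condition never holds, however short the interval between them.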

Failover

Should a node fail, the merchants that were assigned to it as their primary node simply switch their network routing IP address to that of the alternate node. Processing for the failed-over merchants continues on their backup node, uninterrupted from the last completed transactions. Processing continues as usual for the merchants whose primary node is the surviving node.
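
The switch described above can be sketched as an update to a routing table. This is an illustrative model only: the table layout, node names, merchant IDs, and the `fail_over` and `fail_back` helpers are hypothetical. In the bank's system, the switch is made through network routing by the merchants themselves.

```python
def fail_over(routes, failed_node):
    """Point every merchant currently using failed_node at its alternate node."""
    moved = []
    for merchant_id, route in routes.items():
        if route["active"] == failed_node:
            route["active"] = route["alternate"]
            moved.append(merchant_id)
    return moved


def fail_back(routes, restored_node):
    """Return merchants to restored_node if it is their primary node."""
    moved = []
    for merchant_id, route in routes.items():
        if route["primary"] == restored_node and route["active"] != restored_node:
            route["active"] = restored_node
            moved.append(merchant_id)
    return moved


# Hypothetical routing table: m001 and m003 are primaried in Florida.
routes = {
    "m001": {"primary": "florida", "alternate": "michigan", "active": "florida"},
    "m002": {"primary": "michigan", "alternate": "florida", "active": "michigan"},
    "m003": {"primary": "florida", "alternate": "michigan", "active": "florida"},
}

moved = fail_over(routes, "florida")
print("switched to alternate node:", moved)
```

Merchants already primaried on the surviving node are untouched, which matches the behavior described above. Once the failed node is back in service, `fail_back` reverses the switch.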

The bank uses the ease of node failover to its advantage to avoid potential disasters. For instance, whenever Florida is threatened by a hurricane, the bank will instruct all of its customer merchants assigned to the Florida node to switch over to the Michigan node until the hurricane threat has passed.
In 2005, during a particularly intense hurricane season, the bank did this four times. As a result, it avoided any potential system downtime due to the devastating hurricanes Dennis, Katrina, Rita, and Wilma, each of which pounded Florida with sustained winds of over 130 miles per hour, causing a combined total of over $120 billion in damage.
