Part 6 - Real-Time Business Intelligence

Yesterday’s Data is No Longer Sufficient

Data warehousing solves a strategic need of the enterprise. It manipulates massive amounts of data via data mining to derive new information and knowledge about an enterprise’s operations. However, data warehousing has little tactical value since the data in it is generally quite stale. It could be days or weeks old. Data warehousing’s primary value is in supporting strategic goals such as reducing costs, increasing sales, and improving profits.

Enterprise application integration (EAI) solves a tactical need of the enterprise. It allows systems to react immediately to events generated by other systems. However, EAI has little strategic value since it provides no repository of data suitable for data mining.

A solution is needed that satisfies both the strategic and the tactical needs of an enterprise. A store, for instance, needs to know that men who purchase diapers on Saturday also tend to buy beer at the same time (strategic – put the beer near the diapers, and have beer sales on Saturday). A credit card company needs to know that a credit card being used to purchase an item in New York City was used thirty minutes ago in Amsterdam (tactical – deny the transaction, and put a credit hold on the card).

Another example of the use of real-time business intelligence is customer support. For instance, a customer often needs to know if he has enough credit or cash on his credit card or in his debit card account before attempting to make a purchase. Some card companies allow a customer to get preauthorization for specified amounts via cell phone to ensure that the purchase is covered. This preauthorization means that the card company must have up-to-date balances for its customers. This type of customer support is real-time business intelligence.

Today’s Business Intelligence Needs

The gap between analytical and operational processing is closing fast. Just as Moore’s Law continues to characterize the rapid pace of technology development, the complex data-mining queries that used to take hours to run now execute in seconds. If only these data-mining engines had the latest values of the data, the tactical and strategic needs of business intelligence could be merged into a single solution. There are two primary impediments to effective and efficient real-time business intelligence: data latency and data unavailability.

Data Latency refers to the staleness of data. The value of data degrades rapidly with its age. When people are relying on real-time business intelligence to tactically help them with on-the-spot decisions, the freshest data and the fastest response times are needed.

Data Unavailability is a death knell for real-time business intelligence. If a company’s operations have progressed to the point that they are dependent on real-time business intelligence, the unavailability of this intelligence due to a failed system could bring operations to a halt. Extreme availability of the real-time business intelligence services is paramount.

A business cannot respond to events as they happen if it cannot find out about these events for hours, days, or weeks. It also cannot immediately respond to events if the system that supplies the analyses of these events is down. If the problems of data latency and data unavailability are solved, then businesses can react proactively to new information and knowledge rather than reactively. These problems can be solved.

Real-Time Business Intelligence Systems

Operational business intelligence (OBI) systems represent a significant improvement in latency since actions are taken within hours of the events that triggered them. However, they do not meet the criterion of immediacy that allows a business to react in real-time to an event. For instance, an OBI system cannot generate an offer to a customer while he is checking out at a cash register. Nor can it deny a potentially fraudulent transaction before it is executed.

What is needed is an OBI system that responds to events in seconds or less. However, this response cannot be achieved by updating the OBI database with hourly minibatches. Rather, the database must be updated with transaction activity in real-time as it occurs. This type of update is called trickle-feeding the database. As transactions are received, they are stored and become a growing historical record of activity. Furthermore, there must be a very fast rules engine that analyzes incoming transactions against the historical database and makes decisions quickly enough so that immediate action of value to the enterprise can be taken. This system is real-time business intelligence (RTBI).

We make a distinction here between a data-mining engine and a rules engine. Both require an historical database. However, a data-mining engine looks for relationships in the historical data to reactively support decision making after the fact. A rules engine compares a real-time event to the historical data to proactively suggest an action to be taken.

Figure 8 shows the extension of the fraud detection OBI system of Figure 7 to an RTBI system that reacts fast enough to deny a suspicious transaction before it is completed. In this example, ATM and POS transactions are fed to the RTBI system as they are generated. The RTBI system posts each transaction to its database for use by a rules engine that is triggered by the new transaction. In real-time, the rules engine checks this transaction against recent activity on the credit or debit card and makes an instant determination of suspicious activity. It then generates a message indicating whether the transaction should be accepted or denied.
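The rules-engine check described above can be sketched with the New York City and Amsterdam example from earlier in this part. This is only a minimal illustration in Python, not the system shown in the figure: the names CardTxn and FraudRules, the in-memory history, and the 900 km/h travel-speed threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt


@dataclass
class CardTxn:
    card: str
    ts: float   # seconds since some epoch
    lat: float  # location of the ATM or POS terminal
    lon: float


def km_between(a, b):
    """Great-circle (haversine) distance between two transactions, in km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


class FraudRules:
    MAX_KMH = 900.0  # assumed threshold, roughly airliner cruising speed

    def __init__(self):
        # card -> list of past transactions (stands in for the
        # trickle-fed historical database)
        self.history = {}

    def check(self, txn):
        """Post the transaction to the history and return ACCEPT or DENY."""
        past = self.history.setdefault(txn.card, [])
        decision = "ACCEPT"
        if past:
            last = past[-1]
            hours = max((txn.ts - last.ts) / 3600.0, 1e-9)
            # Deny if the card would have had to travel impossibly fast.
            if km_between(last, txn) / hours > self.MAX_KMH:
                decision = "DENY"
        past.append(txn)
        return decision
```

A card used in Amsterdam and then, thirty minutes later, in New York City would be denied by this rule, since the implied speed is far above the threshold. A production rules engine would combine many such rules and would read its history from the RTBI database rather than from memory.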

This particular fraud-detection example is actually used in real life, as shown in Figure 9. ATMs and POS machines are generally serviced by some particular bank. However, the credit or debit card used was probably issued by some other bank. Therefore, the card transaction must be sent to the issuing bank for authorization. The issuing bank verifies that the card balance is sufficient and if so authorizes the transaction.

Figure 8 – A Fraud Detection RTBI System

In this case study, a bank service company operates a transaction switch that receives card transactions from the servicing banks operating the ATM or POS machines and forwards these transactions to the appropriate issuing banks. Upon receipt of an accept/deny message from an issuing bank, the transaction switch returns this message to the servicing bank, which then takes appropriate action to accept or deny the transaction.

Figure 9 – Fraud Detection: A Real-Time Business Intelligence Example

As an additional service, the bank service company provides a fraud detection service. At the same time that it forwards the transaction to the issuing bank, it also sends the transaction to its RTBI fraud-detection system. If this system determines that the transaction is suspicious, the bank service company can take one of several actions, as requested by the issuing bank. It can deny the transaction and inform the issuing bank. Alternatively, it can alert the issuing bank to the circumstances of the suspicious activity so that the issuing bank can decide whether or not to put a hold on the card. This system shows RTBI in action. The RTBI fraud detection system uses complex rules against an historical database to determine which actions to take that directly affect a transaction in progress.
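The decision flow at the transaction switch might be sketched as below. The function name route, the two callbacks, and the policy strings "deny" and "alert" are hypothetical, introduced only to illustrate the two options described above. The sketch also calls the issuing bank and the fraud check one after the other for simplicity, whereas the text describes them as happening at the same time.

```python
def route(txn, issuer_authorizes, is_suspicious, policy):
    """Return (message_for_servicing_bank, alerts_for_issuing_bank).

    issuer_authorizes: callable standing in for the issuing bank's check
    is_suspicious:     callable standing in for the RTBI fraud-detection system
    policy:            "deny" or "alert", as requested by the issuing bank
    """
    alerts = []
    issuer_ok = issuer_authorizes(txn)
    suspicious = is_suspicious(txn)

    if suspicious and policy == "deny":
        # Option 1: deny the transaction and inform the issuing bank.
        alerts.append("transaction denied as suspicious")
        return "DENY", alerts

    if suspicious and policy == "alert":
        # Option 2: let the issuing bank decide whether to hold the card.
        alerts.append("suspicious activity; consider a hold on the card")

    return ("ACCEPT" if issuer_ok else "DENY"), alerts
```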

RTBI Dashboards

Real-time business intelligence dashboards are used to bridge the gap between operational business intelligence and real-time business intelligence. For instance, Figure 10 shows an IT server network-monitoring dashboard. This dashboard displays not only historical information but also the current status of the server network. The dashboard is interesting because it performs all three business-intelligence functions – strategic, operational, and tactical.

From a strategic viewpoint, it shows bandwidth usage and connection quality over the last several weeks. By clicking on the History tab, further historical usage and quality data is displayed. This dashboard forms the basis for future planning of network upgrades. From an operational viewpoint, the dashboard shows the current status of the network, the current memory usage, and the current connections and their traffic. If a network problem occurs, operational staff could immediately take remedial action. From a tactical viewpoint, as problems are detected, the RTBI system driving the dashboard could issue audible, email, or cell phone alerts to the operations staff. The RTBI system could even take automatic remedial action such as rerouting network connections, shedding low-priority traffic, or invoking redundant connections.

The question that remains is how transactions are fed to the new RTBI system from the enterprise’s other systems. This facility is the role of online extract, transform, and load (ETL).

Figure 10 – IT Server Network-Monitoring Dashboard

Online ETL

We have described offline ETL previously. It is the facility that allows data to be extracted from a source database, transformed into a common format, and loaded into a target database (the data warehouse’s database). Contemporary ETL facilities are batch-oriented and run periodically. They are therefore characterized as being offline ETL facilities. We have also described Enterprise Application Integration. EAI exchanges current information among systems in an application network but provides no historical record of enterprise activity for strategic-analysis purposes.

Real-time business intelligence needs an online ETL facility that not only preserves historical strategic data but also provides current tactical data. The online ETL’s job is to create and maintain a synchronized copy of a source database on a target database (the RTBI system) while the source database and the target database are actively updated and used by multiple applications. In effect, as transactions occur in the enterprise, they are trickle-fed to the RTBI system in such a way that this activity is transparent to other ongoing operations. As with EAI, three methods can be used to create an online ETL facility – connecting via adapters, using message-oriented middleware, and synchronizing via low-latency replication engines.

Adapters: Early adaptations of RTBI used an extension of existing EAI technology. Adapters were used to interconnect enterprise systems with the RTBI system. As transactions were executed by an external system, the results of those transactions were communicated to the RTBI system via adapter connections. The adapters also serviced requests from external systems and returned the RTBI system replies to those systems.

However, adapter technology suffered from the same problems that it faced in EAI applications. It was invasive and often required application modification in order for the applications to interface with the adapters. In addition, adapters were specialized to the applications. Each adapter knew the proprietary formats of the application data structures and how to interface to its application and was thus custom designed for that application. Every time the application changed, the adapter had to be modified. Consequently, not all applications could participate in the online ETL function.

Message-Oriented Middleware: MOM also exhibited many of the same problems that plagued adapters. MOM was invasive and required changes to the application to send and receive appropriate messages in a common interconnect data format. In order to make application changes, access to the application’s source code was required. This source code was often lost or was the proprietary property of a third-party vendor and was therefore not available. Additionally, every time the application changed, the MOM code potentially had to be modified also.

Data Replication: Data replication solves the adapter and MOM problems of invasiveness and specialization in an RTBI environment. A data replication engine (Figure 11) is noninvasive and is application-unaware. It deals only with the database and is therefore isolated from the application by the database. Today’s replication engines support most relational databases and many nonrelational databases as well.

Data replication can be synchronous or asynchronous. However, in RTBI applications, asynchronous replication is generally used, allowing the replication activity to be totally transparent to the application. The application proceeds with no knowledge of or impact from the replication activity. The application’s database activity is extracted by the replication engine via a transaction log, triggers, or intercepts. Selected updates are sent to the RTBI target system by the replication engine, where they are applied. The replication latency interval (the time that it takes to propagate a source database change to the target database) is generally less than a second.
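The log-based, asynchronous style of replication described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not any vendor's engine: SourceDB, Replicator, and the in-memory list standing in for the transaction log are hypothetical names, and a real engine would read the database's own log and send changes over a network.

```python
import time


class SourceDB:
    """Toy source database that records every write in a change log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # append-only: (sequence, commit_time, key, value)

    def write(self, key, value):
        # The application returns immediately; it never waits on replication.
        self.rows[key] = value
        self.log.append((len(self.log), time.monotonic(), key, value))


class Replicator:
    """Reads the source log and applies changes to the target in log order."""

    def __init__(self, source, target_rows):
        self.source = source
        self.target_rows = target_rows
        self.next_seq = 0    # position of the next unreplicated log entry
        self.latencies = []  # seconds from source commit to target apply

    def poll(self):
        for seq, committed, key, value in self.source.log[self.next_seq:]:
            self.target_rows[key] = value
            self.latencies.append(time.monotonic() - committed)
            self.next_seq = seq + 1
```

Calling poll repeatedly, for example from a background thread, trickle-feeds the target. The recorded latencies correspond to the replication latency interval defined above.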

Figure 11 – Data Replication

Replication engines support rules for data transformation. Some rules are built-in, and others are specified by user-supplied routines. Consequently, data transformations of any kind can be implemented without modifying the applications, the databases, or the core replication engine.
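One way to picture the mix of built-in and user-supplied transformation rules is the following Python sketch. The rule names ("strip", "upper"), the BUILT_IN table, and make_transformer are invented for illustration and do not correspond to the configuration syntax of any particular replication product.

```python
# Hypothetical built-in rules, looked up by name.
BUILT_IN = {
    "strip": lambda v: v.strip(),
    "upper": lambda v: v.upper(),
}


def make_transformer(column_rules):
    """Build a row transformer.

    column_rules maps a column name to a list of rules. Each rule is either
    the name of a built-in rule or a user-supplied callable.
    """
    def transform(row):
        out = dict(row)  # never modify the source row
        for col, rules in column_rules.items():
            if col not in out:
                continue
            for rule in rules:
                fn = BUILT_IN[rule] if isinstance(rule, str) else rule
                out[col] = fn(out[col])
        return out
    return transform


# Usage: two built-in rules plus one user-supplied routine.
to_rtbi = make_transformer({
    "name": ["strip", "upper"],
    "amount": [lambda cents: cents / 100],  # user rule: cents to dollars
})
```

Because the rules sit between the source and the target, neither the application nor the source database needs to change when a new transformation is added.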

Another requirement of replication engines is that they must preserve transactional consistency. Therefore, source updates are not randomly applied to the target database. Transactions must be applied in the same order to the target database that they were made at the source database. An additional problem occurs when multiple sources simultaneously update the same data item, called a data collision. For instance, what happens if two enterprise systems update the same customer address with different data at the same time? Which one is correct?

Data collisions may happen with any asynchronous mechanism used to feed the RTBI system, whether it is replication, adapters, or some other form of messaging. (Data collisions do not occur if synchronous replication is used.) Replication engines are particularly adept at handling collisions. Most detect a collision and resolve the collision via specified rules. Some of these rules are built-in, and more application-dependent rules are added via user-supplied routines.
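Collision detection and rule-based resolution might look like the following Python sketch. It assumes, purely for illustration, that each update carries the version of the row that the source saw before changing it; a mismatch with the target's current version signals a collision. The names Update, Target, and latest_wins are hypothetical, and "latest timestamp wins" is only one of many possible built-in rules.

```python
from dataclasses import dataclass


@dataclass
class Update:
    key: str
    value: str
    base_version: int  # row version the source saw before its change
    ts: float          # source commit time
    origin: str        # which enterprise system made the change


def latest_wins(current, incoming):
    """Example built-in rule: keep whichever update was committed later."""
    return incoming if incoming.ts >= current.ts else current


class Target:
    def __init__(self, resolve=latest_wins):
        self.rows = {}  # key -> (version, winning Update)
        self.resolve = resolve
        self.collisions = 0

    def apply(self, upd):
        version, current = self.rows.get(upd.key, (0, None))
        if current is not None and upd.base_version != version:
            # The source changed a row that had already changed here.
            self.collisions += 1
            upd = self.resolve(current, upd)
        self.rows[upd.key] = (version + 1, upd)
```

A user-supplied rule is passed in the same way, for example `Target(resolve=lambda cur, inc: inc if inc.origin == "crm" else cur)` to always prefer one system's view of a customer address.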

Another problem is associated with target-system downtime. Suppose that a target system fails and that, during its downtime, other external applications are trying to send it data. With many other techniques, an application that does not get a response to data that it has sent may fail, or its data may be lost. Replication engines, however, queue this data and send the data updates to the target system when it is returned to service. As discussed below, in the unlikely event of an RTBI system failure, the replication engines in each enterprise system queue their data changes and send them to the RTBI system when it is returned to service.
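The queue-and-forward behavior can be sketched in a few lines of Python. The classes RtbiTarget and QueueingSender and the simple `up` flag are assumptions made for the example; a real replication engine would persist its queue to disk and detect target availability itself.

```python
from collections import deque


class RtbiTarget:
    """Stand-in for the RTBI system; `up` simulates its availability."""

    def __init__(self):
        self.up = True
        self.applied = []

    def apply(self, change):
        self.applied.append(change)


class QueueingSender:
    """Source-side sender that holds changes while the target is down."""

    def __init__(self, target):
        self.target = target
        self.pending = deque()

    def send(self, change):
        # The source application is never blocked: the change is queued
        # first and delivered whenever the target is reachable.
        self.pending.append(change)
        self.flush()

    def flush(self):
        while self.pending and self.target.up:
            # Apply before removing, so a change is not lost if apply fails.
            self.target.apply(self.pending[0])
            self.pending.popleft()
```

While the target is down, send simply accumulates changes. Calling flush after the target returns to service delivers them in their original order.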

A similar problem occurs when a communication connection is lost. With data replication, a system that loses its connection with an RTBI system continues to operate in so-called split-brain mode. It is unaffected except that it does not send its data updates to the RTBI system, and it does not receive updates or recommended actions from the RTBI system. When the connection is restored, all of the updates that accumulated in either direction are sent to the opposite system via the queues maintained by the replication engines. In split-brain mode, each system applies updates to its local copy of the data invisibly to the other system while the connection is down, so there are bound to be data collisions. These collisions are resolved on-the-fly by the replication engines after recovery of the network and during the resynchronization of the databases.

Online Copying

Another requirement for online ETL is an online copy utility. The online copy utility is needed to bring an RTBI system into operation. An up-to-date snapshot of the data in the various enterprise systems that will feed the RTBI system must first be loaded into the RTBI system before it becomes effective.

This load must occur without affecting the source systems since they are busy running the enterprise. It must include all the various changes that occur during the copy, which could take hours or even days to complete. Furthermore, the copy must include all of the transformations that would otherwise be made by the online ETL facility in order for the initial RTBI database to properly reflect the state of the enterprise.
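A rough Python sketch of these requirements follows: copy the source row by row while it stays in use, then replay the changes that were logged during the copy, applying the same transformation to both. Everything here is an illustrative assumption, including the function name online_copy, the list used as a change log, and the on_row_copied hook, which exists only so that concurrent activity can be simulated.

```python
def online_copy(source, change_log, transform, on_row_copied=None):
    """Load `source` into a new target while the source stays in use.

    source:     dict of key -> row, possibly modified during the copy
    change_log: list of (key, row_or_None) appended to by source writers;
                None means the row was deleted
    transform:  the same transformation that the online ETL feed applies
    """
    target = {}
    start = len(change_log)        # remember where the log stood at the start

    for key in sorted(source):     # the key list is fixed before copying begins
        if key in source:          # the row may have been deleted meanwhile
            target[key] = transform(source[key])
        if on_row_copied:
            on_row_copied(key)     # hypothetical hook for simulation only

    # Replay everything that changed during the copy. Upserts and deletes
    # are idempotent, so replaying a change already copied is harmless.
    for key, row in change_log[start:]:
        if row is None:
            target.pop(key, None)
        else:
            target[key] = transform(row)
    return target
```

After the replay completes, normal trickle-feeding can take over from the end of the change log.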

Extreme Availability

Once an enterprise gets wedded to real-time business information, it suffers gravely should it lose this capability. Instant reactions that made it competitive and efficient are suddenly lost. Therefore, extreme availability of the real-time business information system is of paramount importance.

The first step is to choose an architecture that is especially resilient to failure. NonStop® systems from HP® are an ideal solution. These totally redundant systems provide proven availabilities in excess of four 9s (less than one hour of downtime per year). Clusters of highly reliable industry-standard servers are another choice. They are configured to provide availabilities in the same range as NonStop systems.
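The relationship between a number of 9s and yearly downtime is simple arithmetic, shown here as a short Python calculation. The function name is hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(nines):
    """Expected yearly downtime for an availability of `nines` 9s.

    Four 9s means 99.99% availability, i.e. an unavailability of 10**-4.
    """
    unavailability = 10 ** (-nines)
    return unavailability * HOURS_PER_YEAR


# Four 9s gives about 0.876 hours, or roughly 53 minutes per year.
four_nines_minutes = downtime_hours_per_year(4) * 60
```

This is consistent with the figure quoted above: four 9s corresponds to a little under one hour of downtime per year.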

However, though these solutions give reasonable protection against single component failures, they do nothing for disasters that take out an entire data center. Therefore, the RTBI system must be backed up by a geographically remote site that takes over in the event of a primary site failure. Otherwise, it might take days or weeks to replace the system, during which time normal business operations are severely impacted.

Backup systems using magnetic tape to rebuild the database take days to recover and are unsuitable for RTBI system backup. Virtual tape eliminates some of the problems associated with magnetic tape, but virtual tape systems still take a prohibitively long time to recover. Data replication to a backup site provides a reasonably complete copy of the database. However, following a primary site failure, the database must still be brought to a state of consistency, the applications started, the database opened, and the system tested before it is returned to operation. These steps take some time (minutes to hours) and are hindered by the same problem that the other backup strategies face – is the backup system working when it is brought into operation?

The best “backup” for an RTBI system is an active/active system. An active/active system comprises two or more geographically dispersed nodes that are already up and running. Each node actively processes and shares the application load with the other nodes. Should a node fail, all that needs to happen is to switch transactions (or users) from the failed node to the surviving nodes, a switch that takes only seconds. The advantages of having an RTBI active backup are twofold:

  • Failover is accomplished in seconds. Users of the real-time business information facilities may not even know that a failure has occurred.
  • The failover process itself does not fail. Since the backup system is already up and running, it is known that it is fully operational.
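The failover step in an active/active configuration amounts to routing new work away from the failed node. The Python sketch below is a minimal illustration under stated assumptions: the class ActiveActiveRouter, the round-robin policy, and the node names are all hypothetical, and health detection is reduced to explicit mark_down and mark_up calls.

```python
class ActiveActiveRouter:
    """Spreads transactions over all nodes and skips any that are down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)   # every node is actively processing
        self.up = set(self.nodes)
        self._next = 0

    def mark_down(self, node):
        self.up.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.up.add(node)

    def route(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.up:
                return node
        raise RuntimeError("no surviving nodes")
```

Because the surviving node was already handling its share of the load, nothing has to be started or tested at failover time; the router simply stops choosing the failed node.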

It was noted earlier that there are two impediments to real-time business intelligence – data latency and data unavailability. Online ETL (and especially data replication) solves the problem of data latency. Active/active RTBI systems solve the problem of data unavailability.
