Perhaps the most astonishing fact, however, is that IT has been blind for so long to the need for monitoring and metering (auditing) of data health, even though this is a fundamental engineering concept. For instance, Figure 1 illustrates a centrifugal steam engine governor.
This device, invented by James Watt, was essential for the safe operation of steam engines. Steam power rotated an axle to which two heavy flyballs were attached. If the steam pressure got too high, the resulting speed caused the flyballs to rise, and the arms attached to them opened safety valves that released the pressure. The apparatus could be adjusted to react to a range of pressures. Prior to this invention, overheated boilers simply exploded.
A more up-to-date example of this principle can be found in a 2009 white paper by Intel entitled Increasing Data Center Efficiency through Metering and Monitoring Power Usage. The paper describes an approach to improving energy efficiency at one of the company’s older data centers in India. The authors summarize the project as follows:
“We developed methods for identifying measurable efficiency improvements and placed instrumentation to continuously track power usage effectiveness (PUE), the key metric of data center energy efficiency. Using PUE metrics allowed us to make decisions that increased efficiency, helped achieve optimum data center facility utilization, and provided data we can share with other Intel facilities around the world to proliferate energy savings”.
Several diagrams illustrate this approach, such as the one shown in Figure 2.
Figure 2: Metering of Cooling Power at an Intel Data Center
The irony of this is that monitoring and metering of the power consumption of a data center was a priority, yet there is hardly ever any attempt to monitor or meter data. Why this is so is not clear. Perhaps it is thought that edit checks will catch all data errors and that these have all been approved by the users and tested prior to production deployment. Yet in all other engineering constructs where a process is involved, monitoring and metering is an integral part of the production environment. Imagine an oil refinery that had been built to specification and passed its initial test, but which lacked any form of monitoring and metering. It would never be allowed to operate. Yet this is exactly what production data environments are like.
The Nature of Feedback
Monitoring and metering produce feedback. However, feedback still occurs in their absence. Unfortunately, the kind of feedback produced without monitoring and metering is often unhelpful and, in its own way, can be damaging.
Let us continue with the analogy of the oil refinery with no monitoring or metering. Suppose that one of the feed lines sprang a leak. How would this be noticed? There might be a reduction in an output that was eventually noticed by buyers of the refined products and led to a search for the cause. Or perhaps someone walking around the site might happen upon the leak by chance. Or perhaps the leak might be of an inflammable liquid that eventually caught fire, and that would certainly be noticed.
Obviously, these are all undesirable forms of feedback. It would be much better to have gauges for measuring flow, pressure, fluid levels, and so on, and sensors that raise an alarm if measurements reach predetermined danger levels. This kind of feedback is much more timely, usually catches problems before they can do real damage, and points to the location of the problem with some specificity.
There is no such general approach in production data environments. The only feedback that is provided is when users (or customers or regulators) find something suspicious, or wrong, and point it out. Or perhaps IT staff in the course of their daily duties notice a problem and initiate action to correct it.
Not only does feedback have to be aligned to process, but it has to be handled correctly. We are all familiar with images of Network Operations Centers (NOCs), NASA’s mission control, and control rooms of facilities such as nuclear reactors. And of course, there are production control units in IT, which monitor system consoles for messages about the state of tasks or jobs that are running. However, these units rarely receive information about the data. The messages are nearly always about whether a task has failed unexpectedly, has not started, is taking too long, or is out of SLA. For some kinds of feedback concerning data, it is appropriate for production control to be involved. This may be necessary if a data quality issue critically impacts a process. But other kinds of feedback should be routed to users, to the business analysts who support them, or to other stakeholders who deal with the data. We simply cannot expect production control units to handle all the feedback that might come from monitoring and metering of data health; it would overwhelm them. The problem is that we have no clearly recognized set of roles and responsibilities for handling such feedback. It is a major challenge for data governance to set this structure up, and it is a challenge that cannot be delayed.
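One way to picture such a set of roles and responsibilities is a simple routing table. The following is a minimal sketch only: the category names, the recipient roles, and the route_feedback function are all hypothetical and are introduced here purely for illustration. A real governance structure would define its own categories and owners.

```python
from dataclasses import dataclass

# Hypothetical routing table: which role handles which kind of data feedback.
ROUTES = {
    "process_critical": "production_control",
    "business_rule_violation": "business_analyst",
    "reference_data_drift": "data_steward",
}

# Assumed fallback owner for feedback that matches no known category.
DEFAULT_RECIPIENT = "data_governance"


@dataclass
class Feedback:
    category: str
    message: str


def route_feedback(item: Feedback) -> str:
    """Return the role that should handle this item of feedback."""
    return ROUTES.get(item.category, DEFAULT_RECIPIENT)
```

In this sketch only feedback that critically impacts a process reaches production control. Everything else goes to the people who work with the data, and anything unrecognized falls back to a single accountable owner rather than being dropped.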
iCEDQ for Monitoring and Metering
If monitoring and metering of data health is essential, what tools will be used to do it? What will be the equivalent of pressure gauges, heat sensors, fluid level monitors and the like?
In thinking about these problems, the IT mindset tends to be dominated by the systems development life cycle (SDLC). A natural response, therefore, is that monitoring and metering should be built into every application. Edit validation checks are often thought of as performing this function. However, data-centric projects involve a great deal of data movement, and in this context the data does not have the same relationship with a steward that it has when a steward is entering it on a screen. Of course, this does not mean that monitoring and metering should be left out of extract, transform, and load (ETL) environments. Checks should be incorporated into them. However, there are strong reasons for having monitoring and metering as a separate component in the architecture, as illustrated in Figure 3.
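To make the idea of a separate component concrete, the sketch below shows a check that runs outside the ETL job and only reads from the source and target. It is an illustration under stated assumptions, not a description of how iCEDQ or any other product works: the table names, the reconcile_row_counts function, and the use of an in-memory SQLite database are all hypothetical.

```python
import sqlite3


def reconcile_row_counts(conn, source_table, target_table):
    """Compare row counts between a source table and a target table.

    Table names are interpolated into the SQL text, so they must come
    from trusted configuration, never from user input.
    """
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return {"source": src, "target": tgt, "passed": src == tgt}


def demo():
    """Build a tiny source and target, then audit them independently."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_orders (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE warehouse_orders (id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO staging_orders VALUES (?, ?)",
        [(1, 10.0), (2, 20.0), (3, 30.0)],
    )
    # Simulate a load that silently dropped one row.
    conn.executemany(
        "INSERT INTO warehouse_orders VALUES (?, ?)",
        [(1, 10.0), (2, 20.0)],
    )
    return reconcile_row_counts(conn, "staging_orders", "warehouse_orders")
```

Because the check shares no code with the load itself, a defect in the load cannot also hide the evidence of that defect.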
The Business Rules Approach
From the above, we can see that adaptability is a quality attribute that is needed for any monitoring and metering tool. Data content and structure can change in ways that affect quality, and the tool needs to keep up. In this respect, there is a sharp difference between tools for monitoring and metering data health and engineering hardware like pressure gauges and heat sensors. The latter are not adaptable in the way needed for data.
Adaptability of this kind is not provided by the traditional SDLC. The only architectural pattern that does provide it is the business rules approach. In this pattern, business analysts define business rules, and the rules are immediately executable in a target environment. IT staff, at least in theory, do not participate in this activity, so there is no programming, testing, or production migration.
Tools that provide this functionality are called business rules engines (BREs). Their approaches can vary quite widely. Some are interpretive, while others generate code. Some are oriented to natural language rules, while others require the use of scripts that are closer to programming languages. Perhaps the most fundamental split, however, is between those oriented to some kind of deduction, such as deriving a credit score, and those that do simpler calculations and derivations. The latter class of BREs is more common and is the one that will provide the kind of functionality needed for monitoring and metering.
BREs also serve as independent tools, and so they provide the architectural quality attribute of auditability. They are not a part of the process that is doing the real work of manipulating data. In fact, BREs are quite often used for data quality tasks, although it is rare to find them offered as pure monitoring and metering tools. For instance, they can be a component of data profiling tools.
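The simpler, interpretive kind of rules engine can be sketched in a few lines. In the example below, rules are held as data rather than as program code, so adding or changing a rule requires no new programming. This is a toy illustration under stated assumptions: the rule format, the rule names, and the evaluate function are hypothetical and do not reflect the syntax of any particular BRE.

```python
import operator

# Comparison operators a rule may refer to by name.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

# Hypothetical rules, expressed as data rather than as code.
RULES = [
    {"name": "amount_non_negative", "field": "amount", "op": ">=", "value": 0},
    {"name": "currency_is_usd", "field": "currency", "op": "==", "value": "USD"},
]


def evaluate(rules, record):
    """Return the names of the rules that the record violates.

    A missing (or None) field counts as a violation of any rule on it.
    """
    failed = []
    for rule in rules:
        check = OPERATORS[rule["op"]]
        actual = record.get(rule["field"])
        if actual is None or not check(actual, rule["value"]):
            failed.append(rule["name"])
    return failed
```

For example, evaluate(RULES, {"amount": -1, "currency": "EUR"}) reports both rules as violated, while a record with a non-negative amount in USD passes both.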
Notification and Alerting
Gathering and storing metrics as a result of metering is one thing. However, monitoring implies the detection of events. When an event is detected, individuals or applications must be notified. If the event is determined to be adverse, they must be alerted. An alert is intended to trigger immediate action, whereas a notification is more for informational purposes, although it might be part of a series that shows a developing trend that may lead to problems.
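The distinction between a notification and an alert, and the idea of a series of readings revealing a trend, can be sketched as follows. The threshold values, the Signal names, and both functions are assumptions chosen for illustration; appropriate thresholds would depend on the rule and the data involved.

```python
from enum import Enum


class Signal(Enum):
    NONE = "none"
    NOTIFICATION = "notification"  # informational
    ALERT = "alert"                # calls for immediate action


def classify(failure_rate, notify_at=0.01, alert_at=0.05):
    """Map a rule's failure rate to the kind of signal it should raise.

    The default thresholds (1% and 5%) are illustrative assumptions.
    """
    if failure_rate >= alert_at:
        return Signal.ALERT
    if failure_rate >= notify_at:
        return Signal.NOTIFICATION
    return Signal.NONE


def is_worsening(rates):
    """True if each reading in the series is higher than the one before."""
    return len(rates) >= 2 and all(a < b for a, b in zip(rates, rates[1:]))
```

A run of readings that each produce only a notification can still be worth escalating if is_worsening reports that the failure rate is climbing steadily.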
Dashboard: Presentation of Results
Earlier in this paper, the concept of a control room was discussed, and it was noted that there is no real equivalent of this for the production data environment. Notification and alerting only provide a mechanism to act on individual problems, but management will always want to see a “big picture” of the health of data across the production landscape. The architectural approach of monitoring and metering described here does generate the basic measurements of data quality and stores them, but how should they be presented?
A “big picture” approach suggests a high-level representation with drill-down capabilities. Producing detailed reports on thousands (or tens of thousands) of rules executing in an environment every day is unsuitable for senior management, and may even overload individual analysts. The “big picture” can be generated using aggregation and/or filtering of the metrics metadata. It is probably best presented as a dashboard, such as the one illustrated in Figure 6.
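Aggregation and drill-down over stored rule results can be sketched as below. The shape of each result record (rule, area, passed) and the function names are hypothetical; they stand in for whatever metrics metadata the monitoring tool actually stores.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate per-rule results into one pass rate per subject area."""
    totals = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:
        bucket = totals[r["area"]]
        bucket["total"] += 1
        if r["passed"]:
            bucket["passed"] += 1
    return {area: t["passed"] / t["total"] for area, t in totals.items()}


def drill_down(results, area):
    """Return the names of the failed rules within one subject area."""
    return [r["rule"] for r in results if r["area"] == area and not r["passed"]]
```

The summary is what a senior manager would see on the dashboard. The drill-down is what an analyst would open after noticing that one area's pass rate had fallen.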
There is a general lack of monitoring and metering of data health, and this makes it impossible to govern production data landscapes effectively. Of the approaches to monitoring and metering, business rules engines offer the most promise. What we would expect from such business rules engines has been described. However, space has permitted us to touch upon only some of the major points, and much detail has had to be omitted. Nevertheless, the basic elements have been covered, and while some aspects are undoubtedly forward-looking, the domain of monitoring and metering enterprise data within overall data governance is developing quickly, and we can expect to see it addressed more clearly by products like iCEDQ.