by David Loshin
Published in TDAN.com January 2003
Every organization has a problem with data quality – there is no doubt about that. The issue that many organizations are currently grappling with is not the existence of data quality issues, but rather how critical those problems are to the business. Even in companies that recognize the importance of data quality within the enterprise, there is hesitancy to attack these problems. Having spoken with a number of information practitioners, I believe the reason is not a misunderstanding of the scope or size of the problem, but rather two fundamental obstacles:
• The inability to convince senior management of the importance of improved data quality, resulting in limited budgets and lack of governance.
• The size of the problem induces paralysis: practitioners cannot figure out where to begin to address it.
An approach that we have had some success with in dealing with both of these issues revolves around a limited assessment of data quality that yields metrics on the size of the data quality problem, helps determine which problems are the most business-critical, and helps determine the initial steps that need to be taken to address the problem.
The DQ ROI Problem
One of the most frustrating issues associated with data quality improvement is not knowing how badly poor data really affects the organization. In some companies, the methods to address poor data quality may incorporate interim data corrections, nominal customer service adjustments, multiple copies of non-standard data, or other “topical” solutions, all of which incur some “correction” costs to the company but probably do not address the source of the problem. In fact, it is possible that, even though the data is not compliant with knowledge worker expectations, the costs of fixing the problems at the source overwhelm these ongoing correction costs.
This is where the concept of the data quality Return On Investment (ROI) assessment comes in. The goal is to provide a set of metrics that highlight the more critical data quality issues, and tie those issues to actual business problems, which can be related either to increased costs or to lost opportunities. Calculating the actual costs of those business problems and then comparing them with what it will take to improve data quality provides that elusive ROI model.
This ROI model can be used to address both of the above-mentioned roadblocks. By providing clearly defined metrics and their actual measurements, and tying them to actual business problems, this ROI model builds the argument for senior-management support of data quality improvement initiatives. And by highlighting the most critical data issues, this model provides a starting point for the improvement process.
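To make the comparison concrete, the ROI calculation can be sketched in a few lines of code. This is only an illustration: the cost categories, dollar amounts, and the assumed fraction of costs eliminated are all hypothetical placeholders, not figures from any actual assessment.

```python
# Hypothetical figures illustrating the ROI comparison described above.
# Every number and category name here is an invented placeholder.

annual_cost_of_poor_quality = {
    "error detection and rework": 250_000,
    "customer service adjustments": 90_000,
    "delayed projects": 160_000,
}

improvement_investment = 180_000   # one-time cost of the improvement initiative (assumed)
expected_cost_reduction = 0.60     # assumed fraction of the above costs eliminated

annual_savings = expected_cost_reduction * sum(annual_cost_of_poor_quality.values())
roi = (annual_savings - improvement_investment) / improvement_investment

print(f"Annual savings: ${annual_savings:,.0f}")
print(f"First-year ROI: {roi:.0%}")
```

Even a back-of-the-envelope model like this turns the abstract claim “poor data costs us money” into a figure that can be defended, refined, and presented to senior management.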
Costs Associated with Poor Data Quality
We can divide the costs associated with poor data quality into soft costs, which are clearly evident yet hard to measure, and hard costs, whose effects can be estimated and measured. Ultimately, the level of data quality rolls up to the company’s bottom line – allowing low levels of data quality to persist will lower profits, while improving data quality should increase profits.
Hard costs are those whose effects can be estimated and/or measured. These include:
• Customer attrition
• Error detection
• Error rework
• Error prevention
• Customer service
• Fixing customer problems
• Delays in processing
• Delayed or cancelled projects
Soft costs are those that are evident, clearly have an effect on productivity, yet are difficult to measure. These include:
• Difficulty in decision making
• Time delays in operation
• Organizational mistrust
• Lowered ability to effectively compete
• Data ownership conflicts
• Lowered employee satisfaction
These costs can be manifested as one or more of these specific impact classifications:
• Detection costs, which are incurred when a system error or processing failure occurs and a process is invoked to track down the problem.
• Correction costs, which are associated with the actual correction of a problem as well as the restarting of any failed processes or activities. The amount of time associated with the activity that failed, along with extraneous employee activity, are all rolled up into correction costs.
• Rollback costs, which cover the costs associated with undoing work that has already been done.
• Rework costs, which represent the additional work performed before the successful run took place.
• Prevention costs, which are those that are incurred when a new activity is designed, implemented, and integrated to identify data quality problems and to take the necessary actions to prevent operational failure due to unexpected data problems.
• Warranty costs, which are associated with both fixing the problem and compensating the customer for damages.
• Reduction, which occurs when a customer’s reaction to an organization’s data quality problem results in a decision to do less business with that organization.
• Attrition, which occurs when a customer’s reaction to poor data quality results in the customer’s complete cessation of business.
• Blockading, which occurs when a customer’s dissatisfaction is so complete that it causes other potential customers to decide against doing business with the organization in the first place.
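In practice, measuring these impacts usually means tagging each incident with one of the classifications above and rolling the costs up by category. The sketch below assumes a simple incident log; the incidents, amounts, and log structure are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical incident log: (impact classification, estimated cost).
# The categories mirror the list above; the entries are invented.
incidents = [
    ("detection",  1_200.0),
    ("correction", 3_400.0),
    ("rework",     2_100.0),
    ("detection",    800.0),
    ("prevention", 5_000.0),
]

# Roll the incident costs up by impact classification.
costs_by_category = defaultdict(float)
for category, amount in incidents:
    costs_by_category[category] += amount

for category, total in sorted(costs_by_category.items()):
    print(f"{category:>10}: ${total:,.0f}")
```

A roll-up like this shows at a glance where the money is going – for example, whether the organization is spending more on repeated detection and correction than a one-time prevention effort would cost.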
Value Associated with Improved Data Quality
Improved data quality can add to the company’s bottom line, either through optimization in operational systems or by improving the value of knowledge generated through a business intelligence process. The following kinds of improvements are typical as the result of improved information quality:
• Improved throughput for volume processing – By reducing the delays associated with detecting and correcting data errors, and the rework associated with that correction, more transactions can be processed, resulting in greater volume processing and lower cost per transaction.
• Improved customer profiling – Having more compliant customer information allows the business intelligence process to provide more accurate customer profiling, which in turn can lead to increased sales, better customer service, and increased retention of valued customers.
• Decreased resource requirements – Redundant data, rollbacks, and rework put an unnecessary strain on an organization’s resource pool. Eliminating redundant data, and reducing the amount of rework reduces that strain and provides better resource allocation and utilization.
• Predictability in project planning and completion – Many projects (such as business intelligence and data warehousing projects) fail as a result of poor data quality; improving the quality of information reduces the risk that a project will be delayed or canceled because of bad data.
Creating The Value Proposition Through DQ Assessment
One might think that a complete system assessment is required to provide a comprehensive ROI for improved data quality, but a constrained assessment is typically effective both at isolating significant problems that can be directly related to increased costs and at providing insight into the direction an improvement process should take. We have had success with small-scale analyses that focus on one particular data set. The process is as follows:
1. Profile – A data profiling scheme looks at standard column and cross-column analyses, such as frequency, nullness, and cardinality. This process will usually expose some of the most offending data quality problems, which should be summarized for the next phase.
2. Review – The results of the profiling should be discussed with the business client to determine which of the issues are related to business problems. We now configure business rules that characterize our expectations as a prelude to actual measurement.
3. Measure – We now have a list of critical data quality problems as well as the business problems to which they relate. The next step is to measure not just information compliance with our defined business rules, but also the actual costs associated with noncompliance.
4. Characterize Solution – Having measured data quality rule compliance and associated noncompliance with the actual costs, we must now propose solutions to these problems and provide cost estimates for those solutions, as well as estimate the increase in value when these problems have been solved.
5. Build the ROI model – We now have enough information to complete our ROI model, which enumerates the costs associated with poor data quality, the costs to correct the problems, and the benefits provided as a result of improvement. This gives us a high-level view of the value proposition for data quality improvement.
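The Profile and Measure steps can be sketched in code. The example below profiles one column (null rate, cardinality, value frequency) and measures compliance with one business rule. The records, field names, and the rule itself (“state must be a two-letter uppercase code”) are assumptions invented for illustration, not part of any particular assessment.

```python
from collections import Counter

# Hypothetical customer records; field names and values are invented.
records = [
    {"customer_id": "C001", "state": "NY", "zip": "10001"},
    {"customer_id": "C002", "state": "ny", "zip": "10468"},
    {"customer_id": "C003", "state": None, "zip": "94105"},
    {"customer_id": "C004", "state": "CA", "zip": ""},
]

def profile(rows, column):
    """Step 1 (Profile): null rate, cardinality, and value frequency for one column."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "cardinality": len(set(non_null)),
        "frequency": Counter(non_null),
    }

def compliance(rows, column, rule):
    """Step 3 (Measure): fraction of records whose value satisfies a business rule."""
    values = [r[column] for r in rows]
    return sum(1 for v in values if v is not None and rule(v)) / len(values)

state_profile = profile(records, "state")
# Assumed business rule: state codes must be two uppercase letters.
state_compliance = compliance(records, "state", lambda v: v.isupper() and len(v) == 2)

print(f"null rate: {state_profile['null_rate']:.0%}, "
      f"cardinality: {state_profile['cardinality']}, "
      f"rule compliance: {state_compliance:.0%}")
```

Tying the compliance rate back to a per-record cost of noncompliance (from the Measure step) is what turns a profiling exercise into the ROI model described above.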
Building the ROI model for data quality is a valuable business process that requires a small investment in time and energy yet provides valuable documentation of the scope and costs associated with poor data quality. Performing a limited assessment of data quality yields important metrics on the size of the data quality problem, helps highlight those problems that are the most business-critical, and helps determine the initial steps that need to be taken to address the problem. Most importantly, the ROI model provides a compelling argument to convince senior managers of the importance of information compliance.