I've recently posted a couple of articles at this site on data quality; this is the final one in a series of three. Previous posts presented these ideas:

  • Yes, there is a business case for improving data quality, and we've got business value examples. If you look for real money where you anecdotally know there are data quality problems, you'll likely find it in high costs of data correction and rework, and savings related to business process improvements that reliable data enables.
  • There are distinct things an organization can do to reap the benefits of improved data management and data quality: (1) get started in the first place, (2) find the tangible benefits, (3) cross the departmental silos that exist in every large organization, and (4) promote sound data management practices.
  • Impacts of poor data quality can seem abstract in a large organization. They aren't for a small business owner. Say you have a carpet cleaning business. What if you knew 10% of your customer bills were wrong, but you weren't sure by how much or in which direction? First you'd panic. Then you'd rush to fix the problem.

The nature of the problem is the same for large companies, but the sheer scale and complexity of many overlapping systems seem to diminish the benefits of correcting any one data deficiency. In addition, data that is good enough for a given operational purpose is often deficient when shared with other systems, which makes the data quality question seem subjective. So data quality improvements need to be ambitious or risk being seen as trivial.

Cameron Snapp adds this thought: "data quality deficiencies are unique to an individual organization based on its own limitations, operational behaviors, and mandates. However, the benefits of implementing data quality initiatives are universal: confidence in IT systems, cost, and repeatable/reliable data practices. There's a common misconception in many organizations that 'our data quality problems are unique'. This is true, and every delivery is unique too, but the solution approach and outcome are similar for everyone: life is better."

In preparing the two previous posts I thought about parallels between poor data quality and corruption. Both are complex, and to folks in the mix both seem like just the "way of the world." I came across a fascinating site, http://ipaidabribe.com, where I found a great list of "corruption excuses" that seemed to parallel things I've heard about the difficulties of data management improvements. Brian Cox suggested adding another excuse to the list: "Don't worry, we'll fix it in the reporting system." That one doesn't have a corresponding corruption excuse on this list. Maybe it is just too horrible to contemplate!

I'm not saying bad data habits are evil and dishonest, just that similar habits of thought are at work in these two pervasive and subtle-seeming dysfunctions. To me, whether or not to pursue data quality improvement comes down to a question of hard dollar returns versus doing things the "way we've always done it." If you have the organizational wherewithal to correct the systemic problem, then the benefits are there for the taking.

Corruption excuses and their poor data quality counterparts:

Excuse 1
  • Corruption: corruption is everywhere
  • Poor data quality: bad data is everywhere

Excuse 2
  • Corruption: corruption always existed
  • Poor data quality: the data has always been bad

Excuse 3
  • Corruption: the concept is vague and culturally determined
  • Poor data quality: data quality is vague and subjective

Excuse 4
  • Corruption: cleansing will require a whole change in attitudes & values
  • Poor data quality: cleansing will require a whole change in attitudes & values

Excuse 5
  • Corruption: corruption is not harmful; it is the grease that moves the economic engine
  • Poor data quality: poor data quality is not harmful; our systems are working well now

Excuse 6
  • Corruption: Nothing can be done if the top is corrupt and corruption is systematic
  • Poor data quality: Nothing can be done if we don't have top management support

Excuse 7
  • Corruption: Don't worry, with free markets, it will eventually go away
  • Poor data quality: Over time the data quality will improve because our source systems are better now