Aristotle is often credited with saying, "Quality is not an act, it is a habit." Truer words were never spoken. Yet in an age when we are inundated with information, quality remains an elusive property of the data used to drive businesses. Information that starts out clean and precise quickly falls into disarray without good data cleansing habits.
If you spend time in corporate boardrooms, you hear the terms governance, stewardship and quality thrown about liberally. CIOs talk about Master Data Management (MDM) and Data Quality (DQ) as if the organization understands them and stands ready to weed out bad information. They often promote tool-based solutions that they believe can magically transform average information workers into all-powerful, all-knowing "Data Stewards."
If vendors could build these traits into their DQ tools, the information in our data warehouses and marts would readily yield knowledge, or even wisdom. Yet the knowledge we seek often remains out of reach, and would-be wisdom stays buried under masses of inaccurate facts and misleading figures.
So how should your company approach the problems of matching and cleansing data? First of all, we recommend following Aristotle's advice: make data quality a habit. Every Extract, Transform and Load (ETL) operation should treat DQ as a first principle. Relegating DQ to a separate step in the process nearly guarantees that it will be done poorly. To make your ETL processes DQ-aware, insist on staging steps that compare incoming data with statistical norms. This may require an investment in the so-called "persistent staging" model, which adds cost to the ETL processes. However, retaining some historical data in your staging environments to support statistical scoring of new data can add tremendous value, especially to the data matching algorithms that your MDM system depends on.
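As an illustration of the kind of check a DQ-aware staging step might run, here is a minimal sketch that assumes numeric measures (for example, daily row counts per source) and assumes earlier loads are retained in persistent staging. The function name, the z-score test and the three-standard-deviation threshold are illustrative assumptions, not a prescribed method; note that suspect values are flagged rather than dropped.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def score_against_history(
    history: Sequence[float],
    incoming: Sequence[float],
    z_limit: float = 3.0,
) -> List[Tuple[float, float, bool]]:
    """Compare each incoming value with the norm set by retained history.

    history:  values kept in persistent staging from earlier loads
              (at least two are needed to compute a standard deviation).
    incoming: values arriving in the current batch.
    Returns (value, z_score, within_norm) for every incoming value.
    Nothing is discarded here; the flag is meant to travel with the data.
    """
    mu = mean(history)
    sigma = stdev(history)
    scored = []
    for value in incoming:
        if sigma == 0:
            # Flat history: any change at all is treated as an outlier.
            z = 0.0 if value == mu else float("inf")
        else:
            z = (value - mu) / sigma
        scored.append((value, round(z, 2), abs(z) <= z_limit))
    return scored


# Example: daily row counts retained from five earlier loads (made-up numbers).
previous_loads = [100, 102, 98, 101, 99]
print(score_against_history(previous_loads, [100, 150]))
```

The more history persistent staging keeps, the more stable the mean and standard deviation become, which is the cost/benefit trade-off described above.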
Secondly, we recommend taking a liberal approach to admitting potentially suspect data into your warehouse, provided you expose the scoring information to consumers. In other words, if a code doesn't match a known or expected value, don't automatically discard it. Instead, save the record's low score, and ideally the reason it received that score, into the warehouse. This liberal approach has one obvious drawback: additional storage cost. However, storing DQ and MDM score metadata alongside your data can pay huge dividends downstream. Data mart builders can use the scores to make better decisions about what they load and how they load it. Report writers can weight facts and dimensions by score, too, helping to draw intelligence out of raw knowledge.
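The following minimal sketch illustrates the "admit and annotate" idea together with the two downstream uses the paragraph mentions. The row layout, the set of known status codes, the 0.5 penalty per problem and the 0.75 loading threshold are all hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical reference data: the status codes the business expects.
KNOWN_STATUS_CODES = {"A", "I", "P"}


@dataclass
class ScoredRow:
    customer_id: str
    status_code: str
    amount: float
    dq_score: float = 1.0                                # 1.0 = no issues found
    dq_reasons: List[str] = field(default_factory=list)  # why the score was lowered


def score_row(customer_id: str, status_code: str, amount: float) -> ScoredRow:
    """Admit every row, lowering its score and recording why instead of discarding it."""
    row = ScoredRow(customer_id, status_code, amount)
    if status_code not in KNOWN_STATUS_CODES:
        row.dq_score -= 0.5
        row.dq_reasons.append(f"unknown status_code {status_code!r}")
    if not customer_id.strip():
        row.dq_score -= 0.5
        row.dq_reasons.append("blank customer_id")
    row.dq_score = max(row.dq_score, 0.0)
    return row


def rows_for_mart(rows: List[ScoredRow], min_score: float) -> List[ScoredRow]:
    """A data mart builder chooses what to load based on the stored score."""
    return [r for r in rows if r.dq_score >= min_score]


def score_weighted_total(rows: List[ScoredRow]) -> float:
    """A report writer weights each fact by its score rather than dropping it."""
    return sum(r.amount * r.dq_score for r in rows)


warehouse = [
    score_row("C1", "A", 100.0),
    score_row("C2", "X", 50.0),  # unknown code: kept, score 0.5
    score_row(" ", "Z", 20.0),   # two problems: kept, score 0.0
]
mart = rows_for_mart(warehouse, min_score=0.75)
```

Because the reason strings are stored with each row, a consumer can later decide that one kind of problem matters and another does not, which is not possible once a row has been discarded at load time.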
Thirdly, make DQ and MDM concepts pervasive throughout your organization. Your information workers already know how poor data can be. Turn the tables and get everyone using the vocabulary of DQ and MDM meaningfully in project planning. For example, each time a new report is to be developed, make the quality of the underlying information the first topic of discussion. Focus on certification processes that hold people accountable for the quality of the information used to drive business decisions and meet corporate objectives. Having a team of so-called Data Stewards with this responsibility sounds like a nice, compact solution. However, it's really everyone's responsibility to ensure that the information used in decision-making is clean, clear and ready to reveal some intelligence. Don't depend on the Data Stewards alone to ensure the high quality you demand.
Last, but not least, focus on the tooling and workflows for data adoption and correction. Most companies start their DQ efforts here, hoping that third-party vendors can provide magic bullets for eliminating bad data. Scoring quality during ETL helps more, though, and training your staff to respect and use those scores in every business operation adds still more value. The DQ and MDM tools you choose to buy or build do matter, but concentrate on the integration and workflows they enable toward your objective of having great data. For example, one tool may be good at scoring data but have a weak pattern matching engine that requires lots of expensive customization. Another may integrate well with your incident management system, or provide a slick user interface for your Data Stewards, yet integrate poorly with your ETL tools. Before investing in DQ tools, do your homework: understand your organization and your other systems well before making such an investment. We recommend focusing first and foremost on how the tool integrates into your business and on the new or modified workflows it makes possible.
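To make the integration point concrete, here is a minimal sketch of a correction workflow hook, assuming records already carry a `dq_score` from ETL. The `open_ticket` callable is a hypothetical stand-in for whatever your incident management or stewardship tool actually exposes; no real product API is implied. Keeping that dependency behind a plain function is one way to let scoring stay in the ETL layer while the correction tool remains replaceable.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def route_for_correction(
    records: List[Record],
    threshold: float,
    open_ticket: Callable[[Record], str],
) -> List[str]:
    """Send records scored below `threshold` into a correction workflow.

    `open_ticket` is a hypothetical integration point; the ETL scoring
    logic only depends on this callable, not on any vendor's product.
    Returns the identifiers of the tickets that were opened.
    """
    return [open_ticket(r) for r in records if r["dq_score"] < threshold]


# Hypothetical in-memory stand-in for a real ticketing integration.
correction_queue: List[Record] = []


def queue_ticket(record: Record) -> str:
    correction_queue.append(record)
    return f"DQ-{len(correction_queue)}"


scored = [
    {"id": "C1", "dq_score": 1.0},
    {"id": "C2", "dq_score": 0.5, "dq_reasons": ["unknown status_code 'X'"]},
]
tickets = route_for_correction(scored, threshold=0.75, open_ticket=queue_ticket)
```

Swapping `queue_ticket` for a function that calls a real tool is where the evaluation questions above (how well does it integrate with our ETL and incident systems?) become concrete.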
In closing, another quote from the venerable philosopher comes to mind: "Each man judges well the things he knows." By that logic, CapTech's clients are well served by hiring people who know Data Quality and Master Data Management well. We guide our clients through the technical hurdles and organizational changes required to build data warehouses that stand the test of time, growing habitually higher in quality with each new bit of data that flows into them.