After attending the Strata Hadoop data conference in New York City this month, it became clear to me that Data Governance has become a popular topic among organizations, now that data is viewed as a valuable asset. Data Governance is broadly defined as how an organization uses, finds, accesses, and secures its data. Over the course of the conference, four questions came up repeatedly whenever the topic was discussed. Below we explore these four questions and the answers that came out of those discussions.

Why is there a need for it?

Data Governance is needed to understand what data an organization has, who can access it, and how to catalog it. With that understanding, an organization can recognize the concerns and implications tied to its data and make appropriate decisions with it.

Once Data Governance is established, an organization is better equipped to handle many different scenarios. Examples include audits, compliance with regulations, understanding what data is available, and accountability for where data comes from.

What is the current state?

Data Governance has become a sizable issue that organizations are unsure how to solve. In many instances, an organization's data lake becomes a data swamp because it lacks structure, access control, and knowledge of what data it contains and where that data came from. The result is growing risk for the organization and less value from the data.

What is the goal?

The goal of Data Governance is to track lineage: where the data came from and who touched it. Ultimately, organizations should be able to answer the following questions with their Data Governance:

  • What data do we have?
  • How did we get this data?
  • Can we track who has access and who touched the data?
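
To make these three questions concrete, here is a minimal sketch of what a catalog record might hold. It is an illustration only, not something presented at the conference: the names CatalogEntry, AccessEvent, record_access, and touched_by are hypothetical, and a real governance tool would add far more (persistence, policies, schema details).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    """One audit-trail entry (hypothetical structure, for illustration)."""
    user: str
    action: str        # e.g. "read" or "write"
    permitted: bool    # whether the attempt was allowed
    at: datetime


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for a single dataset."""
    name: str                                    # what data do we have?
    source: str                                  # how did we get this data?
    allowed_users: set[str] = field(default_factory=set)       # who has access?
    audit_log: list[AccessEvent] = field(default_factory=list)  # who touched it?

    def record_access(self, user: str, action: str) -> bool:
        """Log every attempt, allowed or not, and return whether it was allowed."""
        permitted = user in self.allowed_users
        self.audit_log.append(
            AccessEvent(user, action, permitted, datetime.now(timezone.utc))
        )
        return permitted

    def touched_by(self) -> set[str]:
        """Users with at least one permitted access in the audit log."""
        return {event.user for event in self.audit_log if event.permitted}


# Example usage with made-up values.
orders = CatalogEntry(name="orders", source="crm_export", allowed_users={"ana"})
orders.record_access("ana", "read")
orders.record_access("bob", "read")  # denied, but still logged for auditing
```

Even a record this small answers all three questions for one dataset; the harder part is keeping such records complete and current across the whole organization.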

It is important to note that not everything can be solved with tools. Having the technology is the first step; however, both people and processes need to be kept in mind for Data Governance to be successful.

What does the future state look like?

In the future, Data Governance will be built into how an organization operates instead of being treated as a separate function. Organizations will need to establish Data Governance stewards to promote and advocate for their tools and policies. Data Governance and its associated tools will also evolve to define what data should and should not be used for, answering the question: "We have this data, but should we use it?"