There is no doubt that data governance is an essential practice for 21st-century organizations. However, as companies look to implement new artificial intelligence solutions, including GenAI, putting data governance into practice is becoming increasingly complex and challenging. Find out how good data governance can help enterprise data projects thrive in the context of AI.

Data governance is a set of processes, policies, and standards that help organizations ensure that data is secure, accurate, and usable. It emerged in the early 2000s, but more recently, with the explosion of data in organizations and digital transformation, its adoption has become more necessary.

In fact, it represents a new approach to data management, one aimed at extracting meaningful data that supports strategic decision making. Adopting data governance policies helps turn information into a high-value resource while mitigating the vulnerabilities that arise in handling it.

Some of the key aspects of data governance include:

  1. Defining the owners of data assets.
  2. Developing a policy that specifies who is responsible for various aspects of the data.
  3. Defining processes for storing, archiving, securing, and protecting data.
  4. Developing a set of standards and procedures that define how the data will be used.
  5. Establishing audit controls and procedures to ensure compliance with governance standards.
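As a rough illustration of how these aspects could be recorded and checked, here is a minimal Python sketch. The `DataAsset` fields and the `audit` function are hypothetical names chosen for this example, not part of any standard or product; a real data catalog would track far more.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataAsset:
    """Hypothetical governance record for one data asset."""
    name: str
    owner: Optional[str] = None           # who owns the asset
    steward: Optional[str] = None         # who is responsible for its quality
    retention_days: Optional[int] = None  # storage and archiving rule
    allowed_uses: List[str] = field(default_factory=list)  # usage standard


def audit(assets: List[DataAsset]) -> Dict[str, List[str]]:
    """Return, per asset name, the governance fields that are still undefined."""
    findings: Dict[str, List[str]] = {}
    for asset in assets:
        missing = [
            name
            for name in ("owner", "steward", "retention_days")
            if getattr(asset, name) is None
        ]
        if not asset.allowed_uses:
            missing.append("allowed_uses")
        if missing:
            findings[asset.name] = missing
    return findings
```

Running `audit` over the full list of assets gives a simple compliance report: any asset that appears in the result still lacks an owner, a responsible person, a retention rule, or a usage definition.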

In Argentina, a familiar example of this practice is the National Registry of Persons (RENAPER). Many organizations consult its data, and there are not only very clear policies on the security, reliability, and privacy of that data, but also specific regulations on which data can be consulted. Fields such as name and surname or gender identity, for example, require special permissions to access. Defining this kind of rule is the core contribution of data governance.
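The idea of field-level permissions can be sketched in a few lines of Python. This is only an illustration: the field names, the `RESTRICTED_FIELDS` set, and the `filter_record` function are assumptions made for the example and do not describe how RENAPER actually implements its rules.

```python
# Hypothetical field names; fields listed here require a special permission.
RESTRICTED_FIELDS = {"full_name", "gender_identity"}


def filter_record(record: dict, granted: set) -> dict:
    """Return only the fields the caller may see.

    Unrestricted fields are always returned; restricted fields are
    returned only if they appear in the caller's granted permissions.
    """
    return {
        key: value
        for key, value in record.items()
        if key not in RESTRICTED_FIELDS or key in granted
    }
```

A consumer with no special permissions would receive the record without the restricted fields, while one granted `full_name` would see that field but still not `gender_identity`.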

Changing Scale and Internal Organization

As the scale of data in organizations grows, several key questions arise regarding the internal organization among the areas that manage or influence corporate information and the roles each plays:

  1. Who owns the data?
  2. Who is responsible if data is inaccurate or loaded incorrectly? This is the well-known notion of accountability: the ability to take responsibility for the data, and the reason it matters so much to organizations.
  3. Which areas or profiles of the company are responsible for defining data standards and data protection?
  4. What are the risks of not having a clear structure of policies and responsibilities?
  5. Where and how should sensitive data be stored and handled? The emergence of more modern cloud environments also comes into play.

With the proliferation of systems, data, and integrations, conflicts between departments are increasing. This is the classic dispute among technology, legal, accounting, operations, and the more commercial or human resources areas over who is responsible for the information and who defines it clearly and transparently.

For example, when the company has to prepare its annual financial statements, it is essential that a single clear voice defines what the company’s profit is and how this figure is objectively calculated. This is not just a software issue; it is a management issue.

A role that has emerged in this fragmented landscape is the Chief Data Officer (CDO), a senior executive responsible for managing a company’s data so that decisions based on it help achieve business objectives.

This role was first created in 2002 at the financial services company Capital One. Initially, the role focused on risk management and compliance, but over time it has evolved: effective data management allows companies to adapt to change and anticipate what may come.

One of the central functions of the CDO is to set the company’s data usage strategy and centralize data governance, among other things.

The CDO then sits down with the key data-related areas of the business and provides a clear definition of internal and external information management policies.

People often use software as a service assuming that data is not their problem, that it is just a technology problem. The fundamental change for organizations is to move away from a traditional vision in which certain people held a vertical monopoly on data, or loaded it in ways that were ineffective or unstructured. The goal is a policy that analyzes, classifies, and transforms raw data into statistics and models that can be shared, that support innovation, and that contribute across the board to decisions aligned with the organization’s core business.

When Data Governance Meets Artificial Intelligence

It is a fact that companies increasingly want to use Artificial Intelligence and GenAI solutions in their processes and decision-making with data.

However, on top of the clear need to define who is responsible for the data, there is a growing need for clear policies, organization-wide standards, and the ability to decide which data is ethically appropriate for training a model.

Let’s take the example of the oil and gas industry. Suppose a health and safety analyst reviews safety observations and labels each one as high, medium, or low risk. The analyst signs off and takes responsibility for the label, so it is an important task.

Now suppose the company’s data science department provides new artificial intelligence software to support these classifications. Who is responsible for the classification in this more complex project: the health and safety department or the data science department? Even if the health and safety department says that responsibility has now passed to data science, if management discovers that the data is wrong, the health and safety department would still be held responsible. The analyst will then question the process because, even with AI involved, the fault still falls on them.

Of course, when this happens, the system needs to be enhanced with ways for the user to control this AI. The model has to have new control flows so that the health and safety analyst can say whether the data is correct or not.
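One way to picture such a control flow is a minimal Python sketch in which the model only suggests a label and nothing becomes final until a named analyst signs off. The `Observation` class, the `suggest` and `sign_off` functions, and the `model` callable are all hypothetical names used for illustration, not a description of any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Optional

RISK_LEVELS = ("low", "medium", "high")


@dataclass
class Observation:
    text: str
    suggested_risk: Optional[str] = None  # proposed by the model
    final_risk: Optional[str] = None      # set only when a human signs off
    signed_off_by: Optional[str] = None   # the accountable analyst


def suggest(obs: Observation, model: Callable[[str], str]) -> Observation:
    """Store the model's label as a suggestion; it is never final."""
    obs.suggested_risk = model(obs.text)
    return obs


def sign_off(obs: Observation, analyst: str, risk: Optional[str] = None) -> Observation:
    """The analyst accepts the suggestion (risk=None) or overrides it."""
    label = risk if risk is not None else obs.suggested_risk
    if label not in RISK_LEVELS:
        raise ValueError(f"invalid risk label: {label!r}")
    obs.final_risk = label
    obs.signed_off_by = analyst
    return obs
```

Keeping `suggested_risk` and `final_risk` as separate fields preserves a record of where the analyst disagreed with the model, which is also useful evidence when deciding whether the model needs refinement.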

New questions then arise. For example, if the model needs to be refined, which department supplies the data for fine-tuning? And if the model performs poorly and gives completely wrong answers, who is responsible for these hallucinations?

So there is a dilemma between seeing AI as an assistant to humans or as a replacement for their jobs. From a data governance perspective, whoever governs the data is still a human: the owner of the data. That person cannot delegate the responsibility. Conceptually, this is a strong position: because the ultimate owners are still humans, an AI today can only be seen as an assistant from a data governance perspective.

Another important issue arises with cloud environments and data warehouses: today, data lakes are centralized repositories that store, process, and protect large amounts of data. They are ideal for big data processing, machine learning, and predictive analytics.

However, with the proliferation of cloud service providers, a clear policy is needed not only on who in the organization has access to the data, but also on how to protect sensitive information. In fact, there are policies that allow private and personal data to be anonymized so that commercial companies do not have access to credit card data or payment information, a problem that remained unresolved until recently.

If the company has a health and safety database containing employees’ personal information, however, anonymization may not be possible, or may not be as easy.
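As a hedged illustration of what such a protection policy can look like in practice, the Python sketch below drops payment fields entirely and replaces a direct identifier with a keyed hash before a record leaves the organization. The field names, `DROP_FIELDS`, `PSEUDONYMIZE_FIELDS`, and `protect_record` are assumptions made for this example. Note that this is pseudonymization, which is weaker than full anonymization: whoever holds the key can still link records back to a person.

```python
import hashlib
import hmac

# Hypothetical field names for illustration only.
DROP_FIELDS = {"card_number", "payment_token"}  # never shared
PSEUDONYMIZE_FIELDS = {"employee_id"}           # replaced by a keyed hash


def protect_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record that is safer to share externally."""
    protected = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # remove payment data altogether
        if key in PSEUDONYMIZE_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            protected[key] = digest.hexdigest()[:16]  # stable pseudonym
        else:
            protected[key] = value
    return protected
```

Because the hash is keyed and deterministic, the same employee maps to the same pseudonym across records, so analyses can still group by person without exposing the original identifier.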

On the other hand, GenAI raises the question of how to use corporate data or general information in large language models. Should we seek legal approval before sending corporate information to ChatGPT? This is a discussion in which the CDO has decisive influence, and it is essential to integrate the use of GenAI into data use strategies.

In general, traditional organizations whose digital culture is recent will find it harder to resolve these conflicts. Those that are already moving toward a data-driven culture, and that have brought in a CDO less as a way of resolving urgent conflicts than as a strategist with a long-term vision, will have a better chance of resolving them.

At the same time, by using generative AI, organizations can achieve several benefits, such as improved data quality, improved data security, greater data integration (holistic vision), and empowered data users (more conscious and efficient use of data).

7Puentes: the master key to data governance and GenAI consulting

With more than 15 years of experience and more than 100 Artificial Intelligence, Machine Learning and Data Science projects in leading companies, we have a deep understanding of the new challenges that Data Governance poses for today’s complex organizations.

If your organization has questions or needs regarding the adoption of AI models and solutions that impact your data governance practices, contact our specialists for comprehensive advice.