Published on July 10, 2020

Data Governance, Artificial Intelligence & Sustainability - by Janete Ribeiro

The practice of good data governance goes far beyond compliance with legal rules

We hear all the time that "data" has become the new "oil", the most valuable asset driving the 21st-century economy. What is rarely mentioned is that the new “oil”, just like the original, generates CO₂ and contributes to the same climate change.

You may be wondering how computers can generate CO₂. A study published in May 2020 by Veritas, a company specializing in data protection, found the following:

“... about 6.4 million tons of CO₂ will be released into the atmosphere, just in 2020, globally. This is the result of the energy spent on storing what is known as dark data - data that has not yet been treated or that simply has no value - and is polluting not only data centers, but, literally, nature.”

When we talk about "dark data" or "raw data", we mean data that comes from IoT (Internet of Things) devices, social networks, e-mails and similar sources. Many companies collect data from several sources without a defined purpose, that is, without “governance”.

If, for artificial intelligence, more data means better machine learning results, then the quality of that data must obviously be taken into account. There is no point in holding zettabytes of “raw” data that artificial intelligence algorithms cannot consume.

When we talk about data governance, many executives see it as just a bureaucratic activity to comply with legal rules such as data protection laws (LGPD and GDPR). However, the practice of good data governance goes far beyond compliance with legal rules. It minimizes financial, operational and, above all, reputational risks. The last of these is hard to quantify financially, yet it affects the company's entire strategy. In the health sector, for example, we are experiencing firsthand during the current pandemic how unreliable and inconsistent data sources can lead to strategic errors and the death of thousands of people around the world.

Data governance is composed of a living cycle of Planning, People, Processes and Technology. We always need to plan, engage the people involved, reassess the processes and only then choose the right technology.

Implementing effective data governance is neither complex nor expensive, regardless of the size of the company. Here are some simple measures that reduce costs and risks while increasing the quality and strategic effectiveness of data-oriented companies:

1. Plan and define the purpose of the company's data governance program, involving all areas of the company, and create a committee to oversee it;

2. Establish processes for treating and qualifying data. Once you know the purpose of your data collection, design it to collect only what you actually need;

3. Centralize data together with its metadata in a single repository, such as a “data lake”;

4. Regularly evaluate new data manipulation tools that make it easier for end users to consume data;

5. Define access criteria and information security controls;

6. Regularly evaluate the possibility of adding new sources through APIs;

7. Create mechanisms to monitor whether your rules comply with new laws;

8. Train everyone in the company and keep them informed about innovations.
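
As a rough illustration of item 2, collecting only what has a defined purpose can be as simple as filtering incoming records against a registry of approved fields. This is a minimal sketch, not a prescribed implementation: the `APPROVED_FIELDS` registry, the field names and the `collect` function are all hypothetical.

```python
# Hypothetical registry: every field the company may collect,
# mapped to the purpose the governance committee approved for it.
APPROVED_FIELDS = {
    "customer_id": "billing",
    "purchase_total": "billing",
    "country": "tax reporting",
}


def collect(raw_record: dict) -> dict:
    """Keep only fields with a declared purpose; drop everything else."""
    return {k: v for k, v in raw_record.items() if k in APPROVED_FIELDS}


# Example input containing two fields nobody has a declared use for.
raw = {
    "customer_id": 17,
    "purchase_total": 99.5,
    "country": "BR",
    "browser_fingerprint": "ab12",
    "gps_trace": [1.0, 2.0],
}
kept = collect(raw)
print(kept)
```

Anything without an approved purpose never reaches storage, so it cannot become dark data later.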


A good example of applying technology in this area is using artificial intelligence to manage all data loads and validate the content being updated in the dataframe before it is stored and made available for use. Automating just this stage of the data lifecycle already saves cloud processing time and storage, as well as the hours that expensive data engineering professionals spend cleaning and handling data that was improperly collected.
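
The idea of validating a load before it is stored can be sketched with a simple gate. Note that this sketch uses fixed rules (required fields, types, duplicates) as a stand-in for an AI model, and that the schema in `REQUIRED`, the `LoadReport` class and the sample batch are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical schema: required field names and their expected types.
REQUIRED = {"sensor_id": str, "reading": float}


@dataclass
class LoadReport:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def validate(record: dict):
    """Return a rejection reason, or None if the record is valid."""
    for name, expected_type in REQUIRED.items():
        if record.get(name) is None:
            return f"missing {name}"
        if not isinstance(record[name], expected_type):
            return f"{name} is not {expected_type.__name__}"
    return None


def gate(batch: list) -> LoadReport:
    """Split a batch into records to store and records to reject."""
    report = LoadReport()
    seen = set()
    for record in batch:
        reason = validate(record)
        key = None
        if reason is None:
            key = (record["sensor_id"], record["reading"])
            if key in seen:
                reason = "duplicate"
        if reason is None:
            seen.add(key)
            report.accepted.append(record)
        else:
            report.rejected.append((record, reason))
    return report


batch = [
    {"sensor_id": "a1", "reading": 21.5},
    {"sensor_id": "a1", "reading": 21.5},   # exact duplicate
    {"sensor_id": "b2", "reading": None},   # missing value
    {"sensor_id": "c3", "reading": "hot"},  # wrong type
]
report = gate(batch)
print(len(report.accepted), "accepted;", [r for _, r in report.rejected])
```

Only the accepted records would go on to storage; the rejected ones, with their reasons, can be reported back to whoever owns the source.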

There are several tools on the market that allow you to optimize data governance processes at different stages of the data life cycle.

When we talk about optimizing the data collection process, we are talking about tools that manage information and data artifacts, such as data elements, models and glossaries, so that we can achieve data consistency. These tools can usually:

· Classify data based on usage or relevance;

· Manage relationships between data elements through hierarchies or taxonomies;

· Version data, since version-specific information is essential for effective monitoring and for restoring previous versions when necessary;

· Generate historical reports that show every change and the source of that change, including metadata;

· Roll back changes: if a running process fails, the tool must be able to return the data to its most recent consistent state by issuing a ROLLBACK on the transaction.
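
The rollback behavior in the last item can be illustrated with Python's standard `sqlite3` module. This is a minimal sketch under assumed names: the `glossary` table, its contents and the `apply_update` helper are hypothetical, and commercial governance tools will expose this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glossary (term TEXT PRIMARY KEY, definition TEXT NOT NULL)"
)
conn.execute("INSERT INTO glossary VALUES ('dark data', 'collected but unused data')")
conn.commit()


def apply_update(rows) -> bool:
    """Apply all rows or none of them."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls the transaction back if it raises.
        with conn:
            conn.executemany("INSERT INTO glossary VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


# The second row reuses an existing primary key, so the whole update fails.
ok = apply_update([
    ("raw data", "unprocessed data"),
    ("dark data", "conflicting definition"),
])
terms = [row[0] for row in conn.execute("SELECT term FROM glossary ORDER BY term")]
print(ok, terms)
```

After the failed update the table holds only the original term: the first row of the batch was inserted and then undone by the rollback, leaving the data in its last consistent state.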

But before choosing a tool, the governance committee must map its entire process of collecting, analyzing, publishing, storing, recovering and destroying data. Automating the wrong processes with AI will only amplify the company's existing data management problems.

Once an organization has control over the data it generates and collects, and a clear purpose for it, it can extract the real value of that data without causing damage to nature.

For complete training on data economics and on structuring your business strategy, I2AI offers the Analytics Foundation Certificate.


The Hack website - https://thehack.com.br/dark-data-sera-responsavel-por-emitir-6-4-milhoes-de-toneladas-de-co2-na-atmosfera-em-2020/

Strategic Finance website - https://sfmagazine.com/post-entry/september-2019-ai-in-data-governance/

BIG-Data International Campus - https://www.campusbigdata.com/big-data-blog/item/129-la-parte-oscura-del-big-data-dark-data

About the author

Janete Ribeiro

CEO of Analytics Data

Janete Ribeiro is CEO of Analytics Data, a consultancy specializing in data science and artificial intelligence, and holds a data governance certification from MIT. She is an ambassador for WiDS (Women In Data Science), affiliated with Stanford University, a postgraduate professor of Big Data & Analytics at SENAC, and the author of books on Marketing Research and Market Intelligence.
