Metadata Day Round-Up: 4 Principles That Are Driving Modern Data Governance

Metadata

Data Engineering

Data Governance

Open Source

Community

Shirshanka Das

Sep 13, 2022

In May this year, we at Acryl Data teamed up with LinkedIn once again to host the third edition of Metadata Day. A massive shoutout to our participants, expert panelists, and the 4,000+ strong DataHub community for making this happen!

A quick flashback first: when we launched Metadata Day in 2020, we went all in on the Importance of Metadata. 2021 was all about metadata and the data mesh.

In 2022, we zoomed in — and out — to talk about Metadata: Governance as Code, with a distinguished expert panel from the worlds of academia, research, and industry.

[Image: The expert panel at Metadata Day 2022]

I’ll be sharing my key takeaways from this insight- and idea-packed session in a series of posts, starting with this one.

But before we dive in, I want to take a minute to set some context and explore the theme of the event.

Why is Data Governance as Code important?

As the workshop kicked off, we realized that there was perhaps a more fundamental question that needed to be answered first:

Why are we still talking about data governance?

Data Governance as a term has been around for pretty much as long as data itself, but we as an industry still seem to grapple with defining it well, much less solving for it. It’s not for lack of trying; a couple of reasons make it an inherently complicated problem.

  • One, it is broad: it includes data quality, privacy, access, and lifecycle management.
  • Two, it’s complicated: there are multiple personas, processes, and business use cases to cater to.

“The flow of data into, within, and out of today’s organizations is a tsunami breaking through rigid data governance methods. Yet our programs still rely on that command and control approach. Achieving deep insights from data can’t happen without good governance practices. All indicators point to the need to create a resilient and responsive data governance function.”

This snippet from Laura Madsen’s Disrupting Data Governance: A Call to Action perfectly describes why data governance continues to be a talking point within the data community and businesses at large.

Is Data Governance as Code the solution?

The last few years have seen an increased interest in applying code-first principles to data governance and its implementation. While this is exciting, there’s a lot that needs to be done to make this a reality.

Metadata Day presented the perfect opportunity to do just that — by diving deeper into data governance to

  • distill the high-level ideas of governance-as-code into approaches that can work in both small and large companies,
  • discuss the practical implementation of these ideas, and

… most importantly, build and nurture a community of practitioners committed to data governance.

Now that we’re on the same page about the theme of this year’s Metadata Day, it’s time to dive into the key takeaways from the panel session.

While there’s a lot that I took away from my discussion with the expert panel, in this post, I want to focus on four fundamental approaches — in some cases ‘shifts’ — that emerged as the foundation of a robust approach to data governance.

The 4 Foundational Principles of Effective Data Governance

1. Data governance IS (not just “should be”) a business priority

Teresa, who heads the incubation and scaling of breakthrough cloud technologies at Accenture, shared two observations about how critical data is becoming to business:

  • Even companies that aren’t traditionally data-focused are looking at AI and data more than ever.
  • In 50% of Fortune 500 earnings calls, CEOs are reporting on AI and data.

Companies need governance to make decisions on trusted business data, and the most effective approach to governance is one that focuses on business users and their needs.

In my conversations with hundreds of data engineers, data architects, and data leaders in the DataHub community, there’s something I’ve noticed time and again.

The data teams that are highly successful within their companies are those that aren’t just building and adopting technologies to improve their own lives, but are collaborating closely with Sales, Marketing, and other business stakeholders to build solutions that integrate governance into their critical business workflows.

2. A mindset shift: Towards usage and usability, not just protection

For too long, data governance has almost exclusively focused on protecting data. This needs to change. We need to look at data governance as a way to

  • ensure usability, and
  • drive usage

Or, as our panelist Juan, Principal Scientist at data.world, said (quoting Mark Kitson): data governance is like a car brake. Its role is not to slow you down, but to enable you to drive fast, safely.

This discussion reminded me of my time leading GDPR efforts at LinkedIn, and a Strata 2017 talk called “Taming the ever-evolving Compliance Beast” that emerged from it. My colleague Kalinda Raina, then head of Global Privacy at LinkedIn, had this great quote that I used in my talk to illustrate the paradox that exists between data democracy and data privacy.

[Image: The LinkedIn Privacy Paradox]

But as I and many other practitioners have experienced firsthand, you can solve for both if you build upon the right foundation of metadata.

This starts with understanding the needs of all the stakeholders involved: not just the data producers, but also the business users and data consumers.

3. Data governance should make decision-making easier and more scalable

We’ve underscored the importance of enabling usability for a successful data governance practice, but prioritizing usability depends on understanding what usability looks like in action.

Ultimately, good data governance, as Nishant, Director of Privacy Engineering, Architecture, and Analytics at Uber, says, is about building discipline around data (how and what to collect, where to keep it, and whom to use it for) so we can make better decisions at scale. What does this look like?

Bringing in automation wherever appropriate to make way for deterministic decision-making, eliminating the need for engineers to make decisions on a case-by-case basis.

An anecdote that Nishant shared, which paralleled efforts with what we had done at LinkedIn, was setting defaults that removed access to datasets for engineers if they had not accessed them for some time — while retaining access to datasets that they seemed to be frequently using.

We paired this with an easy automated way to re-request access to data for a limited window of time with business justification. This is a simple but very effective way to use operational metadata (audit logs) to trim down the surface area of access automatically — without needing a “stop the world” initiative or long meetings with stakeholders to align on a “data access management” program.

Nishant on the practical way to look at data governance and its objectives [10:30–11:32]
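
To make the pattern concrete, here is a minimal sketch of what such a default could look like. Every name here (Grant, trim_idle_grants, regrant) and both thresholds are hypothetical; a real implementation would derive last-access times from your platform’s audit logs and call its access-control API rather than operating on in-memory objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative thresholds; tune to your organization's risk appetite.
ACCESS_TTL = timedelta(days=90)      # revoke grants idle longer than this
REGRANT_WINDOW = timedelta(days=14)  # length of a re-approved access window


@dataclass
class Grant:
    user: str
    dataset: str
    last_accessed: datetime  # derived from operational metadata (audit logs)


def trim_idle_grants(grants: list[Grant], now: datetime) -> list[Grant]:
    """Return the grants to revoke: those idle past the TTL."""
    return [g for g in grants if now - g.last_accessed > ACCESS_TTL]


def regrant(user: str, dataset: str, justification: str, now: datetime) -> dict:
    """Re-open access for a limited window, recording the business justification."""
    return {
        "user": user,
        "dataset": dataset,
        "justification": justification,
        "expires": now + REGRANT_WINDOW,
    }


if __name__ == "__main__":
    now = datetime(2022, 9, 1)
    grants = [
        Grant("alice", "prod.orders", last_accessed=now - timedelta(days=3)),
        Grant("bob", "prod.orders", last_accessed=now - timedelta(days=200)),
    ]
    for g in trim_idle_grants(grants, now):
        print(f"revoking {g.user} -> {g.dataset}")  # bob loses idle access
    print(regrant("bob", "prod.orders", "Q3 revenue audit", now))
```

The point is that the decision rule is deterministic and fully automated; no engineer has to adjudicate access case by case, and the re-request path keeps the default from getting in anyone’s way.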

This requires baking governance and identity signals (via metadata) into the data itself, giving data the identity it needs so it can be used effectively and correctly, without adding to the risk of the business.
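
As a rough illustration of what “data with identity” might mean in practice, here is a small sketch. The DatasetIdentity shape, the URN string, and the purpose-based check are assumptions made for illustration; they are not DataHub’s (or any product’s) actual metadata model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"
    CONFIDENTIAL = "confidential"
    PII = "pii"


@dataclass
class DatasetIdentity:
    """Governance signals carried with the dataset, not in a side spreadsheet."""
    urn: str
    owner: str
    classification: Classification
    allowed_purposes: set[str] = field(default_factory=set)


def can_use(dataset: DatasetIdentity, purpose: str) -> bool:
    """Deterministic check: usage is allowed only for declared purposes."""
    return purpose in dataset.allowed_purposes


orders = DatasetIdentity(
    urn="urn:li:dataset:prod.orders",  # illustrative identifier
    owner="growth-data-team",
    classification=Classification.PII,
    allowed_purposes={"fraud-detection", "billing"},
)

assert can_use(orders, "billing")
assert not can_use(orders, "ad-targeting")  # blocked until policy changes
```
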
4. A practical shift: Towards federated computational governance

It’s no secret that most companies, in their pursuit of process-light approaches, have so far rewarded distribution and loose coupling.

You can see this in the decentralized software engineering stack: versioned, source-controlled artifacts; APIs and tests that act as contracts between the creators of libraries and services and their consumers; and CI/CD pipelines and operational monitoring that ensure artifacts continue to meet their commitments as they evolve. These are characteristics we take for granted as part of modern engineering culture.

In contrast, data governance has stayed centralized, partly because it is inherently a centralized concern, and partly because of the lack of data tooling to bring modern engineering practices to data governance.

It is clear that data governance approaches need to shift, simply because what has worked so far may not work in the future.

We need to move towards a federated approach to data governance: one that applies software engineering discipline to the effective enforcement of centralized concerns (policies).
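
To ground the idea, here is a toy sketch of what federated computational governance could look like: policies are defined once, centrally, versioned like any other artifact, and enforced locally in each team’s CI pipeline, much like a unit test. The policy shape and the dataset descriptor are illustrative assumptions, not any specific product’s API.

```python
import sys

# Centrally owned, versioned policy definitions (illustrative shape).
CENTRAL_POLICIES = [
    {"name": "pii-needs-owner", "applies_to": "pii", "require": "owner"},
    {"name": "pii-needs-retention", "applies_to": "pii", "require": "retention_days"},
]


def check(dataset: dict) -> list[str]:
    """Return policy violations for one dataset descriptor."""
    violations = []
    for policy in CENTRAL_POLICIES:
        tagged = policy["applies_to"] in dataset.get("tags", [])
        if tagged and not dataset.get(policy["require"]):
            violations.append(f'{dataset["name"]}: violates {policy["name"]}')
    return violations


if __name__ == "__main__":
    # A team's dataset descriptor, checked in its own repo's CI.
    dataset = {"name": "prod.users", "tags": ["pii"], "owner": "identity-team"}
    problems = check(dataset)  # missing retention_days -> one violation
    print("\n".join(problems) or "all policies pass")
    sys.exit(1 if problems else 0)  # fail the build, like a unit test would
```

The central team owns what the policies say; each team owns running the check against its own assets, which is exactly the split between centralized concerns and federated enforcement described above.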

Watch This Space for More

I’ll be sharing more of our learnings and takeaways from my discussion with the expert panel.

Stay tuned for the rest of the posts in this series.

In the meantime, check out the earlier editions of Metadata Day.
