Metadata Day Round-Up: Insights and Ideas for Implementing Data Governance

Data Governance

Data Engineering

Metadata

Open Source

Shirshanka Das

Sep 28, 2022

A good plan implemented today is better than a perfect one implemented tomorrow.

If there’s one line I’d pick to summarize the Metadata Day Expert Panel’s thoughts on data governance in action, it would be this one.

But of course, there’s more — and that’s what we’re talking about in this post of our Metadata Round-Up series.

(Missed the last post? Check it out here: 4 Principles That Are Driving Modern Data Governance)

Data Governance in Action: Top 3 Takeaways from the Metadata Day 2022 Expert Panel

1. Think holistically, but incrementally

The challenge: Data governance is a broad, end-to-end problem: it spans data warehouses, service APIs, third-party data and integrations, and much more.

I experienced this firsthand while tackling the data inventory problem as part of LinkedIn’s GDPR efforts. As the tech lead of the data platform team and somewhat of a data veteran at LinkedIn, I thought I knew where all the data lived and how it moved across the stack. But when I started interviewing stakeholders, I found multiple data repositories and data movement systems, critical to their teams’ daily workflows, that I had no idea about.

To tackle the complexity of data governance, a horizontal approach, one that sets up a common metadata layer for everything else to build on, is an elegant and comprehensive solution. But there are practical issues to consider: building such a layer is a multi-year effort in itself, and most business and compliance use cases need information and outcomes today, not in the future.

Amol Deshpande, Professor, University of Maryland, recommends a practical and incremental approach that focuses on solving the immediate problem and unlocking business value first.

Amol shares why a horizontal approach to data governance is not always a practical one

The right approach, according to Amol, combines big-picture understanding with incremental change: organizations should keep a holistic view of the end state while shipping in small steps, so there is minimal repeated and duplicated work.

Think about it: Data governance requirements have a lot in common with privacy and compliance requirements; it makes little business sense for organizations to do this kind of work twice if they can avoid it.

Thinking back to the large number of previously unmapped data assets and data flows we uncovered during the data inventory exercise at LinkedIn, we came to a similar crossroads and were forced to ask:

Do we inventory these additional assets in a separate spreadsheet and have the relevant teams manage them through a manual audit process (quick to do)?

Or do we put in the effort to integrate them into the central DataHub catalog via automation (with the downside that a complete solution might take a couple of months to build)?

One strategy we used successfully was to invest in automation for the core data flows expected to be around for the long term. DataHub’s push-based ingestion meant that new datasets in these systems showed up in the catalog as soon as they were created or moved, keeping the inventory fresh and up to date.
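As an illustration, here’s a minimal sketch of push-based registration using today’s open-source DataHub Python emitter. The server URL, platform, and dataset names are placeholders, and LinkedIn’s internal mechanism at the time predates this exact API.

```python
# Minimal sketch: push a dataset into DataHub as soon as a pipeline creates it.
# Assumes the acryl-datahub package and a DataHub GMS endpoint (placeholder URL).
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

def register_dataset(platform: str, name: str, description: str) -> None:
    """Call this from the data movement system right after a dataset is created."""
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn(platform=platform, name=name, env="PROD"),
            aspect=DatasetPropertiesClass(description=description),
        )
    )

register_dataset("kafka", "tracking.page_views", "Raw page-view events")
```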

For the remaining “legacy” data assets, we did a one-time export and wrote a script, so that while collection and management stayed manual, we had push-button automation any time we wanted to synchronize with the central catalog.
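That script might have looked something like the sketch below; it assumes the legacy assets were exported to a hypothetical legacy_assets.csv with platform, name, and description columns, and re-emits every row so rerunning it simply upserts the latest state.

```python
# Sketch: push-button sync of a manually maintained CSV export into DataHub.
# The file name and its column layout are hypothetical.
import csv

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

with open("legacy_assets.csv") as f:
    for row in csv.DictReader(f):
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=make_dataset_urn(row["platform"], row["name"], env="PROD"),
                aspect=DatasetPropertiesClass(description=row["description"]),
            )
        )  # emits are upserts, so rerunning the sync is safe
```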

This pragmatic tradeoff allowed us to ship a functional solution in less than a month without compromising on the long-term architecture.

2. Use a solid foundational framework

The challenge: Effective data governance cannot be accomplished using code alone. It needs a well-designed combination of policy, incentives, enforcement, and human engineering.

Joe Hellerstein, Jim Gray Professor at UC Berkeley, recommends the following three-part foundational framework to build a data governance model around:

  1. Start with policy (preferably as specifications)
  2. Build observability into the system (for transparency)
  3. Work on enforcement or incentives (or both) for implementation of the policy

Joe’s 3-part framework for a data governance model

Joe, of course, is really skilled at distilling large amounts of complexity into a pithy three-part framework that actually works.

I used a similar framework to drive the creation of a successful compliance monitoring system (codenamed CMON) during the GDPR implementation at LinkedIn.

More recently, we again applied this framework of “define good”, “monitor adherence to good” and “enforce or reward good behavior” to build the governance tests capability into the Acryl DataHub product.
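In code, the pattern looks roughly like the sketch below. This is not the actual Acryl DataHub governance tests API, just a hedged illustration of “define good, monitor it, act on it” using hypothetical types.

```python
# Sketch of the three-part framework; the Policy schema and checks are hypothetical.
from dataclasses import dataclass

@dataclass
class Policy:                       # 1. Policy as a specification
    name: str
    required_tags: set[str]

@dataclass
class Dataset:
    urn: str
    tags: set[str]

def check(policy: Policy, dataset: Dataset) -> bool:    # 2. Observability
    return policy.required_tags <= dataset.tags

def enforce(policy: Policy, dataset: Dataset) -> None:  # 3. Enforcement / incentives
    if not check(policy, dataset):
        # e.g., page the owner, block a deployment, or flag it in the catalog
        missing = policy.required_tags - dataset.tags
        print(f"{dataset.urn} violates {policy.name}: missing {missing}")

pii_policy = Policy(name="pii-classification", required_tags={"pii-reviewed"})
enforce(pii_policy, Dataset(urn="urn:li:dataset:(hive,db.users,PROD)", tags=set()))
```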

3. Get creative with tools and methods to drive adoption

The challenge: In companies with an engineering-first culture, implementing a top-down approach, or introducing new tools purely for data governance, is a tough task.

This means that organizations need to get creative about using and repurposing existing tools to implement data governance. The key is to creatively combine tooling, process, and governance for an approach that works both economically and programmatically.

Here’s an example that Nishant Bhajaria, Director of Privacy, Engineering, Architecture, and Analytics at Uber, shared from his time at Netflix. Given Netflix’s Freedom and Responsibility culture, which prioritizes staff autonomy, implementing a top-down approach to governing engineers’ data access (say, making engineers go to the central IT team for permissions) is a tough task.

The Privacy team worked around this by taking an existing tool built in-house by the Security team (to prevent third-party access to data) and repurposing it for their use case, accompanied by a simple process change.

Here’s how they did it: engineers continued to be issued keys and encryption access to data. However, the keys were rotated if the data wasn’t accessed at least once every seven days. This meant engineers had to reach out for renewed access, but only if they weren’t accessing the data often enough.

This demonstrated which engineers didn’t, in fact, need data access, without preemptively revoking it from anyone.

Nishant shares an example of a creative repurposing of tooling that combines tooling, process, and governance
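As a rough sketch of that rotation rule (Netflix’s internal tool isn’t public, so every name here is a hypothetical stand-in):

```python
# Sketch: revoke keys whose data hasn't been accessed in the last seven days.
# AccessKey and the revocation side effect are hypothetical stand-ins.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

ROTATION_WINDOW = timedelta(days=7)

@dataclass
class AccessKey:
    owner: str
    last_used: datetime
    revoked: bool = False

def rotate_stale_keys(keys: list[AccessKey], now: datetime | None = None) -> list[AccessKey]:
    """Mark stale keys as revoked; their owners must re-request access."""
    now = now or datetime.now(timezone.utc)
    stale = [k for k in keys if now - k.last_used > ROTATION_WINDOW]
    for key in stale:
        key.revoked = True  # stand-in for re-encrypting the data with a fresh key
    return stale
```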

Juan Sequeda, Principal Scientist at data.world, shared another example: tying employees’ bonuses to data quality. A model like this, with executive-level buy-in and real incentives, is key to making data governance work on the ground.

It is great to hear anecdotes from the field about how people are getting creative with re-purposing tooling for governance or tying in incentive structures from the very top to create a change in culture.

It brings back another memory from my time at LinkedIn. Our engineering team was known for its strong emphasis on craftsmanship. This showed up in the annual performance evaluation of engineers along three dimensions: leadership, craftsmanship, and execution.

I vividly remember the happiness I felt when a senior review committee member asked to see an employee’s DataHub profile page, to assess whether they had built reliable, highly used, and trustworthy data assets.

A living, breathing report card of your work as a data professional, where quality is observable and rewardable, creates a virtuous cycle: data assets stay well maintained, conscientious professionals get rewarded, and the result is a self-sustaining, high-quality data practice.


Watch This Space

I’m sure there are many such stories of innovation, ideas, and insights from the field. We at DataHub would love to hear from you!

There’s a lot more to unpack from Metadata Day 2022. Stay tuned for the rest of the posts in this series.

Next week, we’ll focus on building cohesive collaboration between technical and business users for effective — and efficient — data governance.
