
5 Tips for Rolling out a Data Catalog

Onboarding

DataHub

Data Catalog

Metadata

Paul Logan

Mar 3, 2023


Now that you have successfully deployed DataHub in your organization, it’s time to make the most of the platform by rolling it out to your stakeholders.

We know this can be a daunting task, so we reached out to members of the DataHub Community to hear how other folks have successfully introduced the tool within their companies.

Photo by SpaceX on Unsplash

Read on to learn 5 concrete steps you can take to launch DataHub within your organization.

#1: Educate (& Deprecate) Around Data Catalogs

If DataHub is your first data catalog, take the time to educate your stakeholders about catalogs and their utility in the data stack. Help people understand the problems DataHub is meant to address and how you envision it fitting into their day-to-day workflows. The DataHub Blog is a great resource to help you get started, particularly:

3 Must-Haves for Metadata Management

If you are replacing a previous data catalog, create a deprecation plan:

Set hard deadlines for the deprecation and communicate them well in advance, with multiple stages where you’ll:

  • Stop adding new users
  • Disable UI
  • Disable API
  • Fully deprecate

Pair the deprecation with other onboarding techniques (champions, email campaigns, persona targeting). Make use of banners in your old tool & Slack announcements to increase awareness around the deprecation.

For example, the Data Discovery Team at one of our partners established both hard and soft deprecation deadlines:

  • Soft: If a user visited the old tool, it redirected them to DataHub; determined users could still bypass the redirect.
  • Hard: Their data team replaced the old catalog’s core backend functionality with DataHub and tore down the previous infrastructure, disabling the old tool from that date on.
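A soft redirect can be as simple as a check on each request to the old tool. The sketch below is illustrative only; the DataHub URL and the `stay` bypass parameter are hypothetical names, not part of any real deployment:

```python
# Minimal sketch of a "soft deprecation" redirect for an old catalog.
# DATAHUB_URL and the ?stay=1 bypass parameter are made-up names.
from typing import Optional
from urllib.parse import urlparse, parse_qs

DATAHUB_URL = "https://datahub.internal.example.com"

def redirect_target(request_path: str) -> Optional[str]:
    """Return the DataHub URL to redirect to, or None if the user
    explicitly opted to stay on the old tool (?stay=1)."""
    parsed = urlparse(request_path)
    params = parse_qs(parsed.query)
    if params.get("stay") == ["1"]:
        return None  # determined users can bypass the redirect
    # Preserve the old path so deep links land somewhere sensible.
    return DATAHUB_URL + parsed.path
```

At the hard-deprecation stage, you would simply drop the bypass branch and redirect unconditionally.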

#2: Enlist Champions & Shift Left

Early in your rollout, take the time to identify and partner with highly-motivated stakeholders in your organization to serve as champions, and team up with them to address their common pain points via DataHub.

Sample Use-Cases for Different Champion Personas

Not sure which stakeholders to engage? Look for members of your organization who have recurring workflows and/or responsibilities that could be improved by adopting DataHub.

For example, Data Engineers regularly change/update schemas that may have unintended consequences on downstream dependencies. By leveraging DataHub’s Impact Analysis feature, they can start to proactively communicate breaking changes to downstream data consumers.
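The idea behind impact analysis is a downstream traversal of the lineage graph. DataHub computes this for you via its UI and API; the toy sketch below just illustrates the traversal, with a made-up lineage graph:

```python
# Hedged sketch of the idea behind impact analysis: given table-level
# lineage as an upstream -> downstream adjacency map, find every
# downstream dependent of a changed dataset via breadth-first search.
from collections import deque

def downstream_impact(lineage, changed):
    """Return the set of all transitive downstream dependents of `changed`."""
    impacted, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in lineage.get(node, []):
            if dep not in impacted:
                impacted.add(dep)
                queue.append(dep)
    return impacted

# Toy lineage graph (dataset names are illustrative).
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.churn"],
    "marts.revenue": ["dashboard.kpis"],
}
print(sorted(downstream_impact(lineage, "raw.orders")))
# ['dashboard.kpis', 'marts.churn', 'marts.revenue', 'staging.orders']
```

A schema change to `raw.orders` surfaces every dashboard and mart it could break, which is exactly the list of consumers a Data Engineer should notify.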

Here’s a breakdown of common use cases and personas to consider when you’re looking for DataHub champions:

Common use cases to target when searching for DataHub champions

Once you have identified your DataHub Champions and the use cases you’ll target, schedule one-on-one time for tests and progress checks to ensure they are empowered to get the most from the tool. Draw from their journey for examples, “aha! moments”, and key learnings you can broadcast to your wider audience, and partner with them to onboard their immediate teammates and teams to replicate the workflows.

For example, here’s how Tim Bossenmaier, Data Engineer at inovex, described the two key personas they targeted with their rollout:

Data Stakeholders
Whether they are business or data analysts or data scientists, we want to provide everyone who works with data with all the information they need in DataHub. For this reason, we pay special attention to the correctness of dataset schemas, the provision of schema descriptions, and correct lineage.

Analysts
Anyone who consumes data in a downstream way, mainly via reports and dashboards. For them, it is very important that all KPIs are clearly defined in the glossary and linked to the appropriate entities in DataHub.

Tim’s team shared learnings and demonstrated DataHub’s features to a select group of these users via weekly sprint reviews during their rollout.

Shift Left: Capture Metadata at its Source

Meet your DataHub Champions where they are. It’s highly likely that they are already capturing documentation, annotations, quality tests, and more in their existing tools and workflows.

Whenever possible, “Shift Left” by capturing this rich context at its source and sending it to DataHub. This removes points of friction for adoption and empowers your Champions to focus on generating high-quality metadata in the tools and environments they already use. You can learn more about the power of Shift Left here.
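As an illustration of shifting left, the sketch below turns a model definition that already lives in source control (a hypothetical dbt-style dict) into a dataset-properties payload shaped loosely after DataHub’s aspect model. In practice you would emit this with DataHub’s ingestion SDK rather than hand-rolling JSON; the field names here are illustrative:

```python
# Sketch: reuse a description that already lives in source control by
# shaping it into a DataHub-style dataset-properties payload.
# Field names loosely follow DataHub's aspect model; treat as illustrative.
import json

def to_dataset_properties(model):
    """Turn an in-repo model definition into an aspect-style payload."""
    urn = (f"urn:li:dataset:(urn:li:dataPlatform:{model['platform']},"
           f"{model['name']},PROD)")
    return {
        "entityUrn": urn,
        "aspectName": "datasetProperties",
        "aspect": {
            "description": model.get("description", ""),
            "customProperties": {"owner_team": model.get("team", "unknown")},
        },
    }

# Hypothetical model definition, as it might appear in a repo's YAML.
model = {"platform": "snowflake", "name": "marts.revenue",
         "description": "Daily revenue rollup", "team": "finance"}
payload = to_dataset_properties(model)
print(json.dumps(payload, indent=2))
```

The point is that nobody had to retype the description in a catalog UI; it flows from where the Champion already maintains it.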

#3: Email Campaigns, Slack Channels, & Broadcasting Links

Establish a designated Slack/Teams channel for the rollout where you can post announcements and troubleshoot issues. Announce and link to the channel in the relevant company and team-wide channels.

Create a regular email campaign where you inform users of the state of the rollout and drive adoption with hooks that draw people in:

  • Link out to interesting & relevant discoveries in DataHub
  • Communicate timelines for the rollout & deprecations
  • Include materials from our blog and YouTube channel, or make your own to help users understand DataHub’s usefulness in the specific context of your org.
  • Speak to personas & value adds with featured quotes from your champions.

Be sure to link out to DataHub at every opportunity, on every surface you can find:

  • GitHub READMEs/PRs,
  • Slack/Teams,
  • emails,
  • app banners,
  • PagerDuty notifications,
  • birthday cards,
  • memes.

OK, so maybe not memes. But everything else!

DataHub’s job in your organization is to provide helpful context and visibility; the wider you broadcast links, the better a job it will do, and the more people will understand what it’s for and what they can do with it.

#4: Regular Onboarding Workshops & Office Hours

Schedule regular onboarding workshops and announce them in email updates and common channels. Prioritize making users’ lives easier, and ensure they walk away with net new knowledge. You can lean on your champions to find a story that will stick with users.

One of our partners found that the data governance conversation was especially compelling, and that DataHub was a natural answer to this common problem.

Schedule regular office hours and broadcast that you are available for troubleshooting. Target presentations and demos at engineering/learning weeks and internal meetups to increase awareness of the rollout.

A sample week in our proposed onboarding program.

#5: Defining Success: Establish Goalposts, Owners, and KPIs

Before you start onboarding users, agree on what success looks like for your rollout by creating goalposts:

  • Establish success metrics and a timeline for the values you expect them to reach.
  • Assign owners that are accountable for these metrics as KPIs.
  • Create goalposts both for your stakeholders and for the team responsible for onboarding.
  • Always keep in mind: what is the key goal DataHub will help you accomplish?

An example of onboarding team goalposts:

  • Champions identified & onboarded for each team by X date.
  • 20 weekly active users (WAUs) by X date, 40 by Y date, and 100 by Z date.
  • 5 onboarding workshops held by X date, with 100 total attendees.
  • 90% of old catalog’s traffic moved to DataHub by 2 weeks before the deprecation.

An example of stakeholder/end-user goalposts to set and encourage governance expectations:

  • 60% of assets have ownership by X date.
  • Glossary terms added to all domains by X date.
  • All lineage populated for the Data Platform Team by end of the quarter.
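Goalposts like these are easiest to hold teams to when they’re computed automatically rather than eyeballed. A minimal sketch of two of the metrics above, assuming you can export usage events and asset ownership from your catalog (the data shapes here are hypothetical, not a DataHub API):

```python
# Sketch of automating two example goalposts: weekly active users from a
# usage-event export, and ownership coverage across assets.
# The event/asset shapes are hypothetical exports, not a DataHub API.
from datetime import date, timedelta

def weekly_active_users(events, week_start):
    """Count distinct users with >= 1 event in [week_start, week_start + 7d)."""
    week_end = week_start + timedelta(days=7)
    return len({user for user, day in events if week_start <= day < week_end})

def ownership_coverage(assets):
    """Fraction of assets that have at least one owner assigned."""
    owned = sum(1 for a in assets if a.get("owners"))
    return owned / len(assets) if assets else 0.0

events = [("ana", date(2023, 3, 6)), ("bo", date(2023, 3, 7)),
          ("ana", date(2023, 3, 8)), ("cy", date(2023, 2, 1))]
assets = [{"name": "marts.revenue", "owners": ["finance"]},
          {"name": "raw.orders", "owners": []}]
print(weekly_active_users(events, date(2023, 3, 6)))  # 2
print(ownership_coverage(assets))                     # 0.5
```

Publishing numbers like these in your rollout channel each week keeps the goalposts visible to both owners and stakeholders.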

In order for people to feel ownership of entities in DataHub, the introduction of dedicated roles can be helpful. Tim shared his expectations of a “data steward” in DataHub as an example:

“Since we don't want to manage all this data centrally, we have introduced the role of data stewards. We plan to have one data steward per area, who will then be responsible for keeping the KPI definitions in the glossary up to date and contacting teams when data appears to be corrupted or a KPI appears to be miscalculated.

Data stewards have special permissions that distinguish them from regular users. This is also the user group we are currently focusing on the most and for whom we are offering the introductory workshop sessions. We hope they will help us spread and establish DataHub throughout the organization.”

Have one team take point

A key point of success for one of our partners was its Data Discovery Team, the sole team responsible for discovering and ingesting the company’s data stack into DataHub. The team worked with technical stakeholders to identify new ingestion sources to bring into DataHub and owned the development of any custom functionality required to onboard a new team.

Their Data Discovery Team also started using Great Expectations beyond DataHub’s out-of-the-box integration. They leverage the built-in Great Expectations features in DataHub to profile the data, and they have additionally rolled out Great Expectations as a standalone tool to drive data quality across the organization. Having both of these efforts under the same team will make it easier to converge them in the future.

With one team running point, it’s easy to create KPIs around what portion of your org’s data stack is represented in your catalog. This creates incentives for quick integration that don’t exist when responsibility for ingestion falls to owners who don’t yet see the catalog as part of their daily workflow.

Bonus: Common Pitfalls

  • Don’t focus on too small a use-case: it’s a data party, and everyone’s invited.
  • Don’t start with too little data: ingest everything you can find that adds value.
  • Don’t just grab anything: review the sources you’re ingesting to prevent friction from poorly curated or blank datasets.
  • Don’t wait until the metadata is perfect: encourage users to fill in gaps and take ownership.

A big thank you to Tim Bossenmaier (inovex) and Juan Garcia Bazan for sharing their lessons learned!

