Dependency Impact Analysis, Data Validation Outcomes, and MORE!

Project Updates

Open Source

Metadata

Data Engineering

Community

Maggie Hays

Mar 8, 2022

👋 Hello, DataHub Enthusiasts!

I hope this finds you safe & healthy ❤️

February was a very busy month for the DataHub Community with stellar progress against our Q1 Roadmap. Let’s dig in!

NEW! Dependency Impact Analysis

Raise your hand if you’ve ever deployed a schema migration or breaking change, only to find out it broke downstream pipelines, reports, ML models, etc., that you had no idea even existed.

🙋‍♀️ ️I’ve done this more times than I care to admit 😅😅

Gone are the days of pushing a breaking change and hoping for the best! Beginning with v0.8.28, DataHub users can quickly view a given entity’s complete set of downstream dependencies, making it easier than ever to proactively identify the impact of schema migrations, data deprecation, and more.

Demo pipeline

The Impact Analysis Lineage view shows the complete set of downstream entities that a change to a given entity may impact. You can also search, filter, and export the list of entities to CSV to slice & dice to your heart’s content.
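To make the idea concrete, here is a minimal sketch of what "the complete set of downstream entities" means: a breadth-first walk over lineage edges, followed by a CSV export. This is not DataHub's implementation or API; the `DOWNSTREAMS` graph, the entity names, and both helper functions are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import deque

# Hypothetical lineage edges: each key feeds the entities in its list.
DOWNSTREAMS = {
    "dataset:orders": ["dataset:orders_clean", "dashboard:sales"],
    "dataset:orders_clean": ["mlmodel:churn", "dashboard:sales"],
    "mlmodel:churn": [],
    "dashboard:sales": [],
}


def downstream_impact(start, edges):
    """Return (entity, hops) for every entity reachable downstream of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    impacted = []
    while queue:
        node, depth = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # visit each entity once, even with many paths
                seen.add(child)
                impacted.append((child, depth + 1))
                queue.append((child, depth + 1))
    return impacted


def to_csv(rows):
    """Serialize (entity, hops) rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["entity", "hops"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(downstream_impact("dataset:orders", DOWNSTREAMS)))
```

The hop count is a rough stand-in for how far a breaking change travels; an entity reachable by several paths is reported once, at its shortest distance.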

Test out Lineage Impact Analysis here!

NEW! Display Data Validation Outcomes in the UI

Starting with v0.8.28, DataHub supports surfacing outcomes from Great Expectations validations in Dataset Entities! End-users can quickly view the complete history of validation outcomes to understand the trustworthiness of their data. Stay tuned for future work to provide native support for other data testing suites; in the meantime, watch John Joyce’s demo below!

Ongoing Improvements to the DataHub UI

UI Refresh of Users, Groups, Policies, and Tags

The User Detail Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see which Groups the User belongs to.

DataHub Dog

See it in action here.

We also overhauled the User Group Detail Page, allowing you to assign an email address, Slack Channel, and Group Owner, and to manage group members via the UI. You can also view, filter, and search across all data assets owned by the User Group.

DataHub Core

Test it out here.

The Tag Details Page also has a new look! You can now edit the definition, assigned owners, and tag color via the UI.

Tag Details Page

Try it here.

We refreshed the Policies Page, allowing you to see DataHub policy details, associated DataHub Users/Groups, and policy status at a glance.

Policies Page

Test it out here.

Notable Metadata Model & Ingestion-Based Features

Track changes to entities using the Timeline API

We’re excited to roll out the new Timeline API, providing a unified timeline of changes to entities in the metadata graph to give a complete picture of how your metadata has evolved. We currently track changes to:

  • Technical Schema (i.e., new/removed fields)
  • Ownership
  • Documentation
  • Tags
  • Glossary Terms

Coming SOON! We will be building out UI support to visualize the full timeline of changes.

Read the docs here.
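As a rough illustration of the kind of change a timeline records for Technical Schema, the sketch below diffs two versions of a field list and emits one event per added or removed field. It does not call the Timeline API; the `ChangeEvent` class, its attribute names, and the category/operation labels are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    category: str   # hypothetical label, e.g. "TECHNICAL_SCHEMA"
    operation: str  # "ADD" or "REMOVE"
    target: str     # name of the field that changed


def schema_changes(old_fields, new_fields):
    """Compare two field lists and return ADD events followed by REMOVE events."""
    old, new = set(old_fields), set(new_fields)
    added = [ChangeEvent("TECHNICAL_SCHEMA", "ADD", f) for f in sorted(new - old)]
    removed = [ChangeEvent("TECHNICAL_SCHEMA", "REMOVE", f) for f in sorted(old - new)]
    return added + removed


if __name__ == "__main__":
    v1 = ["id", "email", "created_at"]
    v2 = ["id", "email_hash", "created_at", "country"]
    for event in schema_changes(v1, v2):
        print(event)
```

Applying the same diff to each consecutive pair of versions gives an ordered history of how a schema evolved, which is the picture a unified timeline aims to provide.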

First Milestone: Fine-Grained Lineage available in the Metadata Model

The Metadata Model now supports Fine-Grained lineage (aka Column-Level lineage) for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a data job.

Define Dataset-to-Dataset lineage via YAML

As demonstrated in the February 2022 Town Hall, you can set Dataset-level lineage via YAML. This is great for teams with more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources. Massive shoutout to Community Member Edward Vaisman for contributing this back to the project! You can watch his demo here:

Miscellaneous Metadata Ingestion Updates:

  • Incubating: PowerBI Ingestion source, ClickHouse Ingestion source
  • BigQuery Profiling: new configurations to profile only the latest partition/shard, or to disable partition-level profiling altogether
  • Tableau improvements: Workbooks are now modeled as “Containers”
  • Kafka Stateful Ingestion — shoutout to @claudio-benfatto for building this out!

Notable Docs Updates

NEW! Tips for Searching within DataHub

Have you ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl.

Improvements to Metadata Model Docs

This is a huge win for the Community — we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model — take a look here.

Community Contributions

We had 47 people contribute to the DataHub Project across v0.8.27 & v0.8.28!

Congrats to our first-time contributors!

@Ankit-Keshari-Vituity @bskim45 @buggythepirate @cuong-pham @daha @eddyv @gmcoringa @guidoturtu @Huyueeer @jieqiu0630 @mmmeeedddsss @mohdsiddique @ne1r0n @ngamanda @pppsunil @satyamkrishna @stephenp-gr @tc350981 @vcs9 @wangqinghuan @zhaofengnian18

Shoutout to our repeat contributors!

@abiwill @aditya-radhakrishnan @anshbansal @arunvasudevan @claudio-benfatto @dexter-mh-lee @eburairu @gabe-lyons @hsheth2 @jeffmerrick @jjoyce0510 @kevinhu @ksrinath @maaaikoool @maggiehays @mayurinehate @MugdhaHardikar-GSLab @rslanka @RyanHolstien @sgomezvillamor @ShubhamThakre @swaroopjagadish @treff7es @vlavorini @xiphl @zhoxie-cisco

One Last Thing —

I caught up with DataHub Community Member, John Joyce:

Maggie: Feb was a super busy month for DataHub — what are you most excited for the Community to start using?
John: I am PUMPED about DataHub’s expansion into Data Quality with the Great Expectations integration. I think automated, recurring quality checks will become an incredibly useful trust signal for both consumers & producers of data.

M: SAME. This is going to be huge for the DataHub Community. Unrelated — is there a song you’ve been playing on repeat recently?
J: Mercy Falls — CharlestheFirst

That’s it for this round; see ya on the Internet :)

Connect with DataHub

Join us on Slack | Sign up for our Newsletter | Follow us on Twitter
