
Just Shipped: UI-Based Ingestion, Data Domains & Containers, Tableau support, and MORE!

Release Notes

Metadata

Metadata Management

Open Source

Data Engineering

Maggie Hays

Feb 8, 2022


👋 Hello, DataHub Enthusiasts!

I hope this finds you happy & healthy ❤️

I’m so excited to share this update with you. Buckle up: it’s PACKED with new features & improvements for you to explore.

UI-Based Ingestion

If you joined the December Town Hall, you likely remember John Joyce’s awesome demo of UI-Based Ingestion. As of v0.8.25, you can now create, configure, schedule, & execute batch metadata ingestion via the DataHub user interface. UI-Based Ingestion makes getting metadata into DataHub easier than ever by minimizing the overhead required to operate custom integration pipelines. Read the guide here.
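For context, a source configured through the UI corresponds to the same recipe format the `datahub` CLI consumes. A minimal sketch for a hypothetical MySQL source is shown below; the host, credentials, and sink address are all placeholder values:

```yaml
# Hedged sketch: roughly what the UI's ingestion builder produces.
# All connection values below are placeholders.
source:
  type: mysql
  config:
    host_port: localhost:3306
    username: datahub_reader
    password: <password>
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```

The scheduling step in the UI plays the same role as running a recipe like this on a recurring cron cadence.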

Choose an Ingestion Type and Configure connection details

Easily set Ingestion Schedule and give it a name — you’re all set!

Data Domains — Fueling Data Mesh in DataHub

DataHub now supports grouping data entities into logical collections called Domains.

Domains are curated, top-level folders or categories that let you explicitly group related entities. This is useful for organizations that want to organize entities by department (e.g., Finance, Marketing), by Data Product, or by other logical groupings common in Data Mesh adoption. Read the guide here.
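If you prefer to assign Domains at ingestion time rather than through the UI, one option is an ingestion transformer. A sketch using the `simple_add_dataset_domain` transformer follows; the Domain URN is a placeholder:

```yaml
# Hedged sketch: appended to an ingestion recipe, this assigns every
# ingested dataset to one Domain. The URN below is a placeholder.
transformers:
  - type: simple_add_dataset_domain
    config:
      domains:
        - urn:li:domain:marketing
```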

John Joyce gives an overview of Data Domains in DataHub

Take DataHub Domains for a spin!

Data Containers are LIVE!

Data Containers represent the physical grouping of entities — for example, a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.

You can associate Ownership, Glossary Terms, Tags, & Documentation at the Container-level to enrich others’ understanding of these resources.

Watch John’s walkthrough below!

John Joyce gives an overview of Data Containers in DataHub

Q1 Roadmap Progress

Data Quality — Metadata Model Support

Data Quality test results are now supported in the DataHub backend metadata model!

This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here).

Future releases will include a Great Expectations integration & UI support — we’re on track to complete this in Q1 as planned.

Want to learn more about what DataHub is working on in Q1? View the complete roadmap here!

Incubating Metadata Sources
Tableau — BETA

We are SO EXCITED to roll out the Beta release of our Tableau ingestion source as of v0.8.26. We are eager for Community Members to test out this integration & to provide feedback — join the conversation in Slack!
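A starting-point recipe for the Beta Tableau source might look like the sketch below. The server URL, site, credentials, and project list are placeholders, and field names may shift while the source is in Beta:

```yaml
# Hedged sketch of a Tableau ingestion recipe (Beta source).
# All connection values are placeholders.
source:
  type: tableau
  config:
    connect_uri: https://tableau.company.com
    site: acryl
    username: tableau_user
    password: <password>
    projects: ["default"]
```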

Elasticsearch

We added Elasticsearch as a supported metadata ingestion source in v0.8.23. It currently extracts metadata for indexes and column types associated with each index field.
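A minimal Elasticsearch recipe might look like the sketch below. The host and credentials are placeholders, and the `index_pattern` filter is an assumption based on the allow/deny pattern used across other sources:

```yaml
# Hedged sketch of an Elasticsearch ingestion recipe.
# Host, credentials, and the index filter are placeholders/assumptions.
source:
  type: elasticsearch
  config:
    host: http://localhost:9200
    username: elastic
    password: <password>
    index_pattern:
      allow: ["orders-*"]
```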

Data Lake Files — BETA

We added a new Data Lake Files ingestion source in v0.8.24 to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON. Avro files are supported as of v0.8.25.

This is useful for organizations that wish to catalog files within AWS S3 without requiring Hive and/or Glue as data catalogs.
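As a hedged sketch only (the source type name and config fields below are assumptions and may differ by version), profiling files in an S3 bucket could look like:

```yaml
# Hedged sketch of a Data Lake Files ingestion recipe (Beta source).
# Source type, platform value, and path are assumptions/placeholders.
source:
  type: data-lake
  config:
    platform: s3          # assumption; a local-file platform may also exist
    base_path: s3://my-bucket/data/
    profiling:
      enabled: true
```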

⚠️ We are aware of some performance issues with Data Profiling on this source and are working on improvements!

Have feedback to share about these new sources? Tell us all about it in our #ingestion Slack channel!

Ongoing Improvements as of v0.8.25

Support Multiple Instances of the Same Platform Type

This has been a widespread use case within the Community — you can now differentiate multiple instances of the same platform type!

If you have pre-existing entities, use the `datahub migrate` command to move them onto platform instances; see the migration script here.
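The migration can be sketched as a single CLI invocation; the platform and instance names below are placeholders:

```shell
# Hedged sketch: re-keys existing URNs for one platform onto a named
# platform instance. Platform and instance values are placeholders.
datahub migrate dataplatform2instance --platform hive --instance prod_warehouse
```

Going forward, new ingestion runs can set a `platform_instance` value in the source config so entities land on the right instance from the start.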

Ignore Specific Users in Data Profiling

We now support the ability to ignore specific users when calculating Top Users of a Dataset/Column — this is useful when you want to exclude users designated for maintenance/automated execution.
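For a usage-based source, excluding service accounts from Top Users might look like the sketch below; the `user_email_pattern` filter and the deny expression are assumptions for illustration:

```yaml
# Hedged sketch: exclude automation accounts from Top Users stats.
# Project name and the deny regex are placeholders/assumptions.
source:
  type: bigquery-usage
  config:
    projects: ["my-project"]
    user_email_pattern:
      deny:
        - "airflow-service-account@.*"
```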

BigQuery — Data Profiling on Only the Latest Partition/Shard

Profile only the latest partition or shard in BigQuery to reduce processing time.
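In recipe form this might look like the sketch below; the `partition_profiling_enabled` flag name is an assumption for illustration:

```yaml
# Hedged sketch: restrict BigQuery profiling to the most recent
# partition/shard. Project ID and the flag name are assumptions.
source:
  type: bigquery
  config:
    project_id: my-project
    profiling:
      enabled: true
      partition_profiling_enabled: true
```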

Notable Metadata Model & Ingestion-Based Features
  • Avro files are now supported in the Data Lake File ingestion source
  • Support for nested Glue Schemas (as of v0.8.24)
  • Fix to optionally add an external Looker URL so that “View in Looker” correctly routes to the Looker Chart or Dashboard.
  • Fix to surface data profiling results for low-cardinality numeric columns. This resolves the reported issue where Min/Mean/Max/etc. values were not displayed in the UI after a successful Data Profiling run.
  • Community shoutout! @iasoon pushed a fix to correctly match the default username for the Azure OIDC & Azure ingestion sources
  • Community shoutout! @thomasplarsson added a fix to support displaying Group Name as an Entity Owner

Community Contributions

We had 33 people contribute to the DataHub Project across v0.8.23, v0.8.24, v0.8.25, and v0.8.26!

Congrats to our first-time contributors!

@danilopeixoto, @dipeshmaurya, @eburairu, @icy, @Jiwei0, @ksrinath, @lhvubtqn, @maaaikoool, @senni0418, @ShubhamThakre

Shoutout to our repeat contributors!

@aditya-radhakrishnan, @anshbansal, @arunvasudevan, @dexter-mh-lee, @gabe-lyons, @hsheth2, @iasoon, @jeffmerrick, @jjoyce0510, @kevinhu, @maggiehays, @mayurinehate, @MikeSchlosser16, @MugdhaHardikar-GSLab, @nickwu241, @pedro93, @RickardCardell, @rslanka, @RyanHolstien, @swaroopjagadish, @thomasplarsson, @treff7es, @varunbharill

One Last Thing —

I caught up with DataHub Community Member, Ryan Holstien:

Maggie: What are you most excited about from v0.8.25?

Ryan: Containers and Domains! I’m a big fan of Domain-Driven Design (DDD) and this is a big step towards us being able to manage metadata for full Data Meshes.

M: Nice! Ok, not DataHub-related — is there a song you’ve been playing on repeat recently?

R: Don’t Call My Name — Skinshape

That’s it for this round; see ya on the Internet :)

Connect with DataHub

Join us on Slack | Sign up for our Newsletter | Follow us on Twitter

