
Governing the Kafka Firehose

Metadata Management

Data Contract

In an oft-memed scene from the cult film UHF, the goofy host of a TV variety show shouts at his guest:

“You get to drink from the firehose!”

When it comes to managing and governing Kafka data, data teams can easily identify with that guest.

It’s as if they’re tasked, hopelessly, with sipping from a firehose of streaming data.

And it isn’t just the sheer volume of event data, nor the incredible velocity at which it gets vectored into Kafka. Nope, the biggest issues stem from frequent and unexpected schema changes.

Kafka’s schema registry and data portal are great, but without a way to actually enforce schema standards across all your upstream apps and services, data breakages are still going to happen. Just as important, without insight into who or what depends on this data, you can’t contain the damage.
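To make that gap concrete, here is a minimal sketch of the kind of pre-deploy compatibility check every producing team would otherwise have to run on its own against a Confluent-style Schema Registry. The registry URL and subject name below are placeholders, and this is an illustration rather than an Acryl feature.

```python
# Sketch: ask a Confluent-style Schema Registry whether a proposed Avro schema
# is compatible with the latest registered version before a producer ships it.
# REGISTRY_URL and SUBJECT are hypothetical placeholders.
import json
import requests

REGISTRY_URL = "http://localhost:8081"   # hypothetical registry endpoint
SUBJECT = "orders-value"                 # hypothetical topic subject

proposed_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        # A new required field with no default: a classic breaking change.
        {"name": "currency", "type": "string"},
    ],
}

resp = requests.post(
    f"{REGISTRY_URL}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(proposed_schema)}),
    timeout=10,
)
resp.raise_for_status()

if not resp.json().get("is_compatible", False):
    raise SystemExit("Proposed schema is not compatible; block the deploy.")
print("Schema change is compatible with the latest registered version.")
```

The catch, of course, is that every upstream team has to remember to run a check like this, every time. That is exactly the enforcement gap the rest of this post is about.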

And, as data teams know, Kafka data breakages almost always cascade far and wide downstream—wrecking not just data pipelines, and not just business-critical products and services, but also any reports, dashboards, or operational analytics that depend on upstream Kafka data.

Breaking Free from Breaking Bad

To tackle this and other challenges, Saxo Bank, fintech innovator Chime, and dozens of other companies rely on Acryl Cloud, the SaaS data catalog and metadata platform based on DataHub.

With Acryl Cloud, data teams not only get alerted to Kafka data breakages as soon as they occur, but also enjoy one-click access to all of the tools they need to triage and resolve them—in a single UI.

Plus, by integrating checks for ownership and data contracts into their deployment workflows, data teams can practice shift-left governance. This improves the quality and availability of production Kafka data, along with that of the critical reports, charts, and dashboards that depend on it. Best of all, shift-left governance cuts down on post-deployment rework, making teams even more productive.
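As an illustration of what such a deployment-workflow check might look like, here is a sketch of a CI step that asks a DataHub/Acryl GraphQL endpoint whether a Kafka dataset has a recorded owner before allowing a deploy. The endpoint, token, dataset URN, and exact query shape are assumptions and may vary across DataHub versions.

```python
# Sketch of a shift-left CI gate: fail the pipeline if the Kafka dataset
# has no owner recorded in DataHub. Endpoint, token, and URN are assumptions.
import os
import requests

GRAPHQL_URL = os.environ["DATAHUB_GRAPHQL_URL"]   # e.g. https://<tenant>/api/graphql
TOKEN = os.environ["DATAHUB_TOKEN"]

DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:kafka,orders,PROD)"  # placeholder

QUERY = """
query ownership($urn: String!) {
  dataset(urn: $urn) {
    ownership {
      owners {
        owner {
          ... on CorpUser { urn }
          ... on CorpGroup { urn }
        }
      }
    }
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": QUERY, "variables": {"urn": DATASET_URN}},
    timeout=10,
)
resp.raise_for_status()

dataset = resp.json()["data"]["dataset"]
ownership = (dataset or {}).get("ownership") or {}
if not ownership.get("owners"):
    raise SystemExit(f"No owner recorded for {DATASET_URN}; failing the deploy.")
print("Ownership check passed.")
```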

The Challenge of Governing Kafka Data

Data teams face three primary operational problems with their Kafka data:

1. Unpredictable schema changes.
2. Rapidly proliferating Kafka topics.
3. Redundancy and duplication within topics, usually a byproduct of at-least-once delivery, producer retries, and replays.

But by focusing just on the operational dimension, we’re missing the forest for the trees.

Because above all, teams need a way to manage and govern how their Kafka data is used.

They need to understand what this data is, who owns it, where it came from, what’s been done to it, and what it’s used for. Is it important? Why? Who or what depends on it? Which downstream deliverables or assets? Associated with which processes, services, or products? They need to know if certain Kafka topics contain sensitive data, and, if so, sensitive how? (Can it be used? Under what circumstances? How long can it be retained? What are the procedures for destroying it?)

And they need to be able to monitor, maintain, and improve the quality of their Kafka data. Say a schema change breaks one or more data pipelines, starving downstream KPIs, metrics, and measures of time-sensitive data. The problem is that these analytics themselves don’t always break—they might just be “off.” Ideally, they’re “off” enough that at least one downstream consumer notices; sometimes, however, no one notices, and incorrect data is used to inform decision-making.
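A toy sketch of that "silently off" failure mode: a metric that drifts well outside its trailing average should trip a check even when nothing visibly breaks. The numbers and threshold below are made up, and this is not an Acryl Observe API.

```python
# Hypothetical sketch: flag a metric that deviates sharply from its trailing
# average, the kind of "off" value a human reader might otherwise miss.
from statistics import mean

def looks_off(history: list[float], today: float, tolerance: float = 0.3) -> bool:
    """Return True if today's value deviates from the trailing mean by more
    than `tolerance` (30% by default)."""
    baseline = mean(history)
    return baseline > 0 and abs(today - baseline) / baseline > tolerance

daily_orders = [10_250, 9_980, 10_410, 10_120, 10_300]  # made-up history
if looks_off(daily_orders, today=6_900):
    print("Metric deviates >30% from its trailing average; open an incident.")
```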

Teams need to get notified when breakages occur, and they need to ensure downstream consumers are informed, too.

The Acryl Cloud Difference

This is why customers like Saxo Bank and Chime rely on Acryl Cloud to keep their Kafka-powered analytics, along with other critical business processes, services, and products, reliable and available.

First, Acryl Cloud is based on DataHub, which—like Kafka—was first developed at LinkedIn. In fact, DataHub was designed from the ground up to integrate with Kafka. This tight integration enables you to detect Kafka schema changes in near-real time. On top of this, Acryl Cloud’s built-in alerting and notification capabilities—which connect to Slack and Teams, as well as PagerDuty, OpsGenie, and similar incident-management tools—ensure data teams get alerted almost as soon as problems occur. These same features automatically notify stakeholders and keep them updated in real time.
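For intuition, here is a generic sketch of the underlying alerting pattern (not Acryl’s implementation): watch a Confluent-style Schema Registry subject for new schema versions and post to a Slack incoming webhook when one appears. The URLs and subject name are placeholders.

```python
# Generic illustration of schema-change alerting: poll a Schema Registry
# subject and notify a Slack incoming webhook when a new version appears.
import time
import requests

REGISTRY_URL = "http://localhost:8081"                              # placeholder
SUBJECT = "orders-value"                                            # placeholder
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"   # placeholder

def latest_version() -> int:
    versions = requests.get(
        f"{REGISTRY_URL}/subjects/{SUBJECT}/versions", timeout=10
    ).json()
    return max(versions)

seen = latest_version()
while True:
    time.sleep(30)
    current = latest_version()
    if current != seen:
        requests.post(
            SLACK_WEBHOOK,
            json={"text": f"Schema for {SUBJECT} changed: v{seen} -> v{current}"},
            timeout=10,
        )
        seen = current
```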

Second, Acryl Cloud bundles all of the tools and features data teams need to triage and work around Kafka-related outages or data quality issues. It enables one-click access to rich metadata and documentation—including detailed lineage metadata, ownership information, and compliance labels and documents. Its automated impact analysis tools equip teams to understand and work around breakages. And its discovery and data profiling tools help teams quickly get to the root of breakages.
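To show the same impact-analysis idea programmatically, here is a sketch that lists the downstream entities of a Kafka topic via DataHub’s lineage search. The endpoint, token, URN, and exact GraphQL fields are assumptions that may differ by DataHub version.

```python
# Sketch: list downstream entities of a Kafka topic using DataHub lineage
# search, to see what a schema breakage could affect. Values are placeholders.
import os
import requests

GRAPHQL_URL = os.environ["DATAHUB_GRAPHQL_URL"]
TOKEN = os.environ["DATAHUB_TOKEN"]
TOPIC_URN = "urn:li:dataset:(urn:li:dataPlatform:kafka,orders,PROD)"  # placeholder

QUERY = """
query downstreams($urn: String!) {
  searchAcrossLineage(
    input: { urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: 50 }
  ) {
    searchResults {
      entity { urn type }
    }
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": QUERY, "variables": {"urn": TOPIC_URN}},
    timeout=10,
)
resp.raise_for_status()

for result in resp.json()["data"]["searchAcrossLineage"]["searchResults"]:
    print(result["entity"]["type"], result["entity"]["urn"])
```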

Third, its integrated data observability solution, Acryl Observe, enables teams to understand how, if at all, breakages affect critical warehouse data—and the KPIs, metrics, and measures that depend on it.

The Takeaway

Ready to govern the Kafka firehose? Interested in learning more about Acryl Cloud and Kafka?

Download and check out this free information sheet, then speak with an Acryl Cloud expert about getting a customized demo for you and your team!

Acryl Cloud transforms Kafka data management and governance. Its advanced features allow you to:

  • Trace Lineage and Provenance.
  • Monitor Schema Changes.
  • Improve Data Quality and Freshness.
  • Shift Governance Left.
  • Ensure Proper Data Handling.
  • Govern Holistically.
  • Foster Trust in Data and Improve Reliability.


