
When Data Quality Fires Break Out, You're Always First to Know with Acryl Observe

Metadata Management

Snowflake

Data Governance

Data Discovery

Data Quality

John Joyce

Apr 23, 2024

It’s the home stretch for your company’s quarterly financial reporting, just hours before the 7 AM earnings release.

That’s when an eagle-eyed analyst spots an anomaly in the revenue data: the `deferred_revenue` column is missing for the last few days of the quarter. This means your company’s earnings report probably understates future revenue obligations.

Your Monday morning is ruined. Yet again, it’s all hands on deck for you and your team.

At Acryl, we don’t just sympathize; we empathize. We’ve lived this too.

That’s why we built Acryl Observe.

Introducing Acryl Observe


Observe is a complete data observability solution integrated with Acryl DataHub. It helps you detect data quality issues as soon as they happen, so you can address them proactively rather than waiting for them to impact your business’s operations and services. And it integrates seamlessly with leading data warehouses, including Snowflake, BigQuery, Redshift, and Databricks.

But Acryl Observe is more than just detection. When data breakages do inevitably occur, it gives you everything you need to assess impact, debug, and resolve them fast, notifying all the right people with real-time status updates along the way.

Perhaps most important of all, Acryl Observe is built on Acryl DataHub, the leading data catalog. By unifying traditionally siloed tools and capabilities (Data Observability, Data Governance, and Data Discovery), the platform helps you reduce complexity, optimize costs, and increase the accessibility and adoption of data throughout your entire organization.

Be the First to Know


When I was a data product engineer at LinkedIn, it was normal for data users to discover breakages before I did. Somehow, some way, the data teams were always the last to know.

With Acryl Observe, you can begin to flip the script. Data teams can be the first to know by configuring automated data quality checks for:

  • Missing data
  • Duplicated data
  • Delayed data
  • Invalid data
  • Unexpected changes to table schemas


It offers four types of pre-built checks out of the box (Freshness Assertions, Volume Assertions, Column Assertions, and Custom SQL Assertions) to address a broad range of needs spanning structural and semantic integrity.
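To make the four check types concrete, here is a minimal sketch of what each one evaluates, written in Python against an in-memory SQLite table. It is not Acryl’s implementation or API: the `revenue` table, its columns, the thresholds, and the helpers (`freshness_check`, `volume_check`, `column_null_check`, `custom_sql_check`) are all hypothetical, chosen to mirror the missing `deferred_revenue` scenario from the opening.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical warehouse table; table and column names are illustrative only.
# Identifiers are interpolated into SQL below, so they must be trusted constants.
NOW = datetime(2024, 3, 31, 6, 0, tzinfo=timezone.utc)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue (day TEXT, amount REAL, deferred_revenue REAL, loaded_at TEXT)"
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?, ?)",
    [
        ("2024-03-28", 1200.0, 300.0, (NOW - timedelta(hours=30)).isoformat()),
        ("2024-03-29", 1100.0, None, (NOW - timedelta(hours=20)).isoformat()),
        ("2024-03-30", 1300.0, None, (NOW - timedelta(hours=2)).isoformat()),
    ],
)


def freshness_check(table, max_age, now):
    """Freshness: the newest load timestamp must be at most `max_age` old."""
    (latest,) = conn.execute(f"SELECT MAX(loaded_at) FROM {table}").fetchone()
    return latest is not None and now - datetime.fromisoformat(latest) <= max_age


def volume_check(table, min_rows, max_rows):
    """Volume: the row count must fall inside an expected range."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return min_rows <= count <= max_rows


def column_null_check(table, column, max_null_fraction):
    """Column: the share of NULLs in `column` must not exceed a threshold."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    return total > 0 and nulls / total <= max_null_fraction


def custom_sql_check(sql, expected):
    """Custom SQL: an arbitrary query must return the expected scalar."""
    (value,) = conn.execute(sql).fetchone()
    return value == expected


results = {
    "freshness": freshness_check("revenue", timedelta(hours=6), NOW),
    "volume": volume_check("revenue", 1, 10_000),
    "column": column_null_check("revenue", "deferred_revenue", 0.0),
    # Duplicate detection expressed as custom SQL: no day should appear twice.
    "custom_sql": custom_sql_check(
        "SELECT COUNT(*) - COUNT(DISTINCT day) FROM revenue", 0
    ),
}
print(results)
```

In this toy table the NULL-fraction check fails on `deferred_revenue`, the same kind of gap the analyst caught by hand, while the freshness, volume, and duplicate checks pass.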

It also includes out-of-the-box anomaly detection—Smart Assertions—that help cover your blind spots, using AI models trained on your tables’ history.
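Smart Assertions rely on AI models trained on each table’s history, and this post doesn’t describe those models. The sketch below therefore swaps in a much simpler technique, a z-score test over recent daily row counts, purely to illustrate the idea of learning “normal” from history. The row counts and the default `threshold` of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` when it lies more than `threshold` standard deviations
    from the mean of `history` (a plain z-score test, not a trained model)."""
    if len(history) < 2:
        return False  # too little history to estimate normal variation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return observed != mu  # any change from a perfectly constant series stands out
    return abs(observed - mu) / sigma > threshold


# Hypothetical daily row counts for one table over the past week.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_030]

print(is_anomalous(daily_row_counts, 10_100))  # close to the recent mean
print(is_anomalous(daily_row_counts, 4_200))   # a sudden drop in volume
```

A production model would also need to account for trends, weekly seasonality, and how many false alarms a team can tolerate; this test ignores all of that.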

When something does go wrong, your team will be notified immediately, with alerts that reach you where you work—Slack, email and more. You can configure alerts, ensuring they’re sent to the right people at the right time, giving your team a chance to get ahead of data quality problems before they become major incidents.

In other words, no more harshly worded emails. Less time coordinating expensive backfilling efforts across many tables and reports. Your team can respond to these incidents before they disrupt the business, and ultimately win your organization’s trust. Time and time again.
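Routing is what gets an alert to the right people at the right time. As a hypothetical sketch (not Acryl’s configuration format), subscriptions can be modeled as glob patterns over asset names plus a minimum severity; the asset names, channels, and severity scale below are illustrative.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Subscription:
    asset_pattern: str     # glob over asset names, e.g. "finance.*" (hypothetical naming)
    channel: str           # a Slack channel, email address, etc.
    min_severity: int = 1  # illustrative scale: 1 = low ... 3 = high


def route_alert(asset, severity, subscriptions):
    """Return the sorted, de-duplicated channels that should hear about a failure."""
    return sorted(
        {
            s.channel
            for s in subscriptions
            if fnmatchcase(asset, s.asset_pattern) and severity >= s.min_severity
        }
    )


subscriptions = [
    Subscription("finance.*", "#finance-data-alerts"),
    Subscription("finance.revenue", "cfo-office@example.com", min_severity=3),
    Subscription("marketing.*", "#marketing-data"),
]

# A high-severity failure on finance.revenue reaches the team channel and the CFO's office;
# a low-severity one stays with the team channel.
print(route_alert("finance.revenue", 3, subscriptions))
print(route_alert("finance.revenue", 1, subscriptions))
```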

One Tool for Triaging and Responding to Incidents


Anytime I was debugging a data quality issue at LinkedIn, my journey usually started with the catalog—DataHub.

Most of what I needed was already there: detailed lineage metadata; documentation; dataset ownership; compliance information; and recent profiling statistics. This was all useful context for me to begin the painstaking process of fixing bad data.

Even so, DataHub was just one of several tools. For example, DataHub didn’t have automated alerting capabilities. Today, you can use its Assertions framework to track custom data quality checks, but you have to report the results yourself, and there is still no mechanism for getting notified when those checks fail. You have to build and maintain that plumbing on your own.

It also didn’t have incident-management features. To resolve data outages, our team bounced between scattered Slack threads for communication, Azkaban logs (Azkaban was our version of Apache Airflow), and DataHub for assessing impact and finding the relevant stakeholders.

This lack of centralization often left stakeholders waiting in the dark when major data incidents arose, and left our team scrambling to pick up the pieces instead of preventing issues altogether.
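For a sense of the alerting plumbing a team ends up owning without a built-in mechanism, here is a minimal do-it-yourself sketch: run a check, record the result, and post to a Slack incoming webhook when it fails. Slack incoming webhooks accept a JSON body with a `text` field; everything else here (`run_and_report`, the log format, the assertion and dataset names) is hypothetical, and the notifier is passed in so the example runs without network access.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_slack_payload(assertion, dataset, message):
    """Slack incoming webhooks accept a JSON body with a `text` field."""
    return {"text": f"Assertion `{assertion}` failed on `{dataset}`: {message}"}


def post_to_webhook(url, payload):
    """POST a JSON payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def run_and_report(assertion, dataset, check, results_log, notify):
    """Run `check` (returns (passed, message)), record the outcome, and notify on failure."""
    passed, message = check()
    results_log.append(
        {
            "assertion": assertion,
            "dataset": dataset,
            "passed": passed,
            "message": message,
            "evaluated_at": datetime.now(timezone.utc).isoformat(),
        }
    )
    if not passed:
        notify(build_slack_payload(assertion, dataset, message))
    return passed


# Offline demo: collect payloads in a list instead of sending them.
log, outbox = [], []
run_and_report(
    "deferred_revenue_not_null",  # hypothetical assertion name
    "finance.revenue",            # hypothetical dataset
    lambda: (False, "2 of 3 rows have NULL deferred_revenue"),
    log,
    outbox.append,
)
print(outbox[0]["text"])
```

In a real deployment you might pass `notify=lambda p: post_to_webhook(WEBHOOK_URL, p)` with your own webhook URL, and scheduling, retries, deduplication, and long-term result storage would all still be yours to build.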

It was clear that, in an ideal world, the catalog would be the source of truth: a central hub for all your data activities, where incidents are raised and resolved collaboratively, Slack threads are tracked, accountability is established, and real-time table statistics and lineage are surfaced.

We built Acryl Observe on this vision, offering an end-to-end solution for resolving data quality incidents fast.
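The catalog-as-source-of-truth idea implies an incident record with an accountable owner, linked discussion threads, and a status timeline that notifies stakeholders on every change. A minimal, hypothetical model of that lifecycle might look like this; the `Incident` class, its fields, and the notification outbox are illustrative, not Acryl’s data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    RESOLVED = "resolved"


@dataclass
class Incident:
    title: str
    dataset: str
    owner: str                                            # accountable person or team
    subscribers: list[str] = field(default_factory=list)
    status: Status = Status.ACTIVE
    threads: list[str] = field(default_factory=list)      # linked discussion threads
    timeline: list[tuple[Status, str]] = field(default_factory=list)
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (recipient, message)

    def __post_init__(self):
        self._record(f"Raised on {self.dataset}: {self.title}")

    def link_thread(self, url: str) -> None:
        self.threads.append(url)

    def update(self, note: str) -> None:
        self._record(f"Update on {self.dataset}: {note}")

    def resolve(self, note: str) -> None:
        self.status = Status.RESOLVED
        self._record(f"Resolved on {self.dataset}: {note}")

    def _record(self, message: str) -> None:
        """Append to the timeline and queue a message for the owner and every subscriber."""
        self.timeline.append((self.status, message))
        for recipient in sorted({self.owner, *self.subscribers}):
            self.outbox.append((recipient, message))


incident = Incident(
    title="deferred_revenue is NULL for the last days of the quarter",
    dataset="finance.revenue",
    owner="revenue-data-team",
    subscribers=["finance-analytics"],
)
incident.link_thread("https://example.com/threads/123")  # placeholder URL
incident.update("Backfill in progress")
incident.resolve("Backfill complete; checks passing")
print(incident.status, len(incident.outbox))
```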

Combined with the rest of Acryl Cloud, Observe helps you stage an effective response to data quality breakages by providing:

  • Access to lineage, documentation, and ownership information at your fingertips, so you can quickly assess the impact of incidents and find the relevant stakeholders.
  • Access to table statistics, metadata, historical patterns and profiling to make debugging issues faster.
  • Incident tracking and notifications to ensure relevant stakeholders are contacted as soon as problems occur, and kept in the loop as things change.
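Assessing impact from lineage is, at its core, a graph traversal: start at the broken asset, walk every downstream edge, and collect the owners of whatever you reach. The sketch below runs a breadth-first search over a hypothetical, hand-written lineage map and ownership table; in practice those edges and owners would come from the catalog’s metadata.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that read from it directly.
DOWNSTREAM = {
    "raw.billing_events": ["finance.revenue"],
    "finance.revenue": ["finance.quarterly_report", "dash.revenue_kpis"],
    "finance.quarterly_report": [],
    "dash.revenue_kpis": ["dash.exec_overview"],
    "dash.exec_overview": [],
}
# Hypothetical ownership metadata.
OWNERS = {
    "finance.revenue": "revenue-data-team",
    "finance.quarterly_report": "finance-analytics",
    "dash.revenue_kpis": "bi-team",
    "dash.exec_overview": "bi-team",
}


def impacted_assets(root, downstream):
    """Breadth-first walk of downstream lineage; returns assets reachable from `root`."""
    seen, queue, order = {root}, deque([root]), []
    while queue:
        asset = queue.popleft()
        for child in downstream.get(asset, []):
            if child not in seen:  # guard against diamonds and cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def stakeholders(assets, owners):
    """Distinct owners of the impacted assets, sorted for stable output."""
    return sorted({owners[a] for a in assets if a in owners})


hit = impacted_assets("finance.revenue", DOWNSTREAM)
print(hit)
print(stakeholders(hit, OWNERS))
```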


With Observe, you get a central command center for triaging incidents and coordinating an effective response before things get worse.

A Bird’s Eye View of Your Data’s Health


One final capability I wish I’d had during my time in data engineering: a visual overview of the state and health of the data stack. At LinkedIn, my team and I sometimes felt as if we were flying blind, given the sheer number of data assets we were responsible for. We reacted to the hottest fires of the day and failed to track or improve our situation as a whole. As a result, small but still meaningful problems often went untracked and unresolved indefinitely.

We built the Data Health Dashboard into Acryl Observe so that other teams have a better experience than we did. This dashboard provides a real-time overview of the data quality and health of your entire ecosystem. This way, you can see where the high-priority fires are and begin to build a scalable, sustainable strategy for keeping things green across the board.

My favorite part of the dashboard is that data team members, and even individual stakeholders, can filter the dashboard views down to their specific areas of interest. This makes it incredibly easy for anyone to monitor the health of their corner of the ecosystem and never lose track of lingering issues.
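Conceptually, a health view like this is an aggregation over the latest assertion results, with any filter applied before counting. The hypothetical sketch below (the result tuples, domains, and owners are made up for illustration, and this is not how the dashboard is implemented) rolls results up per domain and supports an owner filter like the one a stakeholder might apply.

```python
from collections import defaultdict

# Hypothetical latest assertion results: (dataset, domain, owner, passed).
RESULTS = [
    ("finance.revenue", "finance", "revenue-data-team", False),
    ("finance.invoices", "finance", "revenue-data-team", True),
    ("marketing.campaigns", "marketing", "growth-team", True),
    ("marketing.leads", "marketing", "growth-team", True),
    ("marketing.leads", "marketing", "growth-team", False),
]


def health_summary(results, owner=None):
    """Count passing and failing assertions per domain, optionally for one owner only."""
    summary = defaultdict(lambda: {"passing": 0, "failing": 0, "failing_datasets": set()})
    for dataset, domain, asset_owner, passed in results:
        if owner is not None and asset_owner != owner:
            continue  # the filter is applied before anything is counted
        bucket = summary[domain]
        if passed:
            bucket["passing"] += 1
        else:
            bucket["failing"] += 1
            bucket["failing_datasets"].add(dataset)
    return dict(summary)


print(health_summary(RESULTS))
print(health_summary(RESULTS, owner="growth-team"))
```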

A Unified Solution


Altogether, Acryl Cloud gives you a central control plane for your data, unifying Data Discovery, Data Governance, and Data Quality in a single platform.

By unifying these related but traditionally siloed capabilities, you can remove your data team as a central bottleneck and empower your team to do more with less. A single platform reduces conflicts in shared context, such as data ownership and documentation, and alleviates the operational overhead (and cost!) of maintaining several different tools. The end result is a more effective, scalable, and sustainable way to manage your organization’s mission-critical data.


Discover the Acryl Observe Difference


“We chose Acryl because we see the value of having both a data catalog and observability capabilities in one tool. Having data owners, maintainers, and consumers in one place streamlines incident management and allows for faster time to resolution.”

— Olivier, Data Engineering Manager at Depop


Want to learn more about why the best data teams choose Acryl Cloud?

👉 Click here to learn more.

👉 Click here to book a demo today.

© 2024 Acryl Data