Notion is the connected workspace that allows teams to easily share documents, take notes, manage projects, and organize knowledge, all in one place. Users can create and customize beautiful documents, roadmaps, knowledge bases, and more, helping them work smarter and faster. Notion is trusted and loved by a global community of individual users and enterprises, including companies like Pixar, Headspace, Codecademy, and Loom.
Notion's modern approach to a connected workspace has fueled rapid user growth, from 1 million users in 2019 to 20 million users in 2021.
This rapid scaling has directly led to greater complexity in their data environments. For both technical and non-technical teams, discovering the right dataset and managing communication across data teams became increasingly difficult.
"We rely on Acryl to gain insights and ensure our critical data is reliable. Acryl’s managed product takes DataHub to the next level through automation and emphasis on time-to-value."
Ada Draginda
STAFF DATA ENGINEER, NOTION LABS, INC.
Notion runs a lean data engineering team and a fairly streamlined data stack: Snowflake for production datasets, dbt for managing data transformations, and Tableau for data visualization.
In the early days of Notion's growth, the team focused heavily on scaling its data science function. While that investment generated data-driven insights, the data environment quickly became cluttered because data scientists did not follow best practices for data hygiene.
As Notion continued to scale, it built out functions to manage its data platform, establishing a data engineering team to introduce best practices and to formalize and organize its insight engine.
Notion's data had grown to more than 2,000 tables across multiple locations. The data footprint was fairly ‘wide’, fed by many different sources, including sprawling ones like Fivetran, Census, and Segment. While some of these tables were highly valuable, others were ad hoc tables with little benefit to the business.
For business users, asking even simple questions of the data became a challenge: What data is available? Where is it? How do I use it? What does it mean? Why is it here? How is it related to other datasets?
For Staff Data Engineer Ada Draginda, it was clear that a data cataloging solution was needed to tag, sort, and organize data to ensure business users were only interacting with the correct, relevant information.
Without an organizing layer, business teams could not separate signal from noise or even get a sense of what data was available for their analysis.
According to Ada, “Users would ping a message in Slack, and hopefully someone would respond to them. Sometimes things would fall through the cracks - we didn't have a good solution.”
“What that meant was people would go into our data and make assumptions, and try to infer what would be the most appropriate dataset to use. Sometimes they’d use the wrong thing, and it wasn’t always possible to catch these errors soon enough.”
Ada and the Notion data engineering team initially came to Acryl for data discovery, governance, and lineage. After a proof of concept, the project widened to include the data platform function, the data science team, and ultimately everyone in the company.
Shortly after implementation, Acryl became the source of truth for documentation, i.e., table and column descriptions. Rather than maintaining the same documentation in multiple places like Snowflake and dbt, the Notion team centralized documentation in Acryl, making updates and changes easier to manage.
The catalog also shortened onboarding cycles for new data scientists, giving new hires an easy orientation to datasets and their locations, interactions, and dependencies.
The end result? According to Ada, “We've really reduced that noise, and get better signals by using a ‘one stop shop’ that has everything. All the fundamental information we need - and who to talk to if something goes wrong.”
Notion uses Acryl to manage many critical data processes, including tagging data and managing GDPR compliance with a business glossary, adding the appropriate tags and documentation when reviewing tables, and tracking which tables have been reviewed and which have not. For columns containing PII specifically, Acryl enables GDPR-compliance processes such as masking or dropping fields and promptly identifying all relevant data that must be deleted.
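Once columns carry consistent tags, finding everything that needs masking, dropping, or deletion becomes a simple query over catalog metadata. The sketch below is a minimal, hypothetical illustration assuming an in-memory export of that metadata; the `Table` and `Column` classes, the `pii` tag name, and the table names are invented for illustration and are not DataHub's or Acryl's API.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Hypothetical catalog record for one column (not a DataHub class)."""
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Table:
    """Hypothetical catalog record for one table and its review status."""
    name: str
    columns: list[Column]
    reviewed: bool = False


def pii_columns(tables: list[Table], pii_tag: str = "pii") -> list[tuple[str, str]]:
    """Return sorted (table, column) pairs carrying the PII tag,
    i.e. the candidates for masking, dropping, or deletion."""
    return sorted(
        (t.name, c.name)
        for t in tables
        for c in t.columns
        if pii_tag in c.tags
    )


def unreviewed_tables(tables: list[Table]) -> list[str]:
    """Return the sorted names of tables that have not been reviewed yet."""
    return sorted(t.name for t in tables if not t.reviewed)


# Hypothetical example catalog.
catalog = [
    Table("analytics.users", [Column("user_id"), Column("email", {"pii"})], reviewed=True),
    Table("raw.segment_events", [Column("ip_address", {"pii"}), Column("event")]),
]
print(pii_columns(catalog))        # [('analytics.users', 'email'), ('raw.segment_events', 'ip_address')]
print(unreviewed_tables(catalog))  # ['raw.segment_events']
```

In a real deployment the tags and review status would live in the catalog itself; the point of the sketch is only that a tagged column is what turns a GDPR request into a lookup rather than a search.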
Notion also uses Acryl to prevent breaking changes via data lineage and impact analysis. With so many data sets and a lean team, it was hard to know the impact of any potential changes to a table. Prior to Acryl, there was no formalized process in place to mitigate breaking changes in data.
With Acryl, the Notion team can easily see any downstream dependencies and backfill tables as needed.
“Data lineage has been available previously in a few other places, but it’s challenging to either read or extract out. Acryl is such a wonderful tool for lineage, especially since it can track through multiple hops - a few steps up or a few steps below. It’s the easiest place for us to see lineage.”
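Multi-hop impact analysis is essentially a graph traversal. As a rough sketch, assuming lineage has been exported as a hypothetical adjacency map from each table to the tables built directly from it, a breadth-first search finds every downstream table and how many hops away it is. The function name, the `max_hops` parameter, and the table names are illustrative only, not part of DataHub's API.

```python
from collections import deque
from typing import Optional


def downstream_impact(
    edges: dict[str, list[str]], start: str, max_hops: Optional[int] = None
) -> dict[str, int]:
    """Breadth-first walk of a (hypothetical) lineage adjacency map.

    Returns every table downstream of `start`, mapped to its hop distance.
    If `max_hops` is set, tables farther away than that are not reported.
    """
    distances: dict[str, int] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops == max_hops:
            continue  # do not expand beyond the requested depth
        for child in edges.get(node, []):
            if child not in seen:  # guards against cycles and diamond joins
                seen.add(child)
                distances[child] = hops + 1
                queue.append((child, hops + 1))
    return distances


# Hypothetical lineage: each key feeds the tables in its list.
lineage = {
    "raw.segment_events": ["stg.events"],
    "stg.events": ["fct.sessions", "fct.signups"],
    "fct.signups": ["dash.marketing_funnel"],
}
print(downstream_impact(lineage, "raw.segment_events"))
# {'stg.events': 1, 'fct.sessions': 2, 'fct.signups': 2, 'dash.marketing_funnel': 3}
print(downstream_impact(lineage, "raw.segment_events", max_hops=2))
# {'stg.events': 1, 'fct.sessions': 2, 'fct.signups': 2}
```

Capping `max_hops` mirrors looking "a few steps below" rather than at the whole graph; tracing upstream would use the same traversal over the reversed edges.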
While the data team still makes up most of the users, Acryl is built to support a broader user base across the company. Business users outside the data team use Acryl less frequently, to answer focused, ad hoc questions (e.g., “what is the success rate of our marketing campaigns?”).
An intuitive interface helps Acryl drive this kind of wider adoption. As Ada describes, “Acryl feels fairly intuitive - I learned it all without needing any documentation. It’s a good product that’s fairly self-evident, with additional tools for self-direction.”
This user-centric approach extends to building on top of DataHub, Acryl’s open source foundation. Ada and the Notion team have built services on top of DataHub and contributed to the open source community.
According to Theodore Wou, Data Engineer at Notion, “The DataHub API is really good. As a simple example, as we were getting started, we created a Notion page for data people to document ownership, since that’s what we’re used to internally. We were able to sync that with DataHub for a pretty painless onboarding.”
As Ada detailed, “Part of the reason I chose Acryl was because I knew we could make it as extensible as needed, especially around transformations. The documentation is thorough and up-to-date, which is a plus.”
Because data catalog implementations have high rates of abandonment, Acryl also provides managed onboarding and support to safeguard against failure. As Ada described her experiences with the Acryl staff and community: “Everyone's super friendly - nice and helpful, and always responsive. I’ve only had positive experiences.”