Anyone who works with data knows that it’s long past time for data catalogs to catch up with the rest of the modern data stack. Data is no longer consumed primarily by the IT team. Today, data teams include data analysts, data scientists, product managers, business analysts, citizen data scientists, and more. Each of these people has their own favorite data tools and even different languages for describing data.
Too often collaboration dissolves into chaos and confusion. Frustrated questions like, “What does this column name mean?” and “Why are the numbers on the dashboard wrong again?” slow data teams to a crawl. Data teams and other users can avoid this by turning to metadata for the answers.
Metadata enables collaboration across business units and makes data easier to find and use. Documentation, queries, history, glossaries, and other metadata make data understandable.
Anyone who has used a library catalog is familiar with metadata; tags like author, date of publication, subject, Dewey Decimal Number and more help you locate a book and determine whether it’s useful for what you have in mind. However, the modern idea of metadata dates to the 1990s and the rise of the Internet.
As the Internet grew, data and metadata exploded. IT teams were given ownership of data in most companies and placed in charge of creating an “inventory of data,” the way a grocery store might inventory apples and soap. Setting up these inventories and keeping them current were constant struggles for IT.
Data catalogs arose during the Hadoop era (2010s). They evolved as companies realized that they needed to improve the data inventories of the 1990s-2000s by adding new business metadata. The idea was to help the expanding class of business users understand their datasets and put the data in a business context.
These early data catalogs were clumsy, and specialized solutions were lacking. The earliest adopters of the modern data stack and most large tech companies resorted to building their own proprietary solutions, such as Airbnb’s Dataportal, Facebook’s Nemo, LinkedIn’s DataHub, Lyft’s Amundsen, Netflix’s Metacat, and Uber’s Databook. Small companies, without the resources for such in-house projects, had to wait for solutions to arrive.
And arrive they did, eventually, with tools such as Apache Atlas. Still, while the rest of the data stack has evolved in the past few years, and tools like Fivetran and Snowflake let users set up a data warehouse in hours once they are installed, data catalogs have not kept up. Even trying out current metadata tools involves significant engineering time for setup, plus weeks of back and forth with a sales rep to get a demo.
It’s time for a metadata solution that is just as fast, flexible, and scalable as the rest of the modern data stack. In January 2021, Prukalpa Sankar wrote on towardsdatascience.com, “[I]n the next few years there will be the rise of a modern metadata management product that takes its rightful place in the modern data stack.” These new data catalogs will be based on principles of data and data use that have developed alongside the data stack.
Today’s BI dashboards, code snippets, SQL queries, models, recordings, presentations, and Jupyter notebooks are all data assets. All can be searched and analyzed for valuable information. All can be enriched and made more usable through appropriate metadata.
A modern data catalog should leverage metadata as a form of data that can be searched, analyzed, and maintained in the same way as all other types of data. The ability to process and understand metadata will help teams understand and trust their data better.
For example, query logs are just one kind of metadata available today. SQL query logs, properly analyzed, allow us to create column-level lineage, assign a popularity score to every data asset, and even deduce the potential owners and experts for each asset. Quality ratings from users, indexed by data source, can pinpoint problems at the source so they can be fixed, improving data quality throughout the organization.
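To make the query-log idea concrete, here is a minimal Python sketch of two of the signals mentioned above: a popularity score per table and a likely expert per table. It is an illustration under stated assumptions, not how DataOS or any particular catalog works. The log records, the user names, and the helper names (`tables_in`, `summarize`) are all hypothetical, and the regular expression is a deliberately naive stand-in for a real SQL parser. Column-level lineage needs full parsing and is not attempted here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query-log records: (user, SQL text). A real log would come
# from the warehouse's query history, not a hard-coded list.
QUERY_LOG = [
    ("ana", "SELECT order_id, total FROM sales.orders WHERE total > 100"),
    ("ana", "SELECT o.order_id, c.name FROM sales.orders o "
            "JOIN crm.customers c ON o.customer_id = c.id"),
    ("raj", "SELECT name FROM crm.customers"),
    ("raj", "SELECT id FROM crm.customers WHERE name LIKE 'A%'"),
    ("ana", "SELECT COUNT(*) FROM sales.orders"),
]

# Naive assumption: a table name is the identifier right after FROM or JOIN.
# Subqueries, CTEs, and quoted identifiers would need a proper SQL parser.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql):
    """Return the set of lower-cased table names referenced in one query."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def summarize(log):
    """Count queries per table and pick the most frequent user of each table."""
    popularity = Counter()
    users_by_table = defaultdict(Counter)
    for user, sql in log:
        for table in tables_in(sql):
            popularity[table] += 1
            users_by_table[table][user] += 1
    # The heaviest user of a table is treated as its likely expert.
    experts = {
        table: users.most_common(1)[0][0]
        for table, users in users_by_table.items()
    }
    return popularity, experts


popularity, experts = summarize(QUERY_LOG)
for table, count in popularity.most_common():
    print(f"{table}: {count} queries, likely expert: {experts[table]}")
```

Counting queries is the simplest possible popularity measure. A production catalog would likely also weight recent queries more heavily and count distinct users, but the principle is the same: the log is treated as data and aggregated like any other dataset.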
Today’s data catalogs have greatly improved discoverability, but they still do not give organizations a “single source of truth” for their data. Information about data assets is usually spread across tools for data lineage, data quality, data preparation and cleanup, and more. Data silos still impede discovery and enrichment. And dark data remains dark, hidden, and unused (let alone catalogued).
DataOS has these principles at its heart. Its metadata engine, Metis, allows DataOS to apply rich metadata covering all aspects of a dataset, from lineage to documentation. Sitting atop your data ecosystem, DataOS accesses every dataset in your organization, without moving the data, to eliminate silos and dark data.
For more information, or to arrange a demonstration (in days, not weeks), email us at [email protected].