
How to drive trusted decisions without changing your current data infrastructure.
Companies capture data like champions, but there's often a sneaking suspicion that they aren't using that data to its fullest potential. Unfortunately, this isn't just the companies' imagination (or paranoia). A startling amount of data goes completely unused, not merely underused.
So what's happening? Inconsistent formats, out-of-date datasets, and silos all over the place are preventing companies from becoming data-driven. Much unused corporate data does have value, but because it isn't easily accessible to business users, they don't make use of it. And plenty of the data that is used isn't used to its full potential, either. All of this erects barriers to the adoption of a data-driven culture. One way to make significant headway is to perform regular data health checks.
The natural starting point for data health is data quality. In this blog, we'll focus on five specific quality checks that can be implemented to drive adoption and usage of data higher while enabling more value to be generated from it.
The first and simplest check is to validate that every data source contains the full amount of data it is expected to contain. For example, if a transactional data source is expected to hold all transactions from 1/1/2015 through today, is that data actually all present? It isn't at all uncommon for certain periods of data to be missing, whether because someone forgot to run an update or because an update process failed. Nothing frustrates users more than expecting data and finding it unavailable. Missing data not only corrupts results; it also lowers user trust.
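To make this concrete, here is a minimal completeness sketch in Python. It assumes a pandas DataFrame named transactions with a transaction_date column; both names are hypothetical, and the check simply looks for calendar days with no rows at all.

```python
# Minimal completeness check: find calendar days with no transactions.
# `transactions` and `transaction_date` are illustrative names.
import pandas as pd

def find_missing_days(transactions: pd.DataFrame,
                      start: str = "2015-01-01") -> pd.DatetimeIndex:
    """Return days in [start, today] that have no rows in the source."""
    expected = pd.date_range(start=start, end=pd.Timestamp.today(), freq="D")
    observed = pd.to_datetime(transactions["transaction_date"]).dt.normalize()
    return expected.difference(observed.unique())

# missing = find_missing_days(transactions)
# if len(missing):
#     print(f"{len(missing)} days have no data, e.g. {missing[:5].tolist()}")
```

A gap here usually means a forgotten or failed load, which is exactly the failure mode described above.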
A second check is to watch for new data sources, which become available all the time. Just because a platform contained all relevant data a few months ago doesn't mean that it still does today. There might be new customer survey data, new sensor data, or data from a new mobile app. Whatever the case, it is important to identify new sources and determine whether they are valuable enough to make available to users for analysis. If so, then the data should be loaded and a process put in place to update it regularly moving forward.
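One lightweight way to surface candidates is to compare what has landed against what has been onboarded. The sketch below assumes a hypothetical landing directory where each source gets its own folder, plus an illustrative registry of already-onboarded sources.

```python
# Hedged sketch: flag folders in a landing area that no one has onboarded yet.
from pathlib import Path

REGISTERED_SOURCES = {"transactions", "customers", "web_logs"}  # illustrative

def find_unregistered_sources(landing_dir: str) -> set[str]:
    """Return source folders present on disk but absent from the registry."""
    present = {p.name for p in Path(landing_dir).iterdir() if p.is_dir()}
    return present - REGISTERED_SOURCES

# for source in find_unregistered_sources("/data/landing"):
#     print(f"New source '{source}' found; assess whether it is worth loading")
```

Each flagged source still needs a human judgment call on whether it is valuable enough to load and maintain.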
A third check concerns timeliness. Some data is generated and used infrequently, while some is generated and used in near real time. Any data source must be updated at a pace that matches the business requirements. Some data, like demographic data, changes rarely and requires only infrequent updates. Other data needs to be updated much faster. For example, perhaps a transaction file available for analysis is found to be current as of the previous day. That sounds great, unless the file is supposed to be updated every 5 seconds to facilitate website customization. There is no absolute definition of timeliness as it relates to data. Rather, every data source must be “timely enough” for its intended uses and purposes.
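Freshness is easy to automate once each source has an explicit staleness budget. The sketch below assumes each source exposes a last-update timestamp; the SLA table is illustrative, echoing the slow-moving demographic feed and the 5-second transaction feed above.

```python
# Minimal freshness check against a per-source staleness budget (SLA).
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = {                        # illustrative budgets
    "demographics": timedelta(days=30),
    "transactions": timedelta(seconds=5),
}

def is_stale(source: str, last_updated: datetime) -> bool:
    """True when the source's age exceeds its budget (timestamps in UTC)."""
    age = datetime.now(timezone.utc) - last_updated
    return age > FRESHNESS_SLA[source]

# is_stale("transactions", last_updated) returning True means a feed meant
# to drive website customization has fallen behind its 5-second pace.
```

The point is not the specific numbers but that “timely enough” becomes a testable threshold per source.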
The fourth check is accuracy. Data quality is one of the biggest issues corporations struggle with, and it is a problem that is never fully solved. Even if all historical data is certified today as 100% clean and accurate, that can change rapidly as users or applications update pre-existing data or add new data that contains errors. Accuracy starts with data ingestion procedures being constantly updated to fix problems as they are identified. It is also a best practice to continuously monitor the distribution of each data element to identify when something suddenly looks different. All the right values of transaction type might be present, for example, but in all the wrong proportions. Automating basic accuracy checks is necessary in a data-driven environment, and it is becoming a broadly adopted practice.
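The transaction-type example above can be monitored with a simple distribution comparison. This sketch measures how far today's mix of a categorical column drifts from a stored baseline; the column name, baseline proportions, and alert threshold are all assumptions for illustration.

```python
# Distribution monitoring: right values, possibly wrong proportions.
import pandas as pd

def proportion_drift(current: pd.Series, baseline: dict[str, float]) -> float:
    """Total-variation distance between the current value mix and a baseline."""
    observed = current.value_counts(normalize=True)
    categories = set(baseline) | set(observed.index)
    return 0.5 * sum(abs(observed.get(c, 0.0) - baseline.get(c, 0.0))
                     for c in categories)

# baseline = {"purchase": 0.80, "refund": 0.05, "exchange": 0.15}  # illustrative
# if proportion_drift(df["transaction_type"], baseline) > 0.10:
#     print("Transaction-type mix has shifted; inspect the latest loads")
```

A sudden jump in this distance is exactly the “suddenly looks different” signal worth alerting on.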
The fifth check is performance. Data not only needs to be available; it needs to be available in such a way that users and applications can query and analyze it fast enough for their needs. Simply dropping raw data into a platform and turning people loose on it is not a winning formula. Even a data platform that has been tuned and optimized will need its settings revisited. Usage patterns of the data within the platform can change. New data sources can become very popular. A large number of new users or processes might come online. Any of these can make a high-performing platform start to struggle. It is necessary to constantly monitor and track performance and be ready to make adjustments as how the data is accessed and used changes over time.
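Monitoring this can start as simply as logging query latencies and watching a tail percentile against a baseline. The percentile choice, tolerance, and baseline value below are assumptions, not a prescribed standard.

```python
# Minimal performance watch: compare recent p95 query latency to a baseline.
import statistics

def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency (requires at least two samples)."""
    return statistics.quantiles(latencies_ms, n=20)[-1]

def performance_degraded(recent_ms: list[float],
                         baseline_p95_ms: float,
                         tolerance: float = 1.5) -> bool:
    """Flag when recent p95 latency exceeds 1.5x the established baseline."""
    return p95(recent_ms) > tolerance * baseline_p95_ms

# if performance_degraded(last_hour_ms, baseline_p95_ms=800.0):  # illustrative
#     print("Latency is drifting; revisit tuning, workload mix, or capacity")
```

When the flag trips repeatedly, that is the cue to re-tune or rebalance before users notice the slowdown.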
Of course, data quality isn't the whole story. Companies must also consider other areas, such as security and governance, for a complete picture of data health. Those may be addressed in a future blog. Making the health checks discussed here a regular practice provides a solid foundation for finally using all — yes, all — available data effectively and efficiently within a data-driven environment.
One approach to a data fabric that can enable everything discussed here has been productized as the DataOS offering from The Modern Data Company. To see how DataOS can transform your use of data to drive value, contact us to schedule a consultation.