Timeliness is a data quality metric that measures when data is available compared to when it is needed for analysis. Harvard Business Review estimates that poor-quality, out-of-date data costs US businesses $3.1 trillion per year. Yes, you read that correctly: $3.1 trillion. A survey by LeadJen found that sales teams waste over 27% of their time because of out-of-date data, including wasted calls and the cost of tracking down accurate data.
Keeping data timely is about getting it to the people who need it as quickly as possible. However, in many companies the data pipeline is slow and cumbersome, with long delays between when users ask for data and when they finally receive it. This article looks at data timeliness through the lens of different use cases and shows how DataOS® can simplify this complex problem.
Because timeliness is judged by whether data is fit for its intended use, what counts as “timely” changes with the use case. Let’s look at three use cases to see how data can have an extremely short window of use, a very long window, or a mixture of both.
Location data from a cell phone carried by a driver is extremely volatile. If it can be collected and used within seconds, the driver can be notified about anything from interesting local restaurants to available parking nearby. Once the driver moves on, however, that data is no longer fit for purpose in this use case. Here, “freshness” is the overriding factor in the data’s usefulness.
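One way to make a freshness requirement concrete is to express it as a maximum age. The Python sketch below is a minimal illustration, assuming each location reading carries a UTC collection timestamp; the 30-second window and the `is_fresh` helper are hypothetical values chosen for the example, not part of DataOS.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness window for the location use case. The article only
# says "within seconds", so 30 seconds is an illustrative value.
LOCATION_MAX_AGE = timedelta(seconds=30)


def is_fresh(collected_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True if a reading is still young enough to act on."""
    age = now - collected_at
    # A negative age (clock skew) counts as fresh; anything older than max_age does not.
    return age <= max_age


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=5)
stale = now - timedelta(minutes=5)
print(is_fresh(recent, now, LOCATION_MAX_AGE))  # True
print(is_fresh(stale, now, LOCATION_MAX_AGE))   # False
```

The reading taken five seconds ago is still usable for a parking or restaurant alert; the five-minute-old reading is not, even though it is perfectly accurate for the moment it was recorded.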
At the other end of the scale is sales analytics. Here, currency and completeness matter most: data must be complete and correct for the whole reporting period (weekly, monthly, or annually), from its first day to its last.
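For a reporting use case, a simple completeness check confirms that every day in the period has data before the report is produced. This minimal Python sketch assumes daily granularity and that the set of already-loaded dates is known; `missing_days` is a hypothetical helper, and real pipelines may track completeness quite differently.

```python
from datetime import date, timedelta


def missing_days(period_start: date, period_end: date, loaded_days: set[date]) -> list[date]:
    """List every day in [period_start, period_end] that has no loaded data."""
    gaps = []
    day = period_start
    while day <= period_end:
        if day not in loaded_days:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Illustrative weekly period in which one day of sales data has not been loaded yet.
start, end = date(2024, 3, 4), date(2024, 3, 10)
loaded = {start + timedelta(days=i) for i in range(7) if i != 3}
gaps = missing_days(start, end, loaded)
print(gaps)  # [datetime.date(2024, 3, 7)]
print("report ready" if not gaps else "report incomplete")
```

Unlike the location example, the age of any single record matters little here; what matters is that nothing in the period is missing.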
In the middle is the analytic process that drives personalized shopping recommendations. For this use case, you want both the most up-to-the-second behavior data in the customer’s clickstream and all the past data that makes up the customer’s profile. All of this information is processed to give the customer a relevant and attractive personalized experience on the platform.
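To illustrate how the two kinds of data can be combined, the sketch below blends a customer’s long-term category affinities with clicks from the current session. It is a deliberately simple, hypothetical scoring scheme: it assumes the profile is stored as per-category counts and the clickstream as a list of viewed categories, and the weight of 3.0 for recent clicks is an arbitrary illustrative choice, not how DataOS or any particular recommender works.

```python
from collections import Counter


def recommendation_scores(profile_counts: dict[str, int],
                          recent_clicks: list[str],
                          recent_weight: float = 3.0) -> list[tuple[str, float]]:
    """Blend long-term category affinity with the current session's clicks."""
    session = Counter(recent_clicks)
    categories = set(profile_counts) | set(session)
    scores = {
        c: profile_counts.get(c, 0) + recent_weight * session.get(c, 0)
        for c in categories
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


profile = {"books": 10, "outdoor": 4, "kitchen": 2}
clicks = ["outdoor", "outdoor", "tents", "outdoor"]
print(recommendation_scores(profile, clicks))
# [('outdoor', 13.0), ('books', 10.0), ('tents', 3.0), ('kitchen', 2.0)]
```

The historical profile alone would rank books first; the fresh clickstream lifts outdoor gear to the top and surfaces tents, a category the profile has never seen.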
There are several reasons why data lags behind users’ needs, including scattered and isolated datastores (information silos), inconsistent data governance, and data pipelines that can’t deliver data to users in a timely way.
To date, data governance tools have also been driven by the fit-for-use paradigm. The problem is that business units within a company have such different needs and drivers that no single tool stack is ideal for all business users. The data sources and tool stack that suit the IT department, for example, may not suit the marketing unit at all.
This lack of integration is one of the chief causes of information silos: data becomes isolated within business units because of incompatible formats, connections, data architectures, and so on. Information silos also make it difficult to keep data timely, because data is extracted from multiple data lakes and warehouses, then loaded into scattered datastores, each of which has to be managed and kept up to date separately.
Another serious problem for data timeliness is the pipeline that carries data from where it is collected and stored to the users who make decisions based on it. More often than not, that pipeline runs through a company’s IT department. Assessing an information request, pulling the necessary data, and making sure the user receives nothing beyond what they are cleared to access can take weeks. If anything goes wrong along the way, or if the end user discovers a need for additional information, the process starts all over again, with further delays.
So, how do we escape?
Through a standard subject-predicate-object syntax, supplemented with user-defined tags, DataOS produces a true data fabric encompassing all data sources in a company. Because of its granularity and flexibility, the platform can be configured to the needs of users in every business unit, with data products created on demand in any format and to any specific requirements users have.
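To make the subject-predicate-object idea concrete, here is a minimal Python sketch of facts stored as tagged triples, with a tag used to select the facts that belong in a data product. It illustrates the general pattern only: the `Triple` class, the tag names, and the `with_tag` helper are hypothetical and do not reflect DataOS’s actual syntax or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One fact in subject-predicate-object form, plus user-defined tags."""
    subject: str
    predicate: str
    obj: str
    tags: frozenset[str] = field(default_factory=frozenset)


def with_tag(triples: list[Triple], tag: str) -> list[Triple]:
    """Select the facts carrying a given tag, e.g. to assemble a data product."""
    return [t for t in triples if tag in t.tags]


facts = [
    Triple("customer:42", "hasEmail", "ana@example.com", frozenset({"pii", "marketing"})),
    Triple("customer:42", "lastPurchase", "2024-03-07", frozenset({"sales"})),
    Triple("order:9001", "placedBy", "customer:42", frozenset({"sales"})),
]
for t in with_tag(facts, "sales"):
    print(t.subject, t.predicate, t.obj)
```

Because every fact has the same three-part shape, facts from different sources can sit side by side, and tags rather than separate datastores decide which business unit sees which facts.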
This means that there is no longer any need for separate datastores for each business unit or for cumbersome pipeline delays. Users can access central data repositories directly, with policies ensuring that data is filtered and masked automatically according to each user’s access privileges.
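The kind of automatic filtering and masking described here can be sketched as a function applied at read time. This is a minimal, hypothetical Python illustration: it assumes rows are dictionaries with a `region` field, that a user’s privileges are expressed as a set of tags plus a set of allowed regions, and that a `pii` tag is required to see email and phone in the clear. None of these names come from DataOS.

```python
# Hypothetical policy: columns that need a given privilege tag to be read unmasked.
MASKED_COLUMNS = {"email": "pii", "phone": "pii"}


def apply_policy(rows: list[dict], user_tags: set[str], allowed_regions: set[str]) -> list[dict]:
    """Filter rows to the user's regions and mask columns the user may not see."""
    result = []
    for row in rows:
        if row["region"] not in allowed_regions:
            continue  # row-level filter: skip rows outside the user's regions
        visible = dict(row)  # copy so the central data is never modified
        for column, required_tag in MASKED_COLUMNS.items():
            if column in visible and required_tag not in user_tags:
                visible[column] = "****"  # column-level mask
        result.append(visible)
    return result


rows = [
    {"customer": "Ana", "region": "EU", "email": "ana@example.com"},
    {"customer": "Ben", "region": "US", "email": "ben@example.com"},
]
analyst_view = apply_policy(rows, user_tags={"sales"}, allowed_regions={"EU"})
print(analyst_view)  # [{'customer': 'Ana', 'region': 'EU', 'email': '****'}]
```

Both users read from the same central rows; only the policy applied on the way out differs, so no per-unit copy has to be built or kept up to date.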
In the DataOS model, data policies are part of the code that defines the data fabric. A data product can be configured to pull in only data that conforms to any standards or models that users define. Privileges defined by tags give each user the appropriate level of access, and datastores can be evaluated on any desired schedule to ensure that outdated information is removed. All of this is transparent to the user, but immediately configurable by authorized managers.
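A scheduled clean-up of outdated information can be as simple as a retention check that a scheduler (for example, a cron job or a workflow orchestrator) runs periodically. The sketch below is a hypothetical Python illustration, assuming each record carries an `updated_at` UTC timestamp; the one-year threshold and the `purge_outdated` helper are assumptions for the example, not DataOS’s mechanism.

```python
from datetime import datetime, timedelta, timezone


def purge_outdated(records: list[dict], now: datetime,
                   max_age: timedelta) -> tuple[list[dict], int]:
    """Keep records updated within max_age; return (kept, number_removed)."""
    kept = [r for r in records if now - r["updated_at"] <= max_age]
    return kept, len(records) - len(kept)


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
records = [
    {"id": 1, "updated_at": now - timedelta(days=2)},
    {"id": 2, "updated_at": now - timedelta(days=400)},
]
kept, removed = purge_outdated(records, now, max_age=timedelta(days=365))
print([r["id"] for r in kept], removed)  # [1] 1
```

Returning the number of removed records gives authorized managers a simple signal to monitor each time the schedule runs.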
Data timeliness is an essential pillar of data quality. In most organizations, it is hampered by data silos, a lack of universal governance, and pipeline bottlenecks. DataOS creates a unified data fabric linking every data source in a company, with direct user access to data repositories controlled by privileges defined for each user. By eliminating silos, unifying governance, and simplifying the pipeline, DataOS ensures that your data will be up to date and available when and as you need it.