Data center downtime can be costly. Gartner estimates that downtime can cost $5,600 per minute, which extrapolates to well over $300K per hour. When your organization’s digital services are interrupted, employee productivity, company reputation, and customer loyalty can all suffer. Downtime can also result in lost business, data, and revenue. With the holiday season in full swing, here are some tips on how to enjoy your own holiday downtime while avoiding the high costs of data center downtime.
To prevent data center downtime, it’s important to first understand why it happens. There are many causes: one analysis found that the main cause of unplanned downtime is software system failure (27%), followed by hardware system failure (23%), human error (18%), network transmission failure (17%), and environmental factors (8%). Human error is thought to contribute to 40% of operation errors and to account for 55% of critical application downtime and 22% of system outages. Only around 7% of system outages involved security-related incidents. Much of this downtime results from mistakes by inexperienced staff or, more rarely, from intentional and malicious activity by employees. These mistakes typically happen when changes are implemented, such as upgrading software, patching, and reconfiguring systems. In the ever-evolving world of technology, stagnation is not an option, so the solution is to find guardrails for change that don’t impede innovation.
Since human error contributes to a significant portion of data center downtime, one of the safest ways to handle the holidays is to put a hold on changes. A best practice across most tech organizations is to have a system embargo during special periods: for example, right before a new product launch, during special events, and over the holidays. Usually, a code freeze is put in place a few days or even a week before the special period to reduce the risk of mistakes being pushed into the system.
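A code freeze is often enforced mechanically rather than by memo alone. The sketch below is a minimal, hypothetical pre-deploy check in Python: the freeze dates, the `in_code_freeze` and `check_deploy` names, and the emergency-approval flag are illustrative assumptions, not part of any particular CI/CD tool.

```python
from datetime import date

# Hypothetical freeze calendar; both bounds are inclusive.
FREEZE_WINDOWS = [
    (date(2022, 12, 16), date(2023, 1, 3)),  # example holiday freeze
]

def in_code_freeze(day, windows=FREEZE_WINDOWS):
    """Return True if `day` falls inside any freeze window."""
    return any(start <= day <= end for start, end in windows)

def check_deploy(day, emergency_approved=False):
    """Raise if a deploy is attempted during a freeze without an approved exception."""
    if in_code_freeze(day) and not emergency_approved:
        raise RuntimeError(f"Deploy blocked: {day.isoformat()} is inside a code freeze")

if __name__ == "__main__":
    check_deploy(date(2023, 1, 4))  # outside the window: allowed
    check_deploy(date(2022, 12, 20), emergency_approved=True)  # approved exception
```

Running a check like this as an early step in the deploy pipeline makes the freeze the default, while still leaving a deliberate, auditable path for urgent fixes.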
Load testing the system can also be helpful before the holidays. This is especially relevant if you have an e-commerce site or another service that may become more popular during this time. Using load testing, you can check the performance of your site, app, or software under different loads. This will help you to better understand how it performs when accessed by a large number of users, and at what point bugs, errors, and crashes become an issue. It will also help expose bottlenecks and security vulnerabilities that occur when the load is particularly high. Knowing the limitations of your system can help with setting up alerts and providing relevant information to the people solving any issues that do arise.
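To make the idea concrete, here is a minimal load-test sketch in Python. It assumes a toy `SimulatedService` that can handle only a fixed number of concurrent requests (a hypothetical stand-in for a real site or API) and steps up concurrency while recording the error rate and an approximate 95th-percentile latency. Real load tests would typically use a dedicated tool against a staging environment; this only illustrates how a breaking point shows up in the numbers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class SimulatedService:
    """Hypothetical stand-in for a real endpoint that can serve `capacity` requests at once."""

    def __init__(self, capacity, work_seconds=0.01):
        self._slots = threading.BoundedSemaphore(capacity)
        self._work_seconds = work_seconds

    def handle(self):
        # Non-blocking acquire: when every slot is busy, the request is rejected.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("overloaded")
        try:
            time.sleep(self._work_seconds)  # simulated work
        finally:
            self._slots.release()

def run_load_step(service, concurrency, requests):
    """Send `requests` calls using `concurrency` workers; report error rate and ~p95 latency."""

    def one_call(_):
        start = time.perf_counter()
        try:
            service.handle()
            ok = True
        except RuntimeError:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(requests)))

    latencies = sorted(elapsed for ok, elapsed in results if ok)
    errors = sum(1 for ok, _ in results if not ok)
    # Approximate (index-based) 95th percentile over successful requests only.
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else None
    return {"concurrency": concurrency, "error_rate": errors / requests, "p95_seconds": p95}

if __name__ == "__main__":
    service = SimulatedService(capacity=8)
    for concurrency in (2, 4, 8, 16, 32):
        print(run_load_step(service, concurrency, requests=100))
```

The step at which the error rate first rises above zero is a reasonable starting point for alert thresholds and for the notes handed to whoever is on call.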
As the saying goes, failing to plan is planning to fail. A good on-call plan, including documentation and an alert management system, will go a long way toward limiting any downtime that does occur. Many organizations use a rotating schedule over the holidays, with different engineers on call for 24-hour periods. A good alert management system speeds up the response by notifying the on-call engineer of issues quickly and, ideally, proactively.
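A rotating 24-hour on-call schedule comes down to simple round-robin arithmetic. This sketch assumes a fixed list of engineers (hypothetical names) and equal-length shifts starting at a known time; real schedules also handle overrides, time zones, and escalation, which are omitted here.

```python
from datetime import datetime, timedelta

def on_call_engineer(engineers, rotation_start, at, shift=timedelta(hours=24)):
    """Return the engineer on call at `at` in a round-robin rotation of equal shifts."""
    if at < rotation_start:
        raise ValueError("`at` is before the rotation starts")
    # timedelta // timedelta gives the number of whole shifts that have elapsed.
    shifts_elapsed = (at - rotation_start) // shift
    return engineers[shifts_elapsed % len(engineers)]

if __name__ == "__main__":
    team = ["alice", "bob", "carol"]  # hypothetical engineers
    start = datetime(2022, 12, 24, 9, 0)  # hypothetical rotation start
    for hours in (0, 23, 24, 50, 72):
        print(hours, on_call_engineer(team, start, start + timedelta(hours=hours)))
```

An alerting system can call a lookup like this to decide whom to page, so the schedule and the alert routing never drift apart.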
Data architectures are becoming increasingly complex, which makes them more rigid and fragile. Many rely on multiple discrete data sources, multiple layers, various interfaces, and a spaghetti of pipelines. In this scenario, building high-availability applications becomes increasingly difficult. Each source, layer, pipeline, and application built on top becomes an additional point of failure that the high-availability architecture must account for.
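The effect of adding points of failure can be quantified. Assuming every component must be up for the application to work and that failures are independent (both simplifying assumptions), composite availability is the product of the individual availabilities, so each additional source, layer, or pipeline pulls the total down.

```python
from math import prod

def serial_availability(availabilities):
    """Composite availability of components that must all be up (independent failures assumed)."""
    return prod(availabilities)

if __name__ == "__main__":
    # Hypothetical architectures in which every component is 99.9% available.
    for n in (1, 5, 20):
        a = serial_availability([0.999] * n)
        print(f"{n:>2} components: {a:.5%} available")
```

With just five such components chained together, availability already drops to about 99.5%, which is why simplifying the architecture matters as much as hardening any single part.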
Human error is one of the major adversaries of availability. In 2017, a typo at Amazon took down Amazon’s popular cloud storage service, S3, and with it a good portion of the internet. Human error extends beyond typos, however. Without proper data observability, logging, governance, and documentation, the range of potential human errors is wide. For example, relying on low-quality data sources can cause bugs that are hard to identify at scale.
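One guardrail against low-quality sources is an automated data quality check that runs before data reaches downstream applications. The sketch below is a hypothetical example in plain Python; the field names, the expected temperature range, and the `check_quality` helper are illustrative assumptions rather than any specific observability tool’s API.

```python
def check_quality(rows, required_fields, ranges):
    """Count rows that are missing required fields or hold values outside expected ranges.

    `rows` is a list of dicts; `ranges` maps a field name to an inclusive (low, high) pair.
    Returns a dict of check name -> number of failing rows.
    """
    failures = {f"missing:{field}": 0 for field in required_fields}
    failures.update({f"out_of_range:{field}": 0 for field in ranges})
    for row in rows:
        for field in required_fields:
            if row.get(field) is None:
                failures[f"missing:{field}"] += 1
        for field, (low, high) in ranges.items():
            value = row.get(field)
            # None values are skipped here; the missing-field check covers required fields.
            if value is not None and not low <= value <= high:
                failures[f"out_of_range:{field}"] += 1
    return failures

if __name__ == "__main__":
    # Hypothetical sensor readings; temperatures in degrees Celsius.
    readings = [
        {"sensor_id": "a1", "temp_c": 21.5},
        {"sensor_id": "a2", "temp_c": None},
        {"sensor_id": "a3", "temp_c": 400.0},
    ]
    report = check_quality(readings, ["sensor_id", "temp_c"], {"temp_c": (-40.0, 125.0)})
    print(report)
    if any(report.values()):
        print("Quality check failed: hold the load and alert the data owner.")
```

Failing loudly at ingestion turns a hard-to-trace downstream bug into a specific, named check that someone can act on.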
Security threats can also result in downtime. During the holidays, the technical teams that organizations rely on to secure their services may be less available, which makes the holidays an opportune time for attackers. Overly complex data architectures, multiple disparate pipelines, dark data, and improper governance can all present serious security risks.
To achieve five nines (99.999%) availability, technical teams need modern tools to overcome the barriers described above. DataOS, an operating system for your data stack created by Modern, can help address all of these obstacles to availability, and more. It supports every stage of the data lifecycle while improving the quality, discoverability, and observability of your data.
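For a sense of how demanding five nines is, the yearly downtime budget follows directly from the availability target A (using a 365-day year). Pairing it with the Gartner estimate of $5,600 per minute is only a rough illustration, since actual costs vary widely by organization.

```latex
\begin{aligned}
\text{allowed downtime} &= (1 - A) \times 365 \times 24 \times 60\ \text{min} \\
&= (1 - 0.99999) \times 525{,}600\ \text{min} \approx 5.26\ \text{min per year} \\
\text{cost at } \$5{,}600/\text{min} &\approx 5.26 \times 5{,}600 \approx \$29{,}400 \text{ per year}
\end{aligned}
```

By comparison, three nines (99.9%) allows about 525.6 minutes, or roughly 8.8 hours, of downtime per year.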
As a layer on top of your legacy or modern databases, it enables a modern programmable enterprise with a composable, flexible data architecture. DataOS weaves a connective fabric between all of your data sources, dramatically reducing the number of fragile pipelines. This simplified data architecture means fewer potential points of failure. Built-in tools provide unprecedented observability, helping teams to quickly understand, diagnose, and manage data health. The flexible, robust architecture and heightened visibility and observability of data provided by DataOS translate to increased capacity to prevent downtime.
Especially during the holidays, teams are stretched thin. This compounds the existing strain on IT teams, which already spend most of their time maintaining data and have little left over to derive value from it. DataOS automates significant portions of the tedious but essential data-gathering and engineering tasks. This leaves technical teams more time to dedicate to operationalizing data and preventing downtime.
While it may not be possible to prevent all service failures and interruptions, it may be possible to predict them. Predictive analytics can be an invaluable tool for preventing IT disasters. Accurate predictions require machine learning capabilities and the ability to properly store and access large data sets containing historical performance information. That’s why DataOS contains all the essential tools for building high-performing machine learning models. Out-of-the-box UI, CLI, and AI tools support every stage of the data development lifecycle, from finding, accessing, governing, and modeling data to measuring impact. With DataOS, your teams can build predictive analytics to solve problems proactively and avoid unexpected interruptions during the holidays.
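As a deliberately simple illustration of predicting a failure before it happens, the sketch below fits a straight line to daily disk-usage readings and extrapolates when usage will cross a threshold. Linear extrapolation is a stand-in for the machine learning models described above, and the readings and 90% threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_threshold(daily_usage, threshold):
    """Days after the last reading until the linear trend crosses `threshold`.

    Returns None when usage is flat or shrinking, since the trend never crosses.
    """
    if len(daily_usage) < 2:
        raise ValueError("need at least two readings to fit a trend")
    days = list(range(len(daily_usage)))
    slope, intercept = fit_line(days, daily_usage)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])

if __name__ == "__main__":
    usage_percent = [50, 52, 54, 56, 58]  # hypothetical daily disk-usage readings
    print(days_until_threshold(usage_percent, threshold=90))  # → 16.0
```

A forecast like this can open a ticket well before the disk fills, turning a holiday outage into a routine weekday task; richer models follow the same pattern of learning from history and alerting ahead of the threshold.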
Don’t let data center downtime interfere with your holiday downtime. Learn more about DataOS here.
Twitter handles around 500-700 million tweets per day, equating to roughly 12 terabytes of data every 24 hours. Another way to understand the scale is to consider that this amounts to over 4K terabytes of data each year. Even with an army of the savviest data scientists and engineers, handling this much data is a challenge. And adding to this challenge is the fact that much of the data is freeform text, albeit short text, which is stored as unwieldy unstructured data.
And then there are bots. Engineers are essentially tasked with building machines that can tell humans and machines apart, a kind of reverse Turing test, and with coming up with supplemental approaches. As Parag Agrawal, the CEO of Twitter, wrote, “Spam isn’t just ‘binary’ (human/not human). The most advanced spam campaigns use combinations of coordinated humans + automation.” Agrawal added, “fighting spam is incredibly *dynamic*…You can’t build a set of rules to detect spam today and hope they will still work tomorrow. They will not.” Every day, Twitter suspends over half a million accounts, and each week it locks millions of accounts that are suspected of being spam. Meanwhile, this has to be balanced against the need to avoid suspending real people or adding excessive friction for them, which is tricky given that many real accounts may superficially look fake.
Now add the complexity of producing spam-account reporting that is both up to date and accurate. Report as if $44 billion depends on it, because it just might. Earlier this year, Elon Musk offered to buy Twitter for $44 billion. He later tried to back out of the deal, claiming that Twitter was infested with more “spam bots” and fake accounts than it had disclosed. Twitter retorted that Musk’s analysis relied on a “generic web tool” that classified even Musk’s own Twitter account as a possible bot.
Twitter estimates that fewer than 5% of reported monetizable daily active users (mDAU) per quarter are spam accounts. It arrives at this estimate through multiple human reviews, in replication, of thousands of accounts sampled at random, continually over time, from the accounts counted as mDAU. Agrawal emphasized that combining private and public data is essential to accurately determine whether an account is spam or real. It goes beyond a handle like NameAndABunchofNumber with no profile picture and unusual tweets. Important data points include IP address, phone number, geolocation, client/browser signatures, and what the account does when it’s active, and even these are only part of a higher-level description. These estimates are made each quarter and, according to Agrawal, have small error margins.
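Estimating a share of spam from a random sample of human-reviewed accounts is, at its core, estimating a proportion. Twitter has not published its exact methodology in this detail, so the sketch below shows only a generic textbook approach: a point estimate with a normal-approximation (Wald) 95% confidence interval. The sample size and spam count in the example are hypothetical.

```python
import math

def estimate_proportion(labels, z=1.96):
    """Point estimate and normal-approximation (Wald) confidence interval for a proportion.

    `labels` holds one human-review outcome per randomly sampled account (True = spam).
    z = 1.96 corresponds to a 95% confidence level.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labelled sample")
    p = sum(labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

if __name__ == "__main__":
    # Hypothetical sample: 3,000 reviewed accounts, 120 judged to be spam.
    sample = [True] * 120 + [False] * 2880
    estimate, low, high = estimate_proportion(sample)
    print(f"spam share {estimate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With 120 spam accounts in a sample of 3,000, the estimate is 4% with an interval of roughly 3.3% to 4.7%; larger samples narrow the margin. The Wald interval is a rough approximation that behaves poorly for small samples or proportions near 0 or 1, where a Wilson interval is a common alternative.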
If you deal with data in your day-to-day, then you likely understand the scale of what was just described. There are many key topics embedded in this case:
To produce an accurate rebuttal of Musk’s accusations, Twitter needed to demonstrate sound processes for identifying spam bots. This involves acquiring, managing, and governing data that is both structured and unstructured, and both public and highly sensitive. This is not an easy task.
When it comes to solving these challenges, many organizations get stuck in data swamps. Often, this means that 90% of time, energy, and cost is spent on moving, selecting, sourcing, synthesizing, exploring, cleaning, normalizing, and tuning data, leaving only 10% to extract and generate value from it. Armies of data engineers juggle a spaghetti of ETLs built with Airflow, Python scripts, and Talend, which require constant monitoring and maintenance.
This 90/10 split of time, energy, and cost often results from systemically non-unified data infrastructure. Organizations’ data landscapes have become increasingly fragmented and, as a result, overly rigid. The rise of point solutions adds to this complexity and makes centralized, native governance and observability nearly impossible. Engineers who would otherwise be building intelligent systems like Twitter’s spam identification must spend most of their time acting as system integrators. This is because the traditional approach many organizations take focuses on data processing rather than data activation.
The case of Twitter demonstrates the powerful combination of business-driven data initiatives executed on an infrastructure that can fluidly combine the necessary, trustworthy data points. Truly data-driven organizations like this one have insight into their available data, the ability to compose data, and the flexibility to work right-to-left, starting from the business outcome and working back to the data. This allows them to meet complex business problems like spam bots with reliable data solutions.
Not every organization is Twitter: we don’t all have an army of data engineers and scientists who can seemingly magic data into dreamy, exquisite AI/ML models and high-confidence quarterly reporting. That’s why our team at Modern built DataOS.
DataOS is a modern layer over your legacy or modern data infrastructure. This allows you to instantly use your data in modern ways without disrupting your business. As soon as DataOS is implemented, you’ll get deep insights into all of your available data, no matter how dark or siloed. From there, most data analysis can be done while the data stays in place, so you move only the data that needs to be operationalized. Because DataOS was built on the principles of outcome-based engineering, it can automatically retrieve data based on needs defined by business users, without pipelines. The fully composable architecture of DataOS enables your organization to realize a data fabric, lakehouse, or customer data platform (CDP) in weeks instead of years.
DataOS can superpower data initiatives across your organization, helping you realize a data-driven future and solve essential business problems simply. The case of Twitter shows why your organization must be able to operationalize disparate, siloed data sources to create accurate business metrics. This can be worth more than $44 billion: it means staying competitive in an ever-changing landscape.
Want to learn more about DataOS, and how it can help Twitter-ify your data? Download our white paper, DataOS: A Paradigm Shift in Data Management.
According to a recent survey of manufacturing industry leaders, 91% have increased investment in digital transformation, and 77% describe those investments as significant or dramatic. Executives recognize that technological advancement is necessary to remain competitive in today’s ever-changing, frequently disrupted landscape.
Manufacturers that accomplish digital transformation see vast improvements in manufacturing quality, greater transparency and visibility into the production process, and faster development cycles. Data is the lifeblood of digital transformation. The manufacturing leaders of tomorrow will be those who find holistic approaches to data transformation that achieve the highest data ROI.
The global pandemic brought many challenges for manufacturers. As a result, according to the same industry survey, 68% of manufacturing industry leaders see improving supply chain resilience and agility as their number one business priority. Within this goal, 54% aim to increase the speed of product innovation. Another 40% said that reducing their carbon footprint by investing in sustainable manufacturing processes is a priority. A separate academic study found that the main organizational drivers for digital transformation in manufacturing are:
Despite the major business, human, and environmental benefits unlocked through digital transformation, many manufacturers still have concerns. In fact, 94% of manufacturing industry leaders reported concerns about their supply chain that hamper innovation. The following concerns were cited most often by executives:
Additionally, limited talent supply, rigid internal structures, and significant employee time investment have slowed or stalled the digital transformation of manufacturer supply chains. For manufacturing organizations to innovate and remain competitive, these legitimate concerns and obstacles must be addressed. A solution is needed that unlocks the benefits of digital transformation while also addressing the needs and concerns of the business. The good news: this solution already exists.
A data operating system, like Modern’s DataOS, is an out-of-the-box data ecosystem and the key to unlocking digital transformation. Features of a data operating system address both the drivers of and the challenges to digital transformation, such as:
IBM states that 53% of manufacturing organizations leverage big data and analytics to generate competitive advantage. These organizations understand that an increasingly global and competitive landscape necessitates digital innovation and transformation. Big data is a key piece in staying at the industry forefront to better produce, maintain, and ship high-quality goods.
Leading manufacturers are already reaping the benefits of big data. Using big data analysis, the Coca-Cola Company was able to save about $45M annually. Over two years, John Deere (Deere & Company) used big data to save $900M through inventory control. These organizations demonstrate the huge potential of the data being produced at each level of the manufacturing process.
If you’re in the manufacturing industry, you may be wondering how you can unlock the power of big data within your organization.
The term “big data” is often spoken about, but its meaning is rarely made clear. To truly unlock the massive value of big data, it’s important to first understand what it is and what it isn’t.
Big data is a comprehensive term for the processes and techniques that enable storing, organizing, and analyzing massive data sets, which can include billions of rows and parameter values. Sources of big data include machines, devices, operators on the factory floor, and every level of your organization. In manufacturing, big data is made up of business-critical information from sensors, pumps, motors, compressors, and conveyors.
As you can see, it’s neither a single piece of software nor is it purely hardware-related. Simply moving your existing data infrastructure to the cloud also won’t immediately unlock the value of big data. Getting the fullest return on investment for the massive data sets being produced requires a data ecosystem, where the software your organization uses supports the infrastructure you already have in place.
One of the biggest benefits that big data can unlock is the ability to incorporate machine learning into your organization. Big data is necessary to feed algorithms that can predict, triage, and proactively solve problems within your organization.
Traditionally, manufacturing and other organizations use linear regression and absolute numbers to identify issues and opportunities. This can take the form of human-monitored data visualizations, where a large drop in a machine’s output points to an issue in production. With machine learning, this process becomes automated and streamlined.
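The human check described above, spotting a large drop in a machine’s output, can be automated with a rolling baseline. The sketch flags readings that fall more than a chosen number of standard deviations below the mean of the preceding window. This is a simple statistical rule rather than a trained model, and the window size, threshold, and readings are hypothetical.

```python
from statistics import mean, stdev

def flag_output_drops(outputs, window=5, z_threshold=3.0):
    """Indices where output falls more than `z_threshold` standard deviations
    below the mean of the preceding `window` readings."""
    if window < 2:
        raise ValueError("window must hold at least two readings")
    flagged = []
    for i in range(window, len(outputs)):
        history = outputs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: treat any drop below it as anomalous.
            if outputs[i] < baseline:
                flagged.append(i)
        elif (baseline - outputs[i]) / spread > z_threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Hypothetical units produced per hour by one machine.
    units_per_hour = [100, 101, 99, 100, 102, 101, 100, 60, 100, 101]
    print(flag_output_drops(units_per_hour))  # → [7]
```

Because the baseline moves with recent history, the same rule can run unattended across thousands of machines, which is the kind of automation that replaces watching dashboards.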
Machine learning on top of big data allows the identification of large-scale covariance and correlation, making root-cause analysis proactive and automatic. The benefits of combining big data with machine learning extend to many other areas of the manufacturing process and are especially valuable for manufacturing data sets, which are often noisy and very large. This combination can be used to proactively identify machine failure, predict machine degradation, and optimize machine efficiency.
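Correlation analysis is one way to narrow the search during root-cause analysis. The sketch below ranks hypothetical sensor channels by the absolute Pearson correlation between their readings and a 0/1 failure indicator. Correlation does not establish causation, so the ranking is a list of candidates for engineers to investigate, not a diagnosis.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for a constant series; report 0 here
    return cov / (spread_x * spread_y)

def rank_sensors(sensor_readings, failed):
    """Rank sensor channels by |correlation| with a 0/1 failure indicator, strongest first."""
    scores = {name: pearson(values, failed) for name, values in sensor_readings.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

if __name__ == "__main__":
    # Hypothetical hourly readings from one machine; 1 = failure recorded that hour.
    failed = [0, 0, 0, 1, 1]
    readings = {
        "vibration_mm_s": [1.0, 1.1, 0.9, 4.8, 5.2],
        "temperature_c": [70.0, 71.0, 70.0, 71.0, 70.0],
    }
    for name, r in rank_sensors(readings, failed):
        print(f"{name}: r = {r:+.2f}")
```

On real data the same ranking would run over many more channels and time windows, which is where big data infrastructure becomes the limiting factor.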
Big data helps manufacturers by:
Big data and its benefits depend on key principles being in place. Without these, your organization won’t be able to reap any of the benefits of big data. These include:
Modern has built DataOS to include all of these elements, out of the box. It is an entire data ecosystem layered on top of your legacy or modern data infrastructure.
This past week, we had the chance to present a new, modern approach to the data development lifecycle at the North America AI & Big Data Expo in Santa Clara, California. We were joined by 5,000 like-minded attendees, including developers, designers, data experts, innovators, and many more. All came together with the common goal of building a smarter future with AI/ML and big data.
On the first day of the conference, our very own head of AI, Ash Damle, gave a keynote on a novel vision for empowering a new generation of data developers: a decision ops platform. He explained the current challenges and what’s needed: an entire AI/ML data lifecycle covering data discovery, cleaning, cataloging, observation, and governance. This not only accelerates everyday work for data scientists and engineers; it also unlocks data’s full potential for an organization.
How? We share an overview below.
There are seemingly innumerable examples of how AI is changing the world. It helps make the impossible possible, all the way to outer space with SpaceX. It helps scale and reengineer our everyday lives, as with Cruise’s autonomous vehicles. It empowers us to see the world in new, clearer ways, as demonstrated by Google’s novel NeRF technology. AI superpowers productivity and creativity, with OpenAI’s DALL-E as a great example. Lastly, it revolutionizes our capacity, illustrated by innovations such as production AI from Fraunhofer-Institut für Optronik.
What is the difference between this world-changing AI and ineffective AI? It’s data.
The beating heart of all good AI/ML is data. Data that is clean, connected, tall, and wide is what transforms “just a model” into amazing technology. It may sound simple, but paradoxically data is often the most complicated part of building AI/ML. For most organizations, 90% of time, energy, and cost is spent on getting data ready to plug into modern AI platforms, while a mere 10% is spent generating and extracting value.
The core of this issue is commonly the lack of a comprehensive data abstraction that makes sense of disparate, heterogeneous data sources. The result is an army of data engineers maintaining a spaghetti of ETLs built with Airflow, Python scripts, Talend, and other technologies; the rigidity of the system creates fragility that requires constant attention.
What is the root of this issue, and what are the barriers to fixing it?
Operational, strategic, and tactical barriers currently stand in the way of organizations becoming truly data-driven. A lack of native governance and observability, labor-intensive data management, a shortage of highly skilled resources, shrinking budgets, and increasing requests mean that most time is spent on operational tasks. Combined with a costly data TCO, high failure rates, extensive business disruption, and the pressure to remain competitive in a field with increasingly advanced analytics, this leaves organizations little time for proactive strategic initiatives.
Perhaps most importantly, needlessly complex data infrastructure, a lack of data agility for experimentation, and the rise of a labyrinth of “point solutions” lead to overly rigid data architectures. These restrict the amount of value that can be extracted from data and, therefore, limit the power of AI/ML across the organization.
Ultimately, these obstacles amount to a fragmented data landscape in which IT teams are forced to act primarily as system integrators and modern data needs go unmet. It doesn’t have to be this way.
Data developers are the ones designing and building this data-driven future: they are the experts molding data into decision products. A decision product is a data asset used to affect decisions made by humans or machines. Data composability, visibility, and quality make the difference between a flawed model and a fruitful decision product.
However, given the barriers described above, data developers don’t currently have the tools they need. To unlock these tools, we must apply concepts from engineering to data and data development. These processes will empower data developers to see, use, and connect data differently, with less time, effort, and maintenance.
A platform like Modern’s DataOS provides this shift as a modern data layer that automates a significant portion of these tasks while reducing time, risk, and cost — accelerating data value extraction for data developers and the organization at large.
DataOS, the world’s first data operating system, makes all facets of data use simple. It is built on the principles of data clarity, true data democratization, and ease of use. Four conceptual pillars ensure an infrastructure that reduces time-to-decision-product: flexibility, composability, outcome-based data engineering, and data as software.
Modern’s tools for data developers embody a powerful shift for the entire AI/ML data lifecycle. DataOS provides a unified way of accessing, joining, governing, and observing all of an enterprise’s data. It delivers loosely coupled, tightly integrated building blocks that enable organizations to compose flexible, comprehensive data architectures.
DataOS provides an integrated data layer that automates a significant portion of data processing tasks while reducing time, risk, and cost. It provides data developers with the tools they need to treat data as software, spending less time processing data and more time operationalizing it. It’s like Git for data: a thoughtfully built platform with the UI, CLI, and other tools data developers need to manage the entire data development lifecycle. Meanwhile, built-in data governance and observability make it easy to discover, share, and secure assets.
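To illustrate what “Git for data” can mean in practice, the sketch below versions a small table by content hash, so identical data always gets the same version id, and diffs two versions by a key column. `DatasetVersions` is a hypothetical, in-memory toy written for this example; it is not the DataOS API, and a production system would store snapshots durably and handle far larger data.

```python
import hashlib
import json

class DatasetVersions:
    """Hypothetical in-memory illustration of Git-style versioning for tabular data."""

    def __init__(self):
        self._snapshots = {}  # version id -> list of row dicts
        self.history = []     # ordered (version id, message) pairs, like a commit log

    @staticmethod
    def _content_hash(rows):
        # Canonical JSON (sorted keys, fixed separators) so equal data hashes identically.
        payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]

    def commit(self, rows, message):
        version = self._content_hash(rows)
        self._snapshots[version] = json.loads(json.dumps(rows))  # deep copy via JSON
        self.history.append((version, message))
        return version

    def checkout(self, version):
        return json.loads(json.dumps(self._snapshots[version]))

    def diff(self, old, new, key):
        """Key values added, removed, or changed between two versions."""
        before = {row[key]: row for row in self._snapshots[old]}
        after = {row[key]: row for row in self._snapshots[new]}
        return {
            "added": sorted(after.keys() - before.keys()),
            "removed": sorted(before.keys() - after.keys()),
            "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
        }

if __name__ == "__main__":
    repo = DatasetVersions()
    v1 = repo.commit([{"id": 1, "qty": 5}, {"id": 2, "qty": 3}], "initial load")
    v2 = repo.commit([{"id": 1, "qty": 6}, {"id": 3, "qty": 1}], "daily refresh")
    print(repo.diff(v1, v2, key="id"))
```

Content addressing is the same idea Git uses for files: a version id is derived from the data itself, so “which data did this model see?” has a precise, checkable answer.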
All of this is possible while reducing time-to-value to a matter of 6-8 weeks. We’ve enabled this for a double-digit number of customers with 200+ data sources, tens of thousands of tables, an average of 2.3K pipelines, and data ranging from terabytes to petabytes. Today is the day to give your data developers the tools they need to build tomorrow’s AI/ML.
Take the next step and schedule a demo to see it work in real time.