Lessons in Technical Debt from Southwest Airlines

E. Younanzadeh

February 14, 2023

It was hard to miss Southwest Airlines’ holiday travel fiasco late last year. After a winter storm blew through a large swath of the United States, Southwest’s systems and processes had a complete meltdown. It took thousands of canceled flights, many days, and countless disgruntled employees and customers before things got back to normal. While the weather certainly was a catalyst for the mess, it is widely understood that a high level of technical debt within Southwest’s operational systems made a bad situation much, much worse. This blog post will explore some of the factors that led to the meltdown and offer some ways organizations can avoid similar trouble with their own outdated systems.

Non-Technology Contributors to the Meltdown

Outside of the storm itself, there were multiple contributors to the problems that Southwest had across its operations. One factor is that Southwest doesn’t follow the traditional hub-and-spoke model used by most airlines. Instead of having planes go out and back from a hub, its planes each fly their own circular route. This helps avoid major issues when any single city has a disruption. For example, a winter storm at another airline’s hub will cause ripple effects across the country, since flights everywhere can depend on passing through the trouble spot. In Southwest’s case, a disruption in any one city won’t have a big impact on operations. However, a weak spot was exposed when a massive disruption hit many cities at once during a peak travel time. As more airports were impacted, Southwest’s more complex flight structure was stressed until it broke.

Along with its unique flight structure, Southwest traditionally has its planes fly more flights per day with higher loads per flight. This means that once things get messy, it is harder to recover. That was especially true during the holiday season, when there were few free seats to be found to accommodate the passengers impacted by cancellations. When things are running smoothly, having full planes and little spare capacity is terrific. The same traits become liabilities when trouble hits.

One last factor, tied to the prior two, is that Southwest ends up with planes, pilots, and crew scattered everywhere to support its unique style of operation. Whereas other airlines can focus on getting everyone back to a hub and resetting the system from there, Southwest can’t do that. At a hub-based airline, when crews time out at a hub, there will likely be other crews available to step in. In Southwest’s case, surplus crews just don’t exist in many markets, and it isn’t a simple matter of flying some more in from a hub. Rather, there is a complex game of musical chairs that must take place as crews are redirected from one place to another while ensuring things will be covered once they leave their current spot.

Technical Debt as a Major Driver of the Disruption

While those non-technical factors certainly added to the holiday mess, a major factor was Southwest’s widely known and acknowledged technical debt within its outdated systems. Southwest’s unions have been so concerned about the outdated systems that they even prioritized asking for systems to be updated above asking for more pay. You know it’s serious when employees put something above their paychecks!

Due to the outdated systems, crew members often have to call in to let the airline know where they are and to ask for instructions on what to do next. During the holiday mess, crew members were often waiting for hours for their call to get through, which only delayed things further. In 2023, this seems crazy since certainly there must be a system that knows what flight the crew was assigned, knows the flight was late, and can update the crew scheduling and support systems, right? Apparently, this isn’t the case as the systems don’t talk well enough to handle such seemingly basic tasks.
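To make the "seemingly basic task" concrete, here is a minimal sketch of the kind of cross-system lookup the paragraph imagines: combining crew assignments with flight status to find crews that need new instructions, without anyone picking up a phone. It is not based on Southwest's actual systems; the record types, field names, and sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CrewAssignment:
    """Hypothetical record from a crew scheduling system."""
    crew_id: str
    flight: str


@dataclass(frozen=True)
class FlightStatus:
    """Hypothetical record from a flight operations system."""
    flight: str
    status: str           # "on_time", "delayed", or "cancelled"
    current_airport: str  # where the aircraft (and its crew) is right now


def crews_needing_reassignment(
    assignments: List[CrewAssignment],
    statuses: List[FlightStatus],
) -> Dict[str, str]:
    """Map each affected crew to the airport where it is currently located."""
    status_by_flight = {s.flight: s for s in statuses}
    affected: Dict[str, str] = {}
    for assignment in assignments:
        status = status_by_flight.get(assignment.flight)
        # Only crews on disrupted flights need new instructions.
        if status is not None and status.status != "on_time":
            affected[assignment.crew_id] = status.current_airport
    return affected


# Made-up sample data for illustration only.
assignments = [
    CrewAssignment("crew-1", "WN100"),
    CrewAssignment("crew-2", "WN200"),
    CrewAssignment("crew-3", "WN300"),
]
statuses = [
    FlightStatus("WN100", "cancelled", "DEN"),
    FlightStatus("WN200", "on_time", "DAL"),
    FlightStatus("WN300", "delayed", "MDW"),
]
affected = crews_needing_reassignment(assignments, statuses)
```

The logic itself is trivial. The hard part in practice is that the two record sets live in separate systems that do not share data easily, which is the gap the paragraph above describes.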

While Southwest’s technical debt within its outdated crew and aircraft scheduling systems has been discussed for years both internally and in the press, it will certainly become a major focus that is addressed aggressively moving forward. The airline has estimated the costs of the debacle to be in the many hundreds of millions of dollars thus far. The inevitable lawsuits and other yet unseen costs will only take it higher. Suddenly, the painfully expensive system upgrades look less painful than repeats of what happened in December.

Is There a Way to Make Progress While Systems Are Replaced?

Most people would suspect that Southwest’s only option is to make do with what they’ve got while they work as fast as they can to upgrade their legacy systems. While it is true that those legacy systems must be replaced, it is not true that there's nothing else that the company can do in the meantime. A new concept called a data operating system can help improve what’s in place today while also helping to integrate updated systems once they are online.

A data operating system sits on top of any current system, even legacy ones. It inventories the data assets within each system and creates a central mapping of all corporate data across all of the systems. This requires no changes to the underlying platforms outside of allowing the data operating system to have access. Once the corporate data has been mapped, the data operating system creates a single, central entry point that allows users to query and explore data across the enterprise. It also adds a cross-system security and governance layer that ensures that corporate policies are followed.
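As a rough illustration of the pattern described above, and not a depiction of how any specific product is implemented, the sketch below shows a central catalog that maps dataset names to the systems holding them, exposes one query entry point, and applies a simple role-based policy before returning data. Every name here (`DataCatalog`, `register`, `query`, the datasets, and the roles) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, str]
Fetch = Callable[[], Iterable[Row]]


@dataclass
class DataCatalog:
    """Hypothetical central mapping of datasets to their source systems."""
    _sources: Dict[str, Fetch] = field(default_factory=dict)
    _allowed_roles: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, dataset: str, fetch: Fetch, allowed_roles: Iterable[str]) -> None:
        """Record where a dataset lives and who may read it."""
        self._sources[dataset] = fetch
        self._allowed_roles[dataset] = set(allowed_roles)

    def inventory(self) -> List[str]:
        """List every dataset known to the catalog."""
        return sorted(self._sources)

    def query(self, dataset: str, role: str) -> List[Row]:
        """Single entry point: enforce policy, then read from the source."""
        if dataset not in self._sources:
            raise KeyError(f"unknown dataset: {dataset}")
        if role not in self._allowed_roles[dataset]:
            raise PermissionError(f"role {role!r} may not read {dataset!r}")
        return list(self._sources[dataset]())


# Stand-ins for read-only access to two separate systems (made-up data).
def legacy_crew_system() -> List[Row]:
    return [{"crew_id": "crew-1", "flight": "WN100"}]


def flight_ops_system() -> List[Row]:
    return [{"flight": "WN100", "status": "delayed"}]


catalog = DataCatalog()
catalog.register("crew_assignments", legacy_crew_system, allowed_roles={"scheduler"})
catalog.register("flight_status", flight_ops_system, allowed_roles={"scheduler", "dispatcher"})
```

Note that the underlying systems are only read from, never modified, which mirrors the point that nothing changes in the source platforms beyond granting access.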

While a data operating system might sound too good to be true at first, it is really a natural extension of the evolution of APIs, services, and system interconnectivity. By taking advantage of the fact that even legacy systems allow query access, a data operating system can enable modern functionality on top of them. By accessing each platform’s data and allowing it to be mixed and matched with that of other platforms, a data operating system gives an organization’s entire infrastructure a modern veneer that provides users with the data and analytics access they require. Over time, as the underlying legacy systems are replaced, the data operating system can simply repoint from the old system to the new one, and end-user functionality will continue uninterrupted.
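The repointing idea can be sketched in a few lines. In this hypothetical example, user-facing code asks for a dataset by name, so swapping the backend behind that name does not change the caller. The `DatasetRouter` class and both backend functions are invented for illustration, and the sketch assumes the new system returns rows in the same shape as the old one; a real migration would likely need a translation step.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Fetch = Callable[[], List[Row]]


class DatasetRouter:
    """Hypothetical indirection layer between dataset names and backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Fetch] = {}

    def point(self, dataset: str, fetch: Fetch) -> None:
        """Point a dataset name at a backend, replacing any previous one."""
        self._backends[dataset] = fetch

    def read(self, dataset: str) -> List[Row]:
        return self._backends[dataset]()


def legacy_crew_system() -> List[Row]:
    # Stand-in for a query against the old platform (made-up data).
    return [{"crew_id": "crew-1", "airport": "DEN"}, {"crew_id": "crew-2", "airport": "DAL"}]


def modern_crew_service() -> List[Row]:
    # Stand-in for the replacement platform, assumed to return the same shape.
    return [{"crew_id": "crew-1", "airport": "DEN"}, {"crew_id": "crew-2", "airport": "DAL"}]


def crews_at(router: DatasetRouter, airport: str) -> List[str]:
    """End-user functionality: knows the dataset name, not the backend."""
    return [row["crew_id"] for row in router.read("crew_locations") if row["airport"] == airport]


router = DatasetRouter()
router.point("crew_locations", legacy_crew_system)
before = crews_at(router, "DEN")

# The legacy system is retired; only the mapping changes.
router.point("crew_locations", modern_crew_service)
after = crews_at(router, "DEN")
```

Because `crews_at` depends only on the dataset name, the cutover is a one-line change to the mapping rather than a rewrite of everything that consumes the data.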

Explore What a Data Operating System Can Do for You

If the idea of a data operating system sounds appealing, then start by learning about DataOS from The Modern Data Company. The first and most robust data operating system, DataOS is helping companies modernize their data and analytics functionality and access even when there is a substantial legacy system presence. While the damage is already done at Southwest, the airline could make progress immediately by leveraging DataOS alongside its legacy system modernization initiatives. You and your organization can make use of DataOS today and, hopefully, avoid a Southwest-style meltdown of your own.

To learn more about how a data operating system like DataOS can help your organization modernize its systems and end-user functionality, download our e-book, Maximize Your Data Transformation Investments.

Topics: Data Engineering, Data Governance, Digital Transformation