Mike Schaeffer's Blog

November 22, 2024

Lately, I've been reading "Beyond Mach 3", Col. Buddy Brown's memoir of his years piloting the U-2 and SR-71 spy planes. If you like that sort of thing, it's an interesting read on its own, but it also has parallels to the software industry, particularly in the way it describes pilot workload. The concept of pilot workload provides a nice way to think explicitly about the burdens our systems place on us as engineers and operators to keep them running properly.

If you're not familiar with the U-2, it's a 1950s-era aircraft designed to overfly hostile territory and take high-resolution pictures of the ground for later analysis. It was a U-2 that flew over Cuba to take the pictures of Soviet missiles at the beginning of the Cuban Missile Crisis. To make these sorts of missions possible, the plane was designed to fly at very high altitudes, upwards of 70,000 feet. This is almost double the altitude of a modern passenger airliner.

What may not be obvious is that airplanes get progressively more difficult to fly at higher altitudes. If the plane flies too slowly, the wings stall and lose lift. If it flies too fast, the airflow over the wings separates and they lose lift anyway. Either way, if the speed deviates outside this window, the pilot loses control entirely. Because the atmosphere gets thinner at higher altitudes, this window narrows as the plane climbs. Fly high enough and the window closes entirely: you've just discovered the ceiling of your aircraft. It can't fly that high at all.
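To make the narrowing window a little more concrete, here's a rough back-of-the-envelope sketch of why it happens (my own gloss, not something from the book). In level flight, lift has to balance weight W, and lift depends on air density ρ, true airspeed v, wing area S, and the lift coefficient C_L:

$$
L = \tfrac{1}{2}\,\rho\,v^{2}\,S\,C_L = W
\qquad\Longrightarrow\qquad
v_{\text{stall}} = \sqrt{\frac{2W}{\rho\,S\,C_{L,\max}}}
$$

Since ρ falls as the plane climbs, the true airspeed needed to stay above the stall keeps rising. The upper limit, set by transonic airflow near the wing's critical Mach number, stays roughly fixed because the speed of sound barely changes in the stratosphere. The lower limit climbs toward the upper one, and the altitude where they meet is the ceiling.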

A U-2 flying at 70,000 feet is flying very close to its ceiling. The range of permissible speeds at this point in its flight is just 5-10 mph wide. Regardless of what else is going on, the pilot has to be completely precise in managing their speed in order to retain control. This region of the flight envelope is referred to as the coffin corner, due to the catastrophic implications of a mistake.

On top of the work required to keep the plane in the air, the pilot of a U-2 is also taking celestial navigation readings every 30 minutes, compensating for the curvature of the earth, and running the sensor packages that are the point of the mission in the first place. In the best case, a mistake means the mission doesn't produce the desired results. In the worst case, it means a crash in hostile territory and an international incident. To put it plainly, the U-2 asked a lot of its pilot as part of its basic operation. In aerospace engineering, this is referred to as having a high pilot workload.

For the U-2 program, this was arguably an appropriate design choice. The program could be highly selective about the pilots it chose, it could afford extensive and costly training for those pilots, and it was operating at the limits of available technology. This was the only way its mission requirements could be satisfied at the time, so it was forced into a design that placed high workload demands on the pilot. As soon as technology got better, it became possible to fulfill many of the same requirements with less likelihood of error and lower overall risk, which is what happened.

There are a few lessons in this for software professionals. While most commercial software systems don't overfly hostile territory, they hopefully do have tangible real-world implications anyway. Otherwise, what's the point of building software at all? The reality is that if your software matters, operational failures will have real-world impacts and costs. This means that, to the extent economically possible, we have to get it right. This is where the workload concept can play a role.

In the context of aerospace engineering, workload refers to the burdens on the pilot and crew to keep the aircraft operating as desired. The same concept also applies to software. How much time and effort is required to keep a software system operating properly? Costs here include everything from periodic backups and monitoring to the difficulty of installation and upgrades, environment management, and the complexity of testing... the list goes on and on. Viewed from that perspective, you should ask yourself explicitly whether you and your team are building a system with a low or a high workload. What does it take to keep your system running, and is it appropriate for what it does?
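One way to make that question concrete, purely as an illustration, is to inventory the recurring manual work a system demands and add it up. The sketch below is hypothetical; the tasks and numbers are invented rather than drawn from any particular system:

```python
# A hypothetical "workload audit": list every recurring manual task it takes
# to keep a system healthy, then total the hours of human attention per month.
# The tasks and figures below are invented for illustration.

RECURRING_TASKS = [
    # (task, times per month, minutes of attention each time)
    ("verify that nightly backups actually restore", 4, 30),
    ("triage monitoring alerts",                     20, 15),
    ("walk through the manual deploy checklist",      8, 45),
    ("rebuild a broken developer environment",        6, 60),
    ("babysit the flaky end-to-end test suite",      12, 20),
]

def monthly_workload_hours(tasks):
    """Total hours per month the system asks of its operators."""
    return sum(times * minutes for _, times, minutes in tasks) / 60

if __name__ == "__main__":
    hours = monthly_workload_hours(RECURRING_TASKS)
    print(f"Operational workload: roughly {hours:.0f} hours/month")
```

If that number starts to look like a part-time job, the system is asking a lot of its operators just to stay in the air.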

At some level, this is old news. Build, deployment, and test automation have been on the software professional's agenda for literally decades at this point. 17 years ago, Jeff Atwood wrote about Rico Mariani's highly related concept of the pit of success. That said, how close are you really to where you want to be with respect to the systems you help build and maintain? Even if the concept of workload is old, the need to revisit it and make sure you're handling it appropriately is evergreen.