Software Cost, Software Maintenance

2020-11-08

Recently I read a post about Rust vs Go.

I don't want to discuss the specific argument of Rust vs Go here, as it's actually pretty boring, but one section - the section on correctness about 3/4 of the way through the article, and especially its pullquote - got me thinking about cost. This is the quote:

"My take: Go for the code that has to ship tomorrow, Rust for the code that has to keep running untouched for the next five years."

In my career so far, I've worked at three shops, each with different kinds of products and environments: a loyalty program startup, a "startup within a Fortune 500", and a startup building an on-demand manufacturing platform.

Across those jobs, I've written and maintained code that ran in production in Ruby, Elixir, JavaScript, Scala, Clojure, Java, Rust, bash, SQL, Python, and probably more that I'm forgetting. I've worked on webapps, command-line apps, "big data" batch-job programs, real-time telemetry ingestion, 3D geometry processing, and some higher-level firmware for embedded devices.

At every shop, for all languages and domains, the majority of effort, money, and time was spent on maintenance.

Put another way: the most costly part of the work has been changing existing software to better reflect new or different demands or environmental constraints, not getting the initial version working.

I am not talking here about what Rich Hickey might term "problems of misconception", where we didn't know what we were trying to do in the first place, but about the more typical kind of maintenance work: adding an entirely new capability, adjusting an existing capability to meet a new constraint, fixing something that is incorrect with regard to a specification, or fixing something that is incorrect with regard to some runtime characteristic. This kind of work dwarfed the cost of writing the stuff to begin with.

Most problems I've seen or heard of at work didn't require code that "has to ship tomorrow", even at startups with time and capital pressure. Such problems did exist and I won't pretend otherwise, but they weren't typical, and their scope was usually narrow. Most code is either written new with the expectation that it will stick around for some time as an integrated piece of some other system, or written to adjust the behavior of an existing system. Most of the code I've seen and the programs I've used in a professional capacity seem to obey a kind of Lindy effect, where things that have been around for a while have a propensity to stick around.

The existing software and code I've seen has also tended to have its "low-hanging fruit" picked early, as one might expect: things that are easy and cheap to change are changed, and bugs that are easy and cheap to fix are fixed, leaving only progressively more difficult and costly changes and fixes. One reason for this, in my experience, is that my confidence in being able to change some piece of the system while preserving its other existing characteristics decreases as the system grows and ages. The typical word for what happens when you change one thing and accidentally break another is "regression".

Another reason changes and fixes get more costly and difficult over time is the difficulty of deciphering where one piece, layer, or feature of the system ends and another begins. That is, things are intertwined, or tangled. Different pieces of functionality tend to bleed into each other, and it becomes difficult to individuate characteristics of the system. People usually call this "messy", "unclear", "muddy", or "spaghetti" code.

For these reasons I don't buy into the framing presented in the quote at the beginning of this post. I think it inaccurately portrays software as something we must either create as quickly as possible, or create and then leave untouched for years at a time with no upkeep. Obviously the quote's author is exaggerating for effect, but the sentiment is not far off from real attitudes I've seen out in the wild.

For example, for many projects I've worked on, as well as many that friends have worked on and then told me about, the name of the game was "get the thing built as fast as possible, cut corners if you have to, maybe we'll come back to it some day". This is basically strip mall software: built as quickly as possible, using low-quality construction, poorly integrated into its community, with little to no planning for its future upkeep. It works, albeit barely, but no one really likes it, and we only put up with it because of its convenience and the apparent cost of a suitable alternative.

There is another class of projects that mostly seem to work fine, but where the prospect of changing or fixing them so they work better appears too difficult or costly. You might call them "working but Byzantine", kind of like the wiring in some old houses. They are precariously balanced, and if something were to go wrong they might come crashing down. These projects are dangerous, as it is unsafe to depend on something that you can't readily maintain or replace.

Neither of these is a sustainable way to build or live, be it for code or for community.

All this isn't to say that Go isn't a great language for writing code that has to ship tomorrow, or that Rust isn't great for long-running systems. Maybe Go and Rust are fine for those purposes. Either way, neither of the quoted rhetorical choices strikes me as a realistic conception of software as I've seen it practiced, or as I believe it ought to be practiced by professionals and viewed by the general public. Maintenance is deeply human. It's something we have always done and something we will always do.

We should evaluate tools for their ability to help us create and maintain software sustainably over time, so that our users can thrive. There is already enough "strip mall software" and enough working-but-Byzantine software in the world; we should be suspicious of tools, mindsets, and people that try to sell us speed-of-construction cons or maintenance-free magic.