Mike Schaeffer's Blog

June 5, 2023

If you've been around programming for a while you've no doubt come across the Lisp family of languages. One of the oldest languages still in use, Lisp has contributed much to the profession, but it's probably most infamous for the "S-expression". S-expressions are a text-based serialization of the language's core data structures. Since Lisp is written in terms of those same data structures, the S-expression is also the syntax of the language.

To give you a taste if you're not familiar, here's a simple Clojure function for parsing a string identifier. The identifier is passed in as either a numeric string (123) or a hash ID (tlXyzzy), and the function parses either form into a number.

(defn decode-list-id [ list-id ]
  (or (try-parse-integer list-id)
      (hashid/decode :tl list-id)))

In a "C-like language", the same logic looks more or less like this:

function decodeListId(listId) {
    return tryParseInteger(listId) || hashid::decode("tl", listId);
}

Right off the bat, you'll notice a heavy reliance on parentheses to delimit logical blocks. With the exception of the argument list ([ list-id ]), every logical grouping in the code is delimited by parentheses. You'll also notice the variable name (list-id) contains a hyphen - not allowed in C-like languages. I could point out more, but even stopping there, it's clear that Lisp syntax is unusual to modern eyes.

What may be even more unusual about this syntax is the fact that some people like it. I count myself among them. It's strange, but there are reasons for the strangeness. The strangeness, while it imposes costs, also offers benefits. It's these benefits that I wish to discuss.

Before I continue, I'd like to first credit Fernando Borretti's recent post on Lisp syntax. It's always good to see a defense of Lisp syntax, and I think his article nicely illustrates the way that the syntax of the language supports one of Lisp's other hallmark features: macros. If you haven't already read it, you should click that link and read it now. That said, there's more to the story, which is why I'm writing something myself.

If you've studied compilers, it's probably struck you how much of the first part of the course is spent on various aspects of language parsing. You'll study lexical analysis, which lets you divide streams of characters into tokens. Once you understand the basics of lexical analysis, you'll then study how to fold linear sequences of tokens into trees according to a grammar. Then, a few more tree transformations, and finally linearization back to a sequence of instructions for some more primitive machine. Lisp's syntax concerns the first two steps of this - lexical and syntactic analysis.

Lexical analysis for Lisp is very similar to lexical analysis for other languages. The main difference is that the rules are a bit different. Lisp allows hyphens in symbols (see above), and most other languages do not. This changes how the language looks, but isn't a huge structural advantage to Lisp's syntax:

(defn decodeListId [ listId ]
  (or (tryParseInteger listId)
      (hashid/decode :tl listId)))

Where things get interesting for Lisp is in the syntactic analysis stage - the folding of linear lists of tokens into trees. One of the first parsing techniques you might learn while studying compilers is known as predictive recursive descent, specifically for LL(1) grammars. Without going into details, these are simple parsers to write by hand. The grammar of an LL(1) language can be mapped directly to collections of functions. Then, if there's a choice to be made during parsing, it can always be resolved by looking a single token ahead to predict the next rule you need to follow. These parsers have many limitations in what they can parse (infix expressions, for one, don't fit without reworking the grammar), but they can parse quite a bit, and they're easy to write.

Do you see where this is going? Lisp falls into the category of languages that can easily be parsed using a recursive descent parser. Another way to put it is that it doesn't take a lot of sophistication to impose structure on a sequence of characters representing a Lisp program. While it may be hard to write a C++ parser, it's comparatively easy to write one for Lisp. Thanks to the simple nature of a Lisp's grammar, the language really wears its syntax tree on its sleeve. This is and has been one of the key advantages Lisp derives from its syntax.

The first advantage is that simple parsing makes for simple tooling. If it's easier to write a parser for a language, it's easier to write external tools for that language that understand it in terms of its syntax. Emacs' paredit-mode is a good example of this. paredit-mode offers commands for interacting with Lisp code on the level of its syntactic structure. It lets you cut text based on subexpressions, swap subexpressions around, and similar sorts of operations based on the structure of the language. It is easier to write tools that operate on a language like this if the syntax is easily parsed. To see what I mean, imagine a form of paredit-mode for C++ and think how hard it would be to cut a subexpression there. What sorts of parsing capabilities would that command require, and how would it handle the case where code in the editor is only partially correct?
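As a rough illustration of how little machinery a structural "cut" needs for Lisp, here's a Python sketch. This is not how paredit-mode is implemented; enclosing_form and cut_form are hypothetical names, and the sketch ignores strings and comments. It finds the innermost delimited form around a cursor offset with nothing but a stack of open positions.

```python
def enclosing_form(text, cursor):
    """Return (start, end) of the innermost form containing cursor, or None."""
    stack = []
    for i, ch in enumerate(text):
        if ch in "([":
            stack.append(i)
        elif ch in ")]":
            if not stack:
                return None  # unbalanced input; give up rather than guess
            start = stack.pop()
            # Inner forms close before outer ones, so the first closed
            # form that spans the cursor is the innermost one.
            if start <= cursor <= i:
                return start, i + 1
    return None


def cut_form(text, cursor):
    """Remove the form around cursor; return (remaining_text, cut_text)."""
    span = enclosing_form(text, cursor)
    if span is None:
        return text, ""
    start, end = span
    return text[:start] + text[end:], text[start:end]


code = "(or (try-parse-integer list-id) (hashid/decode :tl list-id))"
remaining, cut = cut_form(code, code.index("hashid"))
print(cut)
print(remaining)
```

Note that it still does something sensible on half-typed code: anything before the first unbalanced delimiter is handled normally.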

This is also true for human users of this sort of tooling. Lisp's simple grammar enables it to wear its structure on its sleeve for automatic tools, but also for human users of those tools. The properties of Lisp that make it easy for tools to identify a specific subexpression also make it easier for human readers of a block of code to identify that same subexpression. To put it in terms of paredit-mode, it's easier for human readers to understand what the commands of that mode will do, since the syntactic structure of the language is so much more evident.

A side benefit to a simple grammar is that simpler grammars are more easily extended. Fernando Borretti speaks to the power of Lisp macros in his article, but Common Lisp also offers reader macros. A reader macro is bound to a character or sequence of characters, and receives control when the standard Lisp reader encounters that sequence. The standard Lisp reader will pass in the input stream and allow the reader macro function to do what it wants, returning a Lisp value reflecting the content of what it read. This can be used to do things like add support for XML literals or infix expressions.

If the implications are not totally clear, Lisp's syntactic design is arguably easier for tools, and it allows easier extension to completely different syntaxes. The only constraint is that the reader macro has to accept its input as a Lisp input stream, process it somehow with Lisp code, and then return the value it "read" as a single Lisp value. It's very capable, and fits naturally into the simple lexical and syntactic structure of a Lisp. Infix languages have tried to be this extensible, but have largely failed, due to the complexity of the task.

Of course, the power of Lisp reader macros is also their weakness. By operating at the level of character streams (rather than Lisp data values) they make it impossible for external tools to fully parse Common Lisp source text. As soon as a Lisp reader macro becomes involved, there exists the possibility of character sequences in the source text that are entirely outside the realm of a standard s-expression. This is like JSX embedded in JavaScript or SQL embedded in C - blocks of text that are totally foreign to the dominant language of the source file. While it's possible to add special cases for specific sorts of reader macros, it's not possible to do this in general. The first reader macro you write will break your external tools' ability to reason about the code that uses it.

This problem provides a great example of where Clojure deviates from the Common Lisp tradition. Rather than providing full reader macros, Clojure offers tagged literals. Unlike a reader macro, a tagged literal never gets control over the reader's input stream. Rather, it gets an opportunity at read-time to process a value that's already been read by the standard reader. What this means is that a tagged literal processes data very early in the compilation process, but it does not have the freedom to deviate from the standard syntax of a Clojure S-expression. This implies both flexibility to customize the reader and the ability for external tools to fully understand ahead of time the syntax of a Clojure source file, regardless of whether or not it uses tagged literals. Whether or not this is a good tradeoff might be a matter of debate, but it's in the context of a form of customization that most languages don't offer at all.
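Here's a small Python sketch of the tagged literal idea, to show where the handler sits. It is a model of the concept, not Clojure's reader: the #date tag, the TAG_HANDLERS table, and the read function are all made up for illustration. The point to notice is that the handler is only ever given a value the standard reader has already produced. It never touches the character stream, so a tool that has no handler installed can still find where every form begins and ends.

```python
import re
from datetime import date

# Double-quoted strings, parens, or runs of other non-space characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()]+')

# Hypothetical tag table: each handler receives an already-read value.
TAG_HANDLERS = {"#date": date.fromisoformat}


def read(tokens, pos=0):
    """Read one form starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok in TAG_HANDLERS:
        # The standard reader reads the next form as usual...
        value, pos = read(tokens, pos + 1)
        # ...and only then does the tag's handler get to transform it.
        return TAG_HANDLERS[tok](value), pos
    if tok == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(tokens, pos)
            items.append(item)
        return items, pos + 1
    if tok.startswith('"'):
        return tok[1:-1], pos + 1  # strip the quotes
    return tok, pos + 1


form, _ = read(TOKEN.findall('(due #date "2023-06-05")'))
print(form)
```

A reader macro, by contrast, would be handed the raw input at the point of the tag and could consume any characters it liked.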

To be clear, there's more to the story. As Fernando Borretti mentions in his article, Lisp's uniform syntax extends across the language. A macro invocation looks the same as a special form, a function call, or a defstruct. Disambiguating between the various semantics of a Lisp form requires you to understand the context of the form and how symbols within that form are bound to meanings within this context. Put more simply, a function call and a macro invocation can look the same, even though they may have totally different meanings. This is a problem, and it's a problem that directly arises from the simplicity of Lisp syntax I extol above. I don't have a solution to this problem other than to observe that if you're going to adopt Lisp syntax and face the problems of that syntax, you'd do well to fully understand and use the benefits of that syntax as compensation. Everything in engineering, as in life, is a tradeoff.

It's that last observation that's my main point. We live in a world where the mathematical tradition has, for centuries, been infix expressions. This has carried through to programming, which has also significantly converged on C-like syntax for its dominant languages. Lisp stands against both of these traditions in its choice of prefix expressions written in a simpler grammar than the norm. There are costs to this choice, and these costs tend to be immediately obvious. There are also benefits, and these benefits take time to make themselves known. If you have that time, it can be a rewarding thing to explore the possibilities, even if you never get the chance to use them directly in production.

January 20, 2023

As a child of the 80's, I had a front row seat to the beginning of what was then called personal computing. My elementary school got its first Apple around the time I entered kindergarten. That was also the time personal computers were starting to make inroads into offices (largely thanks to VisiCalc and Lotus 1-2-3). By modern standards these machines weren't very good. At the time they were transformative. They brought computing to places it hadn't been before, and gave access to entirely new sets of people. For someone with an early adopter's mindset, it was an optimistic and exploratory time. It's for this reason (and the fact it was my childhood) that I like looking back on these old machines. That's something I hope to do here in an informal series of posts. If there happen to be a few lessons for modern computing along the way, so much the better.

If you're reading this, you're probably familiar with retrocomputing. It's easy to go to eBay, buy some used equipment, and play around with a period machine from the early 80's. Emulators make it even easier. As much as I appreciate the movement, it doesn't quite provide the full experience of the time. To put it in perspective, an Apple //e was a $4,000 purchase in today's money. This is before adding disk drives, software, or a monitor. After bringing it home, and turning it on, all you had was a black screen and a blinking prompt from Applesoft BASIC. If you needed help, you were limited to the manual, a few books and magazines at the local bookstore, and whoever else you happened to know. The costs were high, the utility wasn't obvious, and there wasn't a huge network of people to fall back on for help. It was a different time in a way retrocomputing doesn't quite capture.

My goal here is to talk about my own experiences in that time. What it was like to grow up with these machines, both in school and at home. It's one person's perspective (from a position of privilege) but hopefully it'll capture a little of the spirit of the day.

If you want a way to apply this to modern computing, I'd suggest thinking about the ways it was possible for these machines to be useful with such limited capabilities. I'm typing this on a laptop with a quarter million times the memory of an Apple //e. It's arguably surprising that the Apple was useful at all. But it was, and without much of the software and hardware we take for granted today. This suggests that we might have more ways to produce useful software systems than we think. Do you really need to take on the complexities of Kubernetes or React to meet your requirements? Maybe it's possible to bring a little of the minimalist spirit of 1983 forward, take advantage of the fact modern computers are as good as they are, and deliver more value for less cost.

Before I continue, I should start off by acknowledging just how privileged I am to be able to write these stories. I grew up in a stable family with enough extra resources to be able to devote a significant chunk of money to home computing. My dad is an engineer by training, with experience in computing dating back to the 60's. He was able to apply computing at his job in a number of capacities, and then had the desire and ability to bring it home. To support this, his employer offered financing programs to help employees buy their own home machines. For my mom's part, she taught third grade at my elementary school, which in 1983 (when I was in third grade) happened to be piloting a Logo programming course for third graders. Not only was I part of the course, my mom helped run the lab, and I often had free run after school to explore on my own. (At least one summer, when I was ten or eleven, I was responsible for setting up all the hardware in the lab for the upcoming school year.)

I didn't always see it at the time, but this was an amazingly uncommon set of circumstances. It literally set the direction of my intellectual and professional life, and is something I will always be thankful for. I am thankful to my parents, and also to the good fortune of the circumstances which enabled it to happen for us as a family. It could have been very different, and for most people, it was.

But before most of that, one of the first personal computers I was ever exposed to was my Uncle Herman's Timex Sinclair 1000. This was a Z80 machine, built in Clive Sinclair fashion - to the lowest possible price point. It was intended to be a machine for beginning hobbyists, and sold for $100. (In modern dollars, that's roughly the same as a low-end iPad.) Uncle Herman had his TS1000 connected to a black and white TV and sitting on his kitchen table. It's the first and only time I've ever computed on an embroidered tablecloth.

The machine itself, as you might guess from the price, was dominated by its limitations. The first was memory. A stock 1000 had a total of 2KB of memory. KB. Not GB. Not MB. KB.

The second limitation of the machine was the keyboard. To save on cost, the keyboard was entirely membrane based. The keys were drawn on a sheet of flat plastic, didn't move when you pressed them, and offered no tactile feedback at all. The closest modern experience is the butterfly keyboard, over which Apple was recently sued and ended up settling.

Fortunately for the machine, the software design had a trick up its sleeve that addressed both limitations at the same time. Like many other machines of the time, the 1000's only user interface was through a BASIC interpreter. When you plug the computer in (there was no power switch) you're immediately dropped into a REPL for a BASIC interpreter that serves as the command line interface. However, due to the memory limitations, the 1000 lacked space for a line editor. There wasn't enough memory in the machine to commit the bytes necessary to buffer a line of text character by character, before parsing it to a more memory efficient tokenized representation.

The solution to this problem was to allow users to enter BASIC code directly in tokenized form, without the need to parse text. Rather than typing the five characters PRINT and having the interpreter translate that to a one byte token code, the user directly pressed a button labeled PRINT. The code for the PRINT button then emitted the one byte code for that operation. This bypassed the need for both the string buffer and the parse/tokenize step.
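To put rough numbers on the difference, here's a Python sketch comparing the two approaches. The token codes are invented for illustration (they are not the machine's actual values), ASCII stands in for the machine's own character set, and the function names are mine.

```python
# Hypothetical one-byte token codes, one per keyword key.
KEYWORD_TOKENS = {"PRINT": 0x80, "GOTO": 0x81, "LET": 0x82}


def as_typed_text(words):
    """Bytes that must be buffered if every character is typed out."""
    return " ".join(words).encode("ascii")


def as_keystrokes(words):
    """Bytes emitted when each keyword key produces its token directly."""
    out = bytearray()
    for word in words:
        if word in KEYWORD_TOKENS:
            out.append(KEYWORD_TOKENS[word])  # one keypress, one byte
        else:
            out.extend(word.encode("ascii"))  # operands are still typed
    return bytes(out)


statement = ["PRINT", "X"]
typed = as_typed_text(statement)
tokenized = as_keystrokes(statement)
print(len(typed), len(tokenized))
```

The second function never needs the text of the keyword at all, which is the whole trick: there is nothing to buffer and nothing to parse.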

Beyond the reduced memory consumption of this approach, it also meant you could say PRINT with one keypress instead of five. This is good, given the lousy keyboard. There are also discoverability benefits. With each BASIC command labeled directly on the keyboard, it was easy for the beginner to see a list of the possible commands. The downside is that the number of possible operations is limited by the number of keys and shift states. (A problem shared by programmable pocket calculators of the time.)

Of course, the machine had other limitations too. Graphics were blocky and monochrome, and a lack of hardware forced a hard tradeoff between CPU and display refresh. It's easy to forget this now, but driving a display is a demanding task. Displays require continual refresh, and every pixel has to be driven every frame. If this doesn't happen the display goes blank. The 1000 was so down on hardware capacity that it forced a choice on the programmer. There were two modes for controlling the tradeoff between display refresh and execution speed. FAST mode gave faster execution of user code, at the expense of sacrificing display refresh. Run your code and the display goes blank. If you wanted simultaneous execution and display, you had to select SLOW mode and pay the performance price of multiplexing too little hardware to do too much work.

Despite these limitations, the machine did offer a few options for expansion. The motherboard exposed an edge connector on the back of the case. There were enough pins on this connector for a memory expansion module to hang off the back of the machine. 2K was easy to exhaust, so an extra 16K was a nice addition. The issue here is that the connection between the computer and the expansion module was unreliable. The module could rock back and forth as you typed and the machine would occasionally totally fail when the CPU lost its electrical connection to the expansion memory.

The usual mitigation strategy for an unreliable machine is to save your work often. This is a good idea in general, and even more advisable when pressing any given key might disconnect your CPU from its memory and totally crash the machine. The difficulty here is that the Timex only had an analog cassette tape interface for storage. I never did get this to work, but it provided at least theoretical persistent storage for your programs. The idea here is that the computer would encode a data stream as an analog signal that could be recorded on audio tape. Later, the signal could be played back from the tape to the computer to reconstruct the data in memory.

This is not the best example of an old computer with a lot of utility. In fact, the closest analog to a Timex Sinclair 1000 might not have been a computer at all. Between the keystroke programming, limited memory, and flashing display, the 1000 was arguably closest in scope to a programmable pocket calculator. Even with those limitations, if you had a 1000, you had a machine you could use to learn programming. It was possible to get a taste of what personal computing was about, and decide about taking the next step.

March 5, 2022

Over most of the ten years I've been using git, I've been a strong proponent of merging over rebasing. It seemed more honest to avoid rewriting commits and more likely to produce a complete history. There are also problems that arise when you rewrite shared history, and you can avoid those entirely if you just never rewrite history at all. While all of this is true, the hidden costs of the approach came to play an increasing role in my thinking, and these days, I essentially avoid merge entirely. The result has been an easier workflow, with a more useful history of more coherent commits.

History tracking in a tool like git serves a few development purposes, some tactical and some strategic. Tactically speaking, it's nice to be able to have confidence that you can always reset to a particular state of the codebase, no matter how badly you've screwed it up. It's easier to make "risky" changes to code when you know that you're a split second away from your last known good state. Further, git remotes give you easy access to a form of off-site backup and tags give you the ability to label releases. Not only does the history in a tool like git make it easier to get to your last known good state during development, it also makes it easier to get back to the version you released last month before your dog destroyed your laptop.

At a strategic level, history tracking can give other longer term benefits. With a little effort, it's an excellent way to document the how and why your code evolves over time. Correctly done (and with an IDE), a good version history gives developers immediate access to the origin of each line of code, along with an explanation of how and why it got there. Of course, it takes effort to get there. Your history can easily devolve into a bunch of "WIP" messages in a randomly associated stream of commits. Like everything else in life worth doing, it takes effort to ensure that you actually have a commit history that can live up to its strategic value.

This starts with a commit history that people bother to read, and like everything else, it takes effort to produce something worth reading. For people to bother reading your commit history, they need to believe that it's worth the time spent to do so. For that to happen, enough effort needs to have been spent assembling the history that it's possible to understand what's being said. This is where the notion of a complete history runs into trouble. Just like historians curate facts into readable narratives, it is our responsibility as developers to take some time to curate our projects' change history. At least if we expect it to be read. My argument for rebasing over merging boils down to the fact that rebase/squash makes it easier to do this curation and produce a history that has these useful properties.

For a commit to be useful in the future as a point of documentation, it needs to contain a coherent unit of work. git thinks in terms of commits, so it's important that you also think in terms of commits. Being able to trust that a single commit contains a complete single change is useful both from the point of view of interpreting a history, and also from the point of view of using git to manipulate the history. It's easier to cherry-pick one commit with a useful change than it is three commits, each with a part of that one change.

Another way of putting this is that nobody cares about the history of how you developed a given feature. Imagine adding a field to a screen. You make a back end change in one commit, a front end change in the next, and then submit them both in one branch as a PR. A year after, does it really matter to you or to anybody else that you modified the back end first and then the front end? The two commits are just noise in the history. They document a state that never existed in anything like a production environment.

These two commits also introduce a certain degree of ongoing risk. Maybe you're trying to backport the added field into an earlier maintenance release of your software. What happens if you cherry-pick just one of the two commits into the maintenance release? Most likely, that results in a wholly invalid state that you may or may not detect in testing. Sure, the two commits honestly documented the history, but there's a cost. You lose documentation of the fact that both the front and back end changes are necessary parts of a single whole.

Given this argument for squashing, or curating, commits into useful atomic units, development branches largely reduce down to single commits. You may have a sequence of commits during development to personally track your work, but by the time you merge, you've squashed it down to one atomic commit describing one useful change. This simplifies your history directly, but it also makes it easier to rebase your development branch. Rebasing a branch with a single commit avoids introducing historical states that "never existed". The single commit also dramatically simplifies the process of merge conflict resolution. Rebase a branch with 10 commits, and you may have 10 sets of merge conflicts to resolve. Do you really care about the first nine? Will you really go back to those commits and verify that they still work post-rebase? If you don't, you're just dumping garbage in your commit log that might not even compile, much less run.

I'll close with the thought that this approach also lends itself to better commit messages. If there are fewer commits, there are fewer commit messages to write. With fewer commit messages to write, you can take more time on each to write something useful. It's also easier to write commit messages when your commits are self-contained atomic units. Squashing and curating commits is useful by itself in that it leads to a cleaner history, but it also leads to more opportunities to produce good and useful commit messages. It points in the direction of a virtuous cycle where positive changes drive other positive changes.

January 14, 2022

This image has been circulating on LinkedIn as a tongue-in-cheek example of a minimum viable product.

Of course, at least one of the responses was that it's not an MVP without some extras. It needs 24/7 monitoring or a video camera with a motion alarm. It needs to detect quakes that occur off hours or when you're otherwise away from the detector. The trouble with this statement is the same as with the initial claimed MVP status of this design - both claims make assumptions about requirements. The initial claim assumes you're okay missing quakes when you're not around and the second assumes you really do need to know. To identify an MVP, you need to understand what it means to be viable. You need to understand the goals and requirements of your stakeholders and user community.

Personally, I'm sympathetic to the initial claim that two googly eyes stuck on a sheet of construction paper might actually be a viable earthquake detector. As a Texan transplant to the Northeast, I'd never experienced an earthquake until the 2011 Virginia earthquake rattled the walls of my suburban Philly office. Not having any real idea what was going on, my co-workers and I walked over to a wall of windows to figure it out. Nothing bad happened, but it wasn't a smart move, and exactly the sort of thing a wall-mounted earthquake detector might have helped avoid. The product doesn't do much, but it does do something, and that might well be enough that it's viable.

This viability, though, is contingent on the fact that there was no need to know about earthquakes that occurred off-hours. Add that requirement in, and more capability is needed. The power of the MVP is that it forces you to develop a better understanding of what it is that you're trying to accomplish. Getting to an MVP is less about the product and more about the requirements that drive the creation of that product.

In a field like technology, where practitioners are often attracted to the technology itself, the distinction between what is truly required and what is not can be easy to miss. Personally, I came into this field because I like building things. It's fun and rewarding to turn an idea into a working system. The trouble with the MVP from this point of view is that defining a truly minimum product may easily eliminate the need to build something cool. The answer may well be that no, you don't get to build the video detection system, because you don't need it and your time is better spent elsewhere. The notion of the MVP inherently pulls you away from the act of building and forces you to consider that there may be no immediate value in the thing you aim to build.

One of my first consulting engagements was years ago, for a bank building out a power trading system. They wanted to enter the business to hedge other trades, and the lack of a trading system to enforce controls limits was the reason they couldn't. Contrary to the advice of my team's leadership, they initially decided to scratch-build a trading system in Java. There were two parts of this experience that spoke to the idea of understanding requirements and the scope of the minimum viable product.

The first case can be boiled down to the phrase 'training issue'. Coming from a background of packaged software development, my instincts at the time were completely aligned around building software that helps avoid user error. In mass market software, you can't train all of your users, so the software has to fill the gap. There's a higher standard for viability in that the software is required to do more and be more reliable.

This trading platform was different in that it was in-house software with a known user base that numbered in the dozens. With a user base that small and well known, it's feasible to just train everybody to avoid bugs. A crashing, high severity bug that might block a mass market software release might just be addressed by training users to avoid it. This can be much faster, which is important when the software schedule is blocking the business from operating in the first place. The software fix might not actually be required for the product to be viable. This was less about perfect software, and more about getting to minimum viability and getting out of the way of a business that needed to run.

The second part of the story is that most of the way through the first phase of the build, the client dropped the custom build entirely. Instead, they'd deploy a commercial trading platform with some light customizations. There was a lot less to build, and it went live much more quickly, particularly in the more complex subsequent phases of the work. It turned out that none of the detailed customizations enabled by the custom build were actually required.

Note that this is not fundamentally a negative message. What the MVP lets you do is lower the cost of your build by focusing on what is truly required. In the case of a trading organization, it can get your traders doing their job more quickly. In the case of an earthquake detector, maybe it means you can afford more than just one. Lowering the cost of your product can enable it to be used sooner and in more ways than otherwise.

The concept of an MVP has power because it focuses your attention on the actual requirements you're trying to meet. With that clearer focus, you can achieve lower costs by reducing your scope. This in turn implies you can afford to do more of real value with the limited resources you have available. It's not as much about doing less, as it is about doing more of value with the resources you have at hand. That's a powerful thing, and something to keep in mind as you decide what you really must build.
