Mike Schaeffer's Blog

March 1, 2005

Programming Well: Embrace Idempotence, Part 2 (It works at runtime too)

Idempotence has benefits at a program's run-time, as well as at build time. To illustrate, consider the case of a reference counted string. For the sake of example, it might be declared like this (in case you're wondering, no, I don't think this is a production-ready counted string library):

struct CountedString
{
    int  _references;
    char *_data;
};

CountedString *makeString(char *data)
{
    CountedString *cs = (CountedString *)malloc(sizeof(CountedString));

    cs->_references = 1;
    cs->_data = strdup(data);

    return cs;
}

CountedString *referToString(CountedString *cs)
{
    cs->_references++;
    return cs;
}

void doneWithString(CountedString *cs)
{
    cs->_references--;

    if (cs->_references == 0)
    {
        free(cs->_data);
        free(cs);
    }
}

// ... useful library functions go here...

The reference counting mechanism buys you two things. It gives you the ability to delete strings when they're no longer accessible; it also gives you the ability to avoid string copies by deferring them to the last possible moment. This second benefit, known as copy-on-write, is where idempotence can play a role. Copy-on-write means that before you write to a resource, you make sure the copy you hold is unique to you. If it isn't unique, you duplicate the resource and modify the duplicate instead of the original. If you never modify the string, you never make the copy.

This means that the beginning of every string function that alters a string has to look something like this:

CountedString *alterString(CountedString *cs)
{
    if (cs->_references > 1)
    {
        CountedString *uniqueString = makeString(cs->_data);
        doneWithString(cs);
        cs = uniqueString;
    }

    // ... now, cs can be modified at will

    return cs;
}

Apply a little refactoring, and you get this...

CountedString *ensureUniqueInstance(CountedString *cs)
{
    if (cs->_references > 1)
    {
        CountedString *uniqueString = makeString(cs->_data);
        doneWithString(cs);
        cs = uniqueString;
    }

    return cs;
}

CountedString *alterString(CountedString *cs)
{
    cs = ensureUniqueInstance(cs);

    // ... now, cs can be modified at will

    return cs;
}

Of course, ensureUniqueInstance ends up being idempotent: it gets you into a known state from an unknown state, and it doesn't (semantically) matter if you call it too often. That's the key insight into why idempotence can be useful. Because idempotent processes don't rely on foreknowledge of your system's state to work reliably, they can be a predictable means to get into a known state. Also, if you hide idempotent processes behind the appropriate abstractions, they allow you to write code that's more self-documenting. A function that begins with a line like cs = ensureUniqueInstance(cs); says more clearly to the reader that it needs a unique instance of cs than lines of code that check the reference count of cs and potentially duplicate it.
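To make the "call it too often" claim concrete, here is a minimal, self-contained sketch of the functions above plus a small check. The check function secondCallIsNoOp and the copyText helper are my own additions for illustration, and I've used a typedef and abort-on-allocation-failure to keep it short; none of that is meant as production code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct CountedString
{
    int   _references;
    char *_data;
} CountedString;

/* Hypothetical helper: duplicate a C string, aborting on allocation failure. */
static char *copyText(const char *text)
{
    char *copy = (char *)malloc(strlen(text) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, text);
    return copy;
}

CountedString *makeString(const char *data)
{
    CountedString *cs = (CountedString *)malloc(sizeof(CountedString));
    if (cs == NULL)
        abort();
    cs->_references = 1;
    cs->_data = copyText(data);
    return cs;
}

CountedString *referToString(CountedString *cs)
{
    cs->_references++;
    return cs;
}

void doneWithString(CountedString *cs)
{
    cs->_references--;
    if (cs->_references == 0)
    {
        free(cs->_data);
        free(cs);
    }
}

CountedString *ensureUniqueInstance(CountedString *cs)
{
    if (cs->_references > 1)
    {
        CountedString *uniqueString = makeString(cs->_data);
        doneWithString(cs);   /* give up our share of the original */
        cs = uniqueString;
    }
    return cs;
}

/* Returns 1 if a second call to ensureUniqueInstance changes nothing. */
int secondCallIsNoOp(void)
{
    CountedString *shared = makeString("hello");
    CountedString *mine = referToString(shared);        /* count is now 2 */

    mine = ensureUniqueInstance(mine);                  /* copies the data */
    CountedString *again = ensureUniqueInstance(mine);  /* already unique */

    int ok = (again == mine)
          && (mine != shared)
          && (mine->_references == 1)
          && (shared->_references == 1)
          && (strcmp(mine->_data, shared->_data) == 0);

    doneWithString(mine);
    doneWithString(shared);
    return ok;
}
```

The first call does the copy and drops the shared reference; the second call sees a count of one and hands back the same pointer, which is the idempotent behavior described above.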

Next up are a few more examples of idempotence, as well as a look into some of the pitfalls.

February 27, 2005

Jef Raskin

Reading Slashdot today, I learned that Jef Raskin has passed away from cancer. Even if you don't know who Jef is, it's safe to say that you have been influenced by his ideas if you're reading this blog.

Dr. Raskin was one of the first human interface experts to contribute to and be involved in the Apple Macintosh computer. While it's true that the design took a different direction from some of his initial ideas, he played a major role in defining the user interface ethic of the Macintosh, and consequently basically every other major computer interface.

After leaving Apple, Jef went on to develop his ideas further with the SwyftCard and Canon Cat. The best articulation I've seen of his ideas regarding interface design is in his book, The Humane Interface. He has also put a great deal of his work on his personal website.

This is a sad day, indeed.

February 22, 2005

Larry Osterman on Concurrency

Larry Osterman has been running a nice series of posts on thread synchronization and other concurrency issues. It's been full of useful tips and tricks. I particularly like part 2, Avoiding the Issue. That's a technique that's worked well for me in the multithreaded systems I've worked on. Of course, if you're writing SQL Server, etc., I'm sure you can't take nearly as simple an approach.

February 22, 2005

Programming Well: Embrace Idempotence, Part 1

There's a good definition of the word idempotent over on Dictionary.com. In a nutshell, the word is used to describe mathematical functions that satisfy the relationship f(x)=f(f(x)): functions for which repeated applications produce the same result as the first. For functions that satisfy this condition, you can rest assured that you can apply the function as many times as you like, get the expected result, and not screw anything up if you apply it more times than you absolutely need. This turns out to be a useful concept for people developing software systems.
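As a quick illustration of the f(x)=f(f(x)) relationship, here is a small sketch in C. The function names (clampPercent, increment, isIdempotentAt) are made up for this example: clamping a value into a range is idempotent, while incrementing is not.

```c
#include <assert.h>

/* Hypothetical example: clamp a value into the range [0, 100].
 * Clamping an already-clamped value changes nothing, so this is idempotent. */
int clampPercent(int value)
{
    if (value < 0)
        return 0;
    if (value > 100)
        return 100;
    return value;
}

/* For contrast: every extra application changes the result. */
int increment(int value)
{
    return value + 1;
}

/* Returns 1 when f(x) == f(f(x)) holds for this particular x. */
int isIdempotentAt(int (*f)(int), int x)
{
    return f(x) == f(f(x));
}

/* A few spot checks of the relationship. */
void checkIdempotence(void)
{
    assert(isIdempotentAt(clampPercent, 250));
    assert(isIdempotentAt(clampPercent, -7));
    assert(!isIdempotentAt(increment, 1));
}
```

Spot checks like these only test individual inputs, of course; they show the shape of the property rather than prove it.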

One of the most common examples of this is in C-style include files. It's common practice to write code like this, to guard against multiple inclusions:

#ifndef __HEADER_FILE_GUARD
#define __HEADER_FILE_GUARD

// ... declarations go here...

#endif // __HEADER_FILE_GUARD

This idiomatic C code protects the include file against multiple inclusions. Include files with this style of guard can be included as many times as you like with no ill effect.

The benefit of this is that it basically changes the meaning of the code #include <foo.h> from "Include these declarations" to "Ensure that these declarations have been made". That's a much safer kind of statement to make since it delegates the whole issue of multiple inclusions to a simple piece of automated logic.
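To see the guard doing its job without juggling several files, the sketch below pastes the same guarded "header" text twice into one source file, which is roughly what the preprocessor sees after two #include lines. The guard name POINT_H_GUARD and the Point type are invented for the example.

```c
#include <assert.h>

/* First "inclusion": the guard macro is not defined yet,
 * so the declarations are processed. */
#ifndef POINT_H_GUARD
#define POINT_H_GUARD
typedef struct Point { int x; int y; } Point;
#endif /* POINT_H_GUARD */

/* Second "inclusion": the guard macro is already defined, so the
 * preprocessor skips everything up to the matching #endif. Without
 * the guard, this would redefine struct Point and fail to compile. */
#ifndef POINT_H_GUARD
#define POINT_H_GUARD
typedef struct Point { int x; int y; } Point;
#endif /* POINT_H_GUARD */

/* Uses the single definition that survived both passes. */
int pointSum(Point p)
{
    return p.x + p.y;
}

void checkPoint(void)
{
    Point p = { 2, 3 };
    assert(pointSum(p) == 5);
}
```

Including the header a second (or tenth) time leaves the translation unit exactly as it was after the first inclusion, which is the idempotence being described.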

Of course, this is pretty commonplace. More is to come...
