Mike Schaeffer's Blog

Articles with tag: tech
September 20, 2005

This may be something of a surprise, but Excel has even gotten the attention of Microsoft Research. Simon Peyton Jones, Margaret Burnett, and Alan Blackwell have written a paper that describes "extensions to the Excel spreadsheet that integrate user-defined functions into the spreadsheet grid". Of course, Excel doesn't do this... but I wonder if that should be "Excel doesn't do this yet".

As a side note, this reminds me a little of how LabVIEW handles subfunction definitions: subfunctions are defined using the same visual tools as top-level functions. It worked, but 'felt' a little heavyweight in actual use.

September 7, 2005

One of the first functions I like to write when creating a new data structure is a human-readable dumper. This is a simple function that takes the data you're working with and dumps it to an output stream in a readable way. I've found that these things can save huge amounts of debugging time: rather than paging through debugger watch windows, you can assess your program's state by calling a function and reading it out by eye.
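
For example, here's a minimal sketch of what one of these might look like for a simple binary tree, in Common Lisp (the node structure and names are hypothetical, just for illustration):

(defstruct node key left right)

(defun dump-node (node stream &optional (depth 0))
  "Print NODE's keys to STREAM, one per line, indented by tree depth."
  (when node
    (format stream "~v@T~a~%" (* 2 depth) (node-key node))
    (dump-node (node-left node) stream (1+ depth))
    (dump-node (node-right node) stream (1+ depth))))

Calling (dump-node tree *standard-output*) then gives a quick indented view of the whole tree, no debugger required.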

A few tips for dump functions:

  • The more use this kind of scaffolding code gets, the more cost-effective it becomes to write. Waiting to write a dumper reduces the amount of use it can get, making it progressively less worthwhile. Implement them early, if you can.
  • Look for cheap alternatives already in your toolkit: Lisp can already print most of its structures, and .NET includes object serialization to XML. The standard solution might not be perfect, but it is 'free'. (The sketch after this list shows an example.)
  • Make sure your dumpers are correct from the outset. The whole point of this is to save debugging time later on; if you can't trust your view into your data structures during debugging, it will cost you time.
  • Dump into standard formats. If you can, dump into something like CSV, XML, S-expressions, or Dotty. If you have a considerable amount of data to analyze, this'll make it easier to use other tools to do some of the work.
  • Maintain your dumpers. Your software isn't going to go away, and neither are your data structures. If it's useful during initial development, it's likely to be useful during maintenance.
  • For structures that might be shared, or exist on the system heap, printing object addresses and reference counts can be very useful.
  • For big structures, it can be useful to elide content. For example, a list of 1,000 items can be printed as (item0, item1, item2, ..., item999).
  • This stuff works for disk files too. For binary save formats, specific tooling to examine files can save time compared to a hex editor or viewer. (Since you already have code to read your disk format into an in-memory data structure, and code to dump that structure, a dump utility doesn't have to be much more work. Sharing code between the dump utility and the actual application also makes it more likely the dumper will show you the same view your application sees.)
  • Reading dumped structures back in can also be useful.
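
As an illustration of the points above about cheap alternatives and elision, Common Lisp's standard printer handles both: its output is already a readable S-expression, and binding *print-length* elides long lists automatically:

;; With *PRINT-LENGTH* bound, only the first few elements are printed.
(let ((*print-length* 4))
  (print (loop for i below 1000 collect i)))
;; prints: (0 1 2 3 ...)
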
September 7, 2005

I've found a couple interesting websites related to computer history. The first is Dusty Decks, a blog related to some efforts to reconstruct Lisp and FORTRAN history. A highlight of this is a discussion on the Birth of the FORTRAN subroutine. Also via Dusty Decks is a website on the early history of the Lisp Programming Language.

That leads me to a couple of books I've been reading lately. The first is Lisp in Small Pieces, by Christian Queinnec. I'm only a couple chapters in (stuck on continuations right now), but it's already been pretty profound. So far, the aspect of the book that's been the most useful is that it goes through several core design choices Lisp implementors have to make (Lisp-1 vs. Lisp-2, lexical scope vs. dynamic scope, types of continuations to support), going into depth on the implications and history of each choice. I think I'm finally starting to understand more of the significance of funcall and function in Common Lisp, not to mention throw/catch and block/return-from.
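
As a concrete illustration (my sketch, not the book's): because Common Lisp is a Lisp-2, with separate namespaces for variables and functions, function (usually written #') is needed to pass a function around, and funcall to invoke one held in a variable:

;; Common Lisp (a Lisp-2): F is a variable holding a function.
(defun apply-twice (f x)
  (funcall f (funcall f x)))

(apply-twice #'1+ 5)   ; => 7

;; In a Lisp-1 like Scheme, one namespace makes both forms unnecessary:
;;   (define (apply-twice f x) (f (f x)))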

Book two is The First Computers–History and Architectures, edited by Raul Rojas. This book is a collection of papers discussing the architecture of significant early computers from the late 1930s and '40s. The thing that's so unique about the book is that it focuses on the architectural issues surrounding these machines: the kinds of hardware they were built with, how they processed information, and how they were programmed. Just as an example, it has a detailed description of many of ENIAC's functional units, even going into descriptions of how problems were set up on the machine. Another highlight of the book for me (so far) has been a description of Konrad Zuse's relay-based Z3, down to the level of a system architectural diagram, schematics of a few key circuits, and coverage of its microprogramming (!).

August 24, 2005

I literally dreamed about this last night. It would be wonderful if Excel supported formulas like this:

=LET(value=MATCH(item,range,0), IF(ISERROR(value), 0, value))

If you're into Lisp-y languages, it'd look like this:

(let ((value (match item range 0)))
  (if (is-error? value) 0 value))

The function call =LET(name=binding, expression) would create a local range name, name, bound (equal) to the value returned by binding, to be used during the evaluation of expression. In the example above, during the evaluation of IF(ISERROR(value), 0, value), value would be bound to the value returned by MATCH(item, range, 0).

It's worth pointing out that this is slightly different from how normal Excel range names work. Range names in Excel work through textual substitution. With textual substitution, the initial expression would be logically equivalent to this:

=IF(ISERROR(MATCH(item, range, 0)), 0, MATCH(item, range, 0))

In other words, Excel would treat every instance of value as if MATCH(item, range, 0) was explicitly spelled out. This means there are two calls to MATCH and two potential searches through the range. While it's possible that Excel optimizes the second search away, I'm not sure that anybody outside of Microsoft can know for sure how this is handled.
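
To make the difference concrete, here's a sketch in Common Lisp, using position as a stand-in for Excel's MATCH and a counter to make each search observable:

(defvar *searches* 0)

(defun match (item range)
  "Stand-in for Excel's MATCH; counts each search through RANGE."
  (incf *searches*)
  (position item range))

;; Binding, as the proposed =LET would behave: MATCH runs once.
(let ((value (match 'b '(a b c))))
  (if (null value) 0 value))

;; Textual substitution, as Excel range names behave: MATCH runs twice,
;; once for the error test and once for the result.
(if (null (match 'b '(a b c)))
    0
    (match 'b '(a b c)))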

Microsoft's current recommendation for handling the specific ISERROR scenario in the first expression is this VBA function:

Function IfError(formula As Variant, show As String)

    On Error GoTo ErrorHandler

    If IsError(formula) Then
        IfError = show
    Else
        IfError = formula
    End If

    Exit Function

ErrorHandler:
    Resume Next

End Function

This isn't bad, but it requires that spreadsheet authors and readers understand VBA. It also imposes significant performance costs: calling into VBA from a worksheet takes time.
