Mike Schaeffer's Blog

Articles with tag: excel

September 20, 2005

The world's most popular functional programming language...

This may be something of a suprise, but Excel has even gotten the attention of Microsoft Research. Simon Peyton Jones, Margaret Burnett, and Alan Blackwell have written a paper that describes "extensions to the Excel spreadsheet that integrate user-defined functions into the spreadsheet grid". Of course, Excel doesn't do this... but, I wonder if that should be "Excel doesn't do this yet".

As a sidenote, this reminds me a little of how LabView handled subfunction definitions: subfunctions are defined using the same visual tools as top-level functions. It worked, but 'felt' a little heavy weight in actual use.

Tags:exceltech

August 24, 2005

I had a dream...

I literally dreamed about this last night. It would be wonderful if Excel supported formulas like this:

=LET(value=MATCH(item,range,0), IF(ISERROR(value), 0, value))

If you're into Lisp-y languages, it'd look like this:

(let ((value (match item range 0)))
  (if (is-error? value) 0 value))

The function call =LET(name=binding, expression) would create a local range name named name, bound (equal) to the value returned by binding, to be used during the evaluation of expression. In the example above, during the evaluation of IF(ISERROR(value), 0, value))<, value would be bound to the value returned by MATCH(item, range, 0).

It's worth pointing out that this is slightly different from how normal Excel range names work. Range names in Excel work through textual substitution. With textual substitution, the initial expression would be logically equivalent to this:

=IF(ISERROR(MATCH(item, range, 0)), 0, MATCH(item, range, 0)))

In other words, Excel would treat every instance of value as if MATCH(item, range, 0) was explictly spelled out. This means there are two calls to MATCH and two potential searches through the range. While it's possible that Excel optimizes the second search away, I'm not sure that anybody outside of Microsoft can know for sure how this is handled.

Microsoft's current reccomendation for handling the specific ISERROR scenario in the first expression is this VBA function:

Function IfError(formula As Variant, show As String)

    On Error GoTo ErrorHandler

    If IsError(formula) Then
        IfError = show
    Else
        IfError = formula
    End If

    Exit Function

ErrorHandler:
    Resume Next

End Function

This isn't bad, but it requires that spreadsheet authors and readers understand VBA. It also imposes significant performance costs: calling into VBA from a worksheet takes time.

Tags:excellisptech

July 28, 2005

List filtering with Formulas in Microsoft Excel: How it's done

Now that I've written a little about why you might want to replace Excel AutoFilter, here's how to actually do it. To frame the discussion, there are two problems to solve:

Deciding which rows of the input set are part of the result set
Displaying the result set in a contiguous sequence of spreadsheet rows.

The first problem is easy: add another column alongside the input set with a formula that evaluates to TRUE if the row belongs in the result. This can be any valid Excel formula: it can include complex logic, it can depend on other cells containing control parameters. In my example spreadsheet, this formula is in column H, labeled In Query?:

The tricky bit of the formula-based filter is the second problem: displaying the result set in a contiguous range of rows with no gaps. Each cell that might display part of the result set has to figure out itself what part of the result set to display, if any, and pull the data from the input set. A simple MATCH or LOOKUP can't handle this, since MATCH or LOOKUP can't be told to return the second, third, or nth match. They return the first match, which isn't quite enough for what we're trying to do.

As it turns out, even though having the result set compute a mapping from the input set is quite hard, solving the reverse problem isn't too bad. Having the input set compute the mapping to the result set is easy. Here's how it works, by column:

Ord. - The row ordinal number of the row in the input set, starting with 1.
Result Ord. - This column starts at zero, in the row preceeding the first row of the result set, and increments by 1 for each row where In Query? is TRUE. For each row with In Query? of TRUE, this column is the row ordinal number of this row in the result set.... We are almost there.
Result Rows. - The input row ordinal of each row in the output set. This is done by using MATCH to find the first row for each number in the Result Ord..

Once the Result Rows. column has been calculated, populating the actual result set is just a matter of using INDEX. ISERROR can be called on cells in Result Rows. to identify rows that don't contain values. After all this is said and done, we have a spreadsheet range that contains only a result set, updates like every other range in Excel, and can be used in formulas like every other range. I have a sample spreadsheet that implements a lot of this here.

Tags:exceltech

July 18, 2005

PS: I think that AutoFilter is typical of Excel...

I think that the weaknesses of the Excel AutoFilter turn out to be pretty typical of Excel in general.

To me, the brilliance of the spreadsheet was that it took a data model that business people were familar with, the accountant's paper spreadsheet, and layered on automatic computation and reporting facilities in a natural way. There's something very intuitive about going to cell c1, entering =A1+B1, and then having C1 contain the sum of the other two cells, automatically updated as the source cells change. It just makes sense, and is at the very core of every software spreadsheet dating back to the first, VisiCalc.

For years, spreadsheets worked at making this model work better. Lotus 1-2-3 introduced something called natural recalculation order that made it easier to follow the logic of spreadsheet calculation. Somewhere along the way, spreadsheets started doing limited recalculation, where formulas that didn't change weren't recalculated (thus saving time). New intrinsic functions were added, and Excel made a huge stride when it added array formulas: individual formulas that can produce more than one result. The gateway to user defined functions written in VisualBASIC was another huge win.

The core strength of all of these ideas is that they rely on and extend the core concept of the software spreadsheet: the software tracks dependancies between cells and automatically recalculates the appropriate results as necessary. As powerful as that concept is, Microsoft lost the plot somewhere around Excel 4 or 5 and keeps sinking money and effort into features that don't fully participate:

Excel has two data filter features: neither one can automatically update a table as a part of recalculation.
PivotTables don't update when their source data updates either. (For SQL data sources, this is understandable, but not so much when the source data comes from Excel itself).
PivotTables produce tables with missing values (to improve the formatting), which makes them very difficult to query with spreadsheet lookup functions.
The historgram function (among others) of the Analysis ToolPak is a one-time thing: you use it, it generates a histogram, and that's it. It's not possible to incorporate histogram generation into the dataflow driven recalculation of a spreadsheet.
There's no way to use an Excel formula to determine if a row is excluded or included in an AutoFilter query. Actually, there's no way to have the result set of an AutoFilter query drive spreadsheet recalculation at all.

Maybe this is being picky, but spreadsheets have a real strength in that they made it a lot easier for non-techies to specify how a computer can automatically solve certain types of problems. It's just a shame that so many of Excel's features are excluded from the natural way Excel is programmed.

Tags:exceltech