Mike Schaeffer's Blog

Articles with tag: tech

February 11, 2008

Renaming historical users (svn:author's) in SVN repositories

I've been keeping track of the vCalc source code in an SVN repository since May of 2005. While I'm the only person who has ever committed code into the repository, I've developed vCalc on three or four machines, with different usernames on each machine. Since SVN records usernames with each commit, these historical usernames show up in each svn log or svn blame. svn blame is particularly bad because it displays a code listing with the username prepended to each line in a fixed width gutter. With some usernames longer than others, usernames that are very long can exceed the width of the gutter and push the code over to the right. Fortunately, changing historical usernames isn't that hard, if you have administrator rights on your SVN repository.

SVN stores the name of a revision's committer in a revision property named svn:author. If you're not familar with the term, a revision property is a blob of out of band data that SVN attaches to the revision. In addition to the author of a commit, they're also used to store the commit log message, and, via SVN's propset and propget commands, user-provided custom metadata for the revision. Changing the name of a user associated with a commit basically amounts to using propset to update the svn:author property for a revision. The command to do this is structured like so:

svn propset svn:author –revprop -rrev-number new-username repository-URL

If this works, you are all set, but what is more likely to happen is the following error:

svn: Repository has not been enabled to accept revision propchanges; ask the administrator to create a pre-revprop-change hook

By default, revision property changes are disabled. This makes sense if you are at all interested in using your source code control system to satisfy an auditing requirement. Changing the author of a commit would be a great way for a developer to cover their tracks, if they were interested in doing something underhanded. Also, unlike most other aspects of a project managed in SVN, revision properties have no change tracking: They are the change tracking mechanism for everything else. Because of the security risks, enabling changes to revision properties requires establishment of a guard hook: an external procedure that is consulted whenever someone requests that a revision property be changed. Any policy decisions about who can change what revision property when are implemented in the hook procedure.

Hooks in SVN are stored in the hooks/ directory under the repository toplevel. Conveniently, SVN provides a sample implementation on the hook we need to implement in the shell script pre-revprop-change.tmpl, but the sample implementation also has strict defaults about what can be changed, allowing only the log message to be set:

if [ "$ACTION" = "M" -a "$PROPNAME" = "svn:log" ]; then exit 0; fi

echo "Changing revision properties other than svn:log is prohibited" > &2
exit 1

The sample script can be enabled by renaming it to pre-revprop-change. It can be made considerably more lax by adding an exit 0 before the lines I list above. At this point, the property update command should work, although if you're at all interested in the security of your repository, it is best to restore whatever revision property policy was in place as soon as possible.

Tags:programmingtech

February 1, 2008

Excel, CR/LF, and CSV

I've spent a fair amount of time lately working with code that generates Comma Seperated Value files for loading into Excel. You'd think the format would be trivial, but not quite. One additional subtlety, one not covered in that 'specification', is Excel's inconsistent handling of end of line markers. As it turns out, if Excel loads a CSV file that contains a quoted, multi-line value, it expects a different line feed convention within the quoted value than the usual CR/LF. A CR embedded in a quoted field renders as a box, rather than as part of a newline. To suppress the box, CSV files for Excel need to be written with a LF-only convention within quoted values. Even then, Excel will not automatically expand rows containing a multi-line value. That has to be done manually.

Internally, Excel seems to follow the same LF-only convention that this issue with CSV files seems to imply. Taking the CODE(...) of each character in a manually entered multi-line cell value, shows only one charater, a LF, at each line break. My guess is that the quotes in a CSV file just act as a signal to turn off all special character handling, not just handling that signals new rows and cells. Either way, it's more than a little irritating that Excel compatible CSV files with multi-line values have to have two seperate end of line conventions.

Tags:exceltech

January 21, 2008

Still not tested... still not working... sort of...

Another one along the lines of My last post. I tried to compile this source file today, using the compiler in my little Lisp:

(define (values . args) (%panic "roh roh"))

(define (test x) (+ x 1))

I got the following result:

d:\test>vcsh -c test.scm
;;;; VCSH, Debug Build (SCAN 0.99 - Dec 17 2007 16:47:30)

; Info: Loading Internal File: fasl-compiler
; Info: Package 'fasl-compiler' created
; Info: Loading Internal File: fasl-write
; Info: Package 'fasl-write' created
; Info: Loading Internal File: fasl-compiler-run
; Info: Package 'fasl-compiler-run' created
; Info: stack limit disabled!
Fatal Error: roh roh @ (error.cpp:168)

Needless to say, fatal errors still aren't any good. However, this one is a bit more interesting than a simple type checking problem. The function %panic is the internal function used to signal fatal errors from Lisp code. The first definition above redefines values, the function to return multiple return values, so that it always panics with a fatal error. This is the kind of thing that, if done in a running environment, would break things almost immediately.

But, the compiler is slightly different.... it isolates the program being compiled from the compiler itself. This is done to keep redefinitions that might break the currently running compiler from doing just that. Redefinitions by the compiled program are only supposed to be visible to the compiled program. Since the above program never itself invokes values, it should never hit the call to %panic... except that it does.

What's happening here lies in the processing of the second definition. The definition itself is transformed a couple times by macroexpansion, first to this:

(%define test (named-lambda test (x) (+ x 1)))

And then, basically, to this:

(%define test (%lambda ((name . test) (lambda-list x)) (x) (+ x 1)))

The second macroexpansion step is the step that looks for optional arguments, and the internal function that parses lambda lists for optional arguments returns three values using values. This invocation of values happens in the environment of the program being compiled, so it hits the new %panic-invoking definition and the whole show grinds to a halt. The 'easy' fix, ensuring that macro expansion is isolated from potentially harmful redefinitions, won't work. Macro expansion has to happen in the user environment, so that macros can see function definitions that they might rely upon.

I don't have a unit test for the user/compiler seperation logic, so I thought when I started this blog post I was going to say something like: 'look, something else fundamentally broken, and without a test case'. That's interesting, but if you need convincing to write unit tests, you're probably already lost. What I actually learned while researching this post is a bit more subtle: it's a fundamental problem, but it's more about the design than the code itself. While the design I have for user/compiler seperation seems to work most of the time, it's not adequate to solve this kind of problem. I'm not yet exactly sure what the solution is, but it won't necessarily involve a missing unit test.

Tags:programmingtech

January 20, 2008

Not tested? Then it doesn't work.

The other day, I had the following (abbreviated) dialog with my little Scheme interpreter:

scheme> (intern! 'xyzzy2 (find-package "keyword"))
; Fatal Error: Assertation Failed: STRINGP(pname) @ (oblist.cpp:451)
c:\vcalc>vcsh.exe

scheme> (intern! 12)
; Fatal Error: Assertation Failed: STRINGP(sym_name) @ (oblist.cpp:269)
c:\vcalc>

Needless to say, 'Fatal errors' aren't good things, and fatal errors in intern!, a core function, are even worse. Without going into too many details, the first call should be returning successfully, and the second should be throwing a runtime type check error. However, the implementation of intern! wasn't checking argument types and passing invalid arguments into lower layers of the interpreter's oblist (symbol table) code, which died with an assertation failure.

To put this in perspective, my implentation of intern! is about five years old, and something that I thought to be a fairly reliable piece of code. At the very least, I didn't think it was susceptable to something as simple as a type checking error that would crash the entire interpreter. Of course, when I looked at my test suite, there wasn't a set of tests for intern!. That might have something to do with it, don't you think?

Here are the morals I'm taking from this little story:

Do not assume something works, unless you have a complete test suite for it. (Even then be wary, because your test suite is probably not complete.)
Shoot for more than 60% code coverage on your test cases.
Don't write your own interpreter, because there are probably hundreds of other bugs just like this one. :-)

Tags:programmingtech