Armed with a text editor

mu's views on program (and recipe!) design

December 2005

Developing Mutagen (5): Unit Tests Posted 2005.12.08 21:58 PST

Last time, I finished describing Mutagen's important capabilities and the design decisions I made to support them cleanly. Now I'm going to talk about the other half. Literally.

A wonderful little program that makes me feel terribly underpaid for all my Python software, SLOCCount, puts Mutagen's ID3 support at 1042 lines, plus 58 lines shared with non-ID3 formats. The associated unit tests are 1069 lines.

I didn't quite follow a regimen of test-first programming, or test-driven development, but I came pretty close. I'll tell you one thing: it feels really good. There have been a couple of times when I've made large implementation changes, and the body of tests made me feel secure doing so. One such change was a switch from manual seek/read/write code to mmap.move code. When that had repercussions on 64-bit platforms, I brought back the manual code as a fallback and was able to quickly extend the tests to fake the failure in order to test the fallback. Rather than spend two more weeks in some form of beta, I was immediately confident enough to release my changes.
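
I won't reproduce Mutagen's actual fallback test here, but one straightforward way to fake that kind of failure is plain fault injection: temporarily replace mmap with a stub that always fails, run the tests that write tags, and restore the real thing afterward. A rough, purely illustrative sketch (the stub and the exception are stand-ins, not Mutagen's real test code):

import mmap
import unittest

class TestFallbackWhenMmapFails(unittest.TestCase):
    # Fault injection: swap mmap.mmap for a stub that always raises,
    # so any code that tries the mmap path is forced onto its fallback.
    def setUp(self):
        self._real_mmap = mmap.mmap
        def broken_mmap(*args, **kwargs):
            raise EnvironmentError("simulated mmap failure")
        mmap.mmap = broken_mmap

    def tearDown(self):
        mmap.mmap = self._real_mmap

    def test_stub_is_active(self):
        # Sanity check that the injection took; a real test would call
        # the tag-saving code here and assert on the resulting file.
        self.assertRaises(EnvironmentError, mmap.mmap, -1, 1024)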

Metaprogramming unit tests

Enough bragging about unit tests; it's time to introduce one of my favorite testing patterns. In testing there are bound to be batches of very similar tests that need to be run on different sets of data. With something like Mutagen the prime example is frames. Each frame needs to be tested to guarantee that a change won't silently break it. Its operations include the three I've previously covered: reading from the bytestream, writing back to the bytestream, and the frame == eval(repr(frame)) circle.

I could write a series of test cases with very similar looking tests. Do something like:

def test_TXXX_repr(self):
    frame = TXXX.fromData(..., data)
    frame2 = eval(repr(frame))
    self.assertEquals(frame, frame2)

Repeat this 100 times for 100 supported tags, and then go on to the read-write tests, and do something like the following another hundred times:

def test_TXXX_write(self):
    frame = TXXX.fromData(..., data)
    data2 = frame._writeData()
    self.assertEquals(data, data2)

Eeeeewwww. I just finished showing how I avoided this kind of mess in the implementation of Mutagen; I'm not about to create a new mess here. What's the next incremental step? Data-driven tests. Collect all the data in one place, and test it all.

framedata = [
    (TXXX, txxx_bytestream_here),
    (TPE3, tpe3_bytestream_here),
    ...
]

def test_frames_repr(self):
    for fid, data in framedata:
        frame = fid.fromData(..., data)
        frame2 = eval(repr(frame))
        self.assertEquals(frame, frame2)

def test_frames_write(self):
    for fid, data in framedata:
        frame = fid.fromData(..., data)
        data2 = frame._writeData()
        self.assertEquals(data, data2)

That's definitely an improvement, right? Just plop all the frames in the framedata list and they're all tested. It does have two fatal flaws, though. First, it exits early on any failure, so only the frames before the first failing frame actually get tested. Second, there is no indication of which frame was being tested when it fails. Could I print out the frame ID before each assertEquals? Sure. Do I want to? Absolutely not. Clean test output is essential to catching failing tests, and since the tests should pass more often than they fail, it's the wrong effort in the wrong place.

Python's dynamic support for metaprogramming comes to the rescue. A test case is any class derived from unittest.TestCase; a test is just a method on that class whose name begins with "test". Python will let me create a class on the fly out of little more than a dictionary. If that class happens to derive from TestCase, it's good to go. So I keep the list of framedata from above, and instead create some dictionaries to turn into classes.

framedata = [ ... ]
testcases = []   # generated TestCase classes collect here
repr_tests = {}
write_tests = {}

# build dictionaries filled with methods ready to become tests
for fid, data in framedata:

    def repr_test(self, fid=fid, data=data):
        frame = fid.fromData(..., data)
        frame2 = eval(repr(frame))
        self.assertEquals(frame, frame2)
    repr_tests["test_%s_repr" % fid.__name__] = repr_test

    def write_test(self, fid=fid, data=data):
        frame = fid.fromData(..., data)
        data2 = frame._writeData()
        self.assertEquals(data, data2)
    write_tests["test_%s_write" % fid.__name__] = write_test

# turn the dictionaries into classes
# class foo(bar): def baz():... <-> foo = type('foo', (bar,), {'baz':...})
testcase = type('TestRepr', (unittest.TestCase,), repr_tests)
testcases.append(testcase)
testcase = type('TestWrite', (unittest.TestCase,), write_tests)
testcases.append(testcase)

Steps to success

All the problems I mentioned above are solved. Each test runs independently so all frames are tested and all failures are reported. In each failure, the name of the tag and the kind of the operation are part of the failure output so it's obvious which test failed. There are just a few things to keep in mind.

Scopes and closures in Python don't always work the way you expect. In the above code, I write def repr_test(self, fid=fid, data=data) for a very specific reason: without the default arguments, the binding is purely dynamic. From the scope of the function, Python searches outward until it finds a binding for fid (or data). By the time these functions run, those names have one binding at best: the last entry in the framedata list. I'd have 100 tests gleefully testing one frame if I wrote it that way. By listing fid and data as default arguments, their values are captured at function creation time, each of the 100 tests tests a different frame, and I am happy.
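
If that sounds abstract, the same pitfall boils down to a few lines with nothing Mutagen-specific in them:

funcs = []
for i in range(3):
    def show(i=i):
        return i
    funcs.append(show)

# With the default argument, [f() for f in funcs] is [0, 1, 2].
# Drop the "i=i" and every call returns 2, the loop's final value.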

Know your framework. The framework within which I run my tests will execute the tests in the testcases list. If you use unittest.main() instead, it needs to be able to find them via introspection. Name them individually in the module's global namespace (TestRepr = type(...); TestWrite = type(...)), and it should work.
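
In other words, a module that relies on unittest.main() might end with something like this (reusing the repr_tests and write_tests dictionaries from above):

# Module-level names let unittest.main() discover the generated classes.
TestRepr = type('TestRepr', (unittest.TestCase,), repr_tests)
TestWrite = type('TestWrite', (unittest.TestCase,), write_tests)

if __name__ == "__main__":
    unittest.main()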

And finally, don't confuse the number of tests, or lines of code that are tests, with the coverage or quality of the tests. Don't get hung up on making everything meta either. While Mutagen has a sanity test for every frame (and a meta-test that tests if each frame is tested!), that's only a quarter of its test code; most of the remaining tests are standard simple tests. The added cognitive complexity of meta-testing is not worth it for the remaining tests, but boy does it work well for frames!
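
As a sketch of what such a meta-test can look like: compare the frames that have test data against the full set of frame classes. (The Frames mapping below is a hypothetical stand-in for however the frame classes are actually enumerated; it is not Mutagen's real code.)

class TestEveryFrameTested(unittest.TestCase):
    def test_every_frame_has_data(self):
        # Frames: hypothetical {name: class} mapping of supported frames.
        tested = set(fid.__name__ for fid, data in framedata)
        for name in Frames:
            self.assert_(name in tested, "%s has no test data" % name)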

Any questions?

With this I think my series on developing Mutagen is complete. If you have any questions, or there are areas you'd like to read more about, please ask!
