The last couple days I went through one of the more painful debugging experiences of my life. The symptom was that one of our OggFlac unit tests in Mutagen was failing. But not on x86; that would be too easy. Instead it was only failing on AMD64 systems, which is reminiscent of a bug I covered before. Thankfully Pete was able to provide direct network access to an AMD64 machine for me to test on; otherwise we'd undoubtedly still be fighting this bug. As it was, we spent at least four hours each on Thursday and Friday.
So knowing that this was a painful bug, how do we debug such symptoms?
First things first, try to isolate the problem as much as possible. The failing test did a small sequence of things, starting from a nearly blank file, adding some stuff, deleting it, adding other stuff, and then failing the test on line 39. So pare it down. As it turns out all we need are the three lines 37–39: add, save, test.
Now that we know which methods reveal the bug, we can look into the code behind them. This can help find many kinds of errors. I think it helped us tighten our code a bit. But after many hours of staring at it and trying various things it was still no help.
We had a passing version of the same test on x86. Was the problem in our code, or in the crosscheck we were performing? Taking the output produced on x86 and testing it with flac on x64 (and vice versa) showed it was a problem in our code, and not the tool. Drat. But hey, the file is the same up until that last step. That confirms everything else so far is good.
Since looking at the file gave us no good clues, other than the broken file in question looked like garbage, it was time to delve deeper. I coded up a tracing wrapper, which injects tracing primitives into python code. I invoked this from the OggFLAC tests, and traced ogg, oggvorbis, and oggflac modules under mutagen.
Note: the linked version is the final version. Earlier versions lacked tracefile, function return values, and indentation.
The only differences for the longest time, after filtering out inconsequential differences, were at the failure spots themselves. The passing code would continue; the failing code would report an exception. I would add more tracing. The same. What else can I trace? There has to be a difference somewhere. So I added tracefile.
And the size of the tracelog exploded. I spent time coping with that when I should have been realizing that there is no longer any difference. The good and bad trace were too different (because of our fallback code for the AMD64 mmap problems in python) so make them more similar. Somehow the failing code started working. What did I do? I added tracefile. Huh? Great...now I can't even observe my bug without it morphing.
In concurrent programming, or code with extreme speed needs, heisenbugs can be a real problem. Fortunately Mutagen is neither.
The thing about the final push is you never know when it will hit. When you first start debugging you're sure it's right around the corner. Then the more you work at it without solving it, the more convinced you become that something insane is wrong with your tools. So here's a reconstructed stream of consciousness from when I first saw the heisenbug.
Take all tracing out, but leave in tracefile. Okay, it's still working. Take out tracefile. Okay, it fails. Hey, it fails on x86 too. Hallelujah! Wait. Huh? Pare down tracefile. Use it and take out #read. It fails. Put #read back and take out #write. It works. Leave in #read but simplify it to just recurse the call. It fails. What's left?
I fiddle with it and find the only piece I need is self.tell() before calling self.read. So I mention my bewilderment to Joe and he finds a gem: this bug has been reported to Python before. And denied. Apparently rightfully so. Six month old tested code is to blame. I fix it up by using explicit seek calls, and everything passes.
There's always another dark corner of your toolchain left to be learned, often the hard way. In our case it was the fact that interleaved read and write operations on a file pointer, without explicitly resetting the stream position, are undefined. They work most of the time on the implementations we run on, but they fail sometimes. I feel we did an okay job of challenging and proving our assumptions right or wrong, but it was really hard to make the leap that let us trust the new barely tested code that was revealing the failure, and stop trusting the old tested code that was causing it. I'd like to do better about that in the future.
In the meantime, Mutagen 1.4, with corrected file handling code on 64-bit systems, here we come!
I finished describing the important capabilities, and the design decisions I made for Mutagen to support them cleanly last time. Now I'm going to talk about the other half. Literally.
A wonderful little program that makes me feel terribly underpaid for all my python software, SLOCCount, pits Mutagen's ID3 support at 1042 lines, plus 58 lines to be shared with non-ID3 formats. The associated unit tests are 1069 lines.
I didn't quite follow a regiment of test-first-programming, or test-driven-development, but I came pretty close. I'll tell you one thing: it feels really good. There have been a couple times where I've made large implementation changes, and due to the body of tests I felt secure doing so. One such change was a switch from manual seek/read/write code, to mmap.move code. Then when there were repercussions on 64-bit platforms, I brought back the manual code as a fallback and was able to quickly extend the tests to fake the failure in order to test the fallback. Rather than spend two more weeks in some form of beta, I was immediately confident enough to release my changes.
Enough bragging about unit tests; it's time to introduce one of my favorite testing patterns. In testing there are bound to be batches of very similar tests that need to be run on different sets of data. With something like Mutagen the prime example is frames. Each frame needs to be tested to guarantee that a change won't silently break it. Its operations include the three I've previously covered: reading from the bytestream, writing back to the bytestream, and the frame == eval(repr(frame)) circle.
I could write a series of test cases with very similar looking tests. Do something like:
def test_TXXX_repr(self): frame = TXXX.fromData(..., data) frame2 = eval(repr(frame)) self.assertEquals(frame, frame2)
Repeat this 100 times for 100 supported tags, and then go on to the read-write tests, and do something like the following another hundred times:
def test_TXXX_write(self): frame = TXXX.fromData(..., data) data2 = frame._writeData() self.assertEquals(data, data2)
Eeeeewwww. I just finished showing how I avoided this kind of mess in the implementation of Mutagen, I'm not about to create a new mess here. What's the next incremental step? Data-driven tests. Collect all the data in one place, and test it all.
framedata = [ (TXXX, txxx_bytestream_here), (TPE3, tpe3_bytestream_here), ... ] def test_frames_repr(self): for fid, data in framedata: frame = fid.fromData(..., data) frame2 = eval(repr(frame)) self.assertEquals(frame, frame2) def test_frames_write(self): for fid, data in framedata: frame = fid.fromData(..., data) data2 = frame._writeData() self.assertEquals(data, data2)
That's definitely an improvement, right? Just plop all frames in the framedata list and they're all tested. It does have two fatal flaws. First it exits early on any failure, and only tests the frames that come before the first failing frame. Second there is no indication of what frame it was testing when it fails. Could I print out the frame ID before each assertEquals? Sure. Do I want to? Absolutely not. Clean test output is essential to catching failing tests; and since the tests should pass more often than they fail, it's the wrong effort in the wrong place.
Python's dynamic support for metaprogramming comes to the rescue. A test case is any class derived from unittest.TestCase; a test is just a method on that class whose name begins with "test". Python will let me create a class on the fly out of little more than a dictionary. If that class happens to derive from TestCase, it's good to go. So I keep the list of framedata from above, and instead create some dictionaries to turn into classes.
framedata = [ ... ] repr_tests = {} write_tests = {} # build dictionaries filled with methods ready to become tests for fid, data in framedata: def repr_test(self, fid=fid, data=data): frame = fid.fromData(..., data) frame2 = eval(repr(frame)) self.assertEquals(frame, frame2) repr_tests["test_%s_repr" % fid.__name__] = repr_test def write_test(self, fid=fid, data=data): frame = fid.fromData(..., data) data2 = frame._writeData() self.assertEquals(data, data2) write_tests["test_%s_write" % fid.__name__] = write_test # turn the dictionaries into classes # class foo(bar): def baz():... <-> foo = type('foo', (bar,) {'baz':...}) testcase = type('TestRepr', (unittest.TestCase,), repr_tests) testcases.append(testcase) testcase = type('TestWrite', (unittest.TestCase,), write_tests) testcases.append(testcase)
All the problems I mentioned above are solved. Each test runs independently so all frames are tested and all failures are reported. In each failure, the name of the tag and the kind of the operation are part of the failure output so it's obvious which test failed. There are just a few things to keep in mind.
Scopes and closures in Python don't always work like you expect. In the above code, I use def test(self, fid=fid, data=data) for a very specific reason: if I don't, the binding is purely dynamic. From the scope of the function, Python searches out until a name binding of fid (or data) is used. By the time these functions run, the names have one binding at best, which is the last entry in the framedata list. I'd have 100 tests gleefully testing one frame if I wrote it that way. By listing it in the default arguments, the value is captured at function creation time, each of the 100 tests will test a different frame, and I am happy.
Know your framework. The framework within which I run my tests will execute the tests in the testcases list. If you use unittest.main() instead, it needs to be able to find them via introspection. Name them individually in the module's global namespace (TestRepr = type(...); TestWrite = type(...)), and it should work.
And finally, don't confuse the number of tests, or lines of code that are tests, with the coverage or quality of the tests. Don't get hung up on making everything meta either. While Mutagen has a sanity test for every frame (and a meta-test that tests if each frame is tested!), that's only a quarter of its test code; most of the remaining tests are standard simple tests. The added cognitive complexity of meta-testing is not worth it for the remaining tests, but boy does it work well for frames!
With this I think my series on developing Mutagen is complete. If you have any questions, or there are areas you'd like to read more about, please ask!
After covering the dual is-a has-a (or has-many) tree of Frame and Spec classes last time, I promised I'd share some of the other advantages of this data-driven metaprogramming structure. Here's where it gets really cool. With some slight trickery in a couple base-class functions, every last Frame class becomes very easy to use.
This entry is going to use a lot of code, because that's where the benefits are most apparent and most useful. To make sense of it, remember that each class has a member called _framespec which is a list of specs. Each spec has a read, write, and validate method, and stores its own preferred name.
Most useful python classes allow you to create an instance of a class based on a set of values. For example, the canonical text frame has an encoding and a list of strings. It would be really cool if I could construct a TXXX frame by passing it an encoding, a description, and a list of strings. So how can I enable this? There's a framespec defined already:
_framespec = [ EncodingSpec('encoding'), EncodedTextSpec('desc'), MultiSpec('text', EncodedTextSpec('text'), sep=u'\u0000') ]
I could just define an __init__ method that mirrors it like so:
def __init__(self, encoding=None, desc=None, text=None): self.encoding = encoding self.desc = desc self.text = text
This mostly does the trick, but it's really easy to pass invalid data and generate a nonsense frame. That's a shame because specs have a validate method, and it's easy, but ugly, to fix:
def __init__(self, encoding=None, desc=None, text=None): self.encoding = self._framespec[0].validate(self, encoding) self.desc = self._framespec[1].validate(self, desc) self.text = self._framespec[2].validate(self, text)
No, that's really ugly. Not only is there no useful semantic connection between 0 and encoding, 1 and desc, or 2 and text, but there's all this boilerplate self._framespec[n].validate() repeated for each spec. How to fix that? Go meta!
def __init__(self, encoding=None, desc=None, text=None): for spec in self._framespec: setattr(self, spec.name, spec.validate(self, locals()[spec.name]))
I've now removed the repeated code within the __init__ method, and partly fixed the ugliness. But this still needs to appear in each relevant frame class. I'd much rather write something a little harder just once, rather than something a little easier twenty times.
On top of that, using locals()[spec.name] smells funny to me. It's not bad, but there's something about it I don't like, perhaps how easy it would be to accidentally clash a name if I added code before this for loop. If I'm willing to trade away the function signature, I can use an explicit dictionary...
def __init__(self, **kwargs): for spec in self._framespec: validated = spec.validate(self, kwargs.get(spec.name, None)) setattr(self, spec.name, validated)
...and lose the ability to use positional parameters. That's not too bad. One more tweak and I can have positional parameters again: pull in a positional variadic and process it; then process the remaining specs as keyword arguments in the kwargs dictionary.
def __init__(self, *args, **kwargs): for spec, val in zip(self._framespec, args): setattr(self, spec.name, spec.validate(self, val)) for spec in self._framespec[len(args):]: validated = spec.validate(self, kwargs.get(spec.name, None)) setattr(self, spec.name, validated)
These last two versions fare very well as a base-class Frame.__init__ method. The latter is used almost verbatim in Mutagen. As an example of why this is a good factoring, consider the other task Mutagen's Frame.__init__ performs: if passed an existing frame as its sole argument, it copy-validates it into the new frame. If this code were all over the place, many different versions would have to be made. As is, a loop like the kwargs loop fetches values from the passed frame, validates and sets them.
Little is worse than having a simple data structure instance, but not being able to easily view its contents quickly and succinctly in the interactive interpreter. Hot on the heels of the constructor implementation, it should be almost obvious how we can construct a single unified __repr__ method. Best yet, by leveraging the constructor I just wrote, it's easy to follow recommendations and make a representation that eval()s to the same frame.
def __repr__(self): values = [] for attr in self._framespec: values.append('%s=%r' % (attr.name, getattr(self, attr.name))) return '%s(%s)' % (type(self).__name__, ', '.join(values))
There's one more touch that makes Mutagen's frames easy to use. While there is a lot of structure to the frame's data, most of the time client code only really cares about one part. In a text frame that's almost uniformly the list of string values. If there's only one value in that list, then the client code only cares about one string.
To facilitate this access simplification, Mutagen defines __str__ and __eq__, and then some list access methods which forward directly to its list. These include __getitem__, __iter__, append, and extend. The stringification function returns a single string composed of the list joined by NULL characters. Equality handles a comparison against another text frame, or against a string. The list methods make it easier to modify or refer to single strings in the list. Oddly Mutagen doesn't have a __setitem__; apparently we haven't needed it yet in any of our client code.
Unforunately I haven't seen a good way to specify this data less frequently than in in each frame class that specifies a framespec. Perhaps priority values could be given to spec classes, and this implementation could be generated with a true metaclass. So far I don't think that would be a winning trade over the straightforwardness of this code. While there will be repeated code, none of it is as full of boilerplate copy-mispaste chances as the initialization code.
That's the first half of Mutagen. I've described the interesting parts of the design now, and have just a few more tidbits about the implementation that I want to share. If you're wondering how I could only have a few tidbits left when there's an entire second half, then you'll have to wait for the next entry when I dive into the second half. The second half is arguably more important than the first, even though the first half does all the work. But that will wait. For now, just some tidbits.
The set of available frames changes significantly between ID3v2.2, ID3v2.3, and ID3v2.4. By and large 2.3 and 2.4 are drop-in compatible, and I won't go into the incompatibilities here. However 2.2 uses an entirely different data structure format, often using six bytes to 2.3 and 2.4's ten, and with three-character frame IDs instead of four. However the payload is almost the same.
Mutagen handles this by defining a series of subclasses of the final 2.3/2.4 frames. The user-defined text frame is class TXX(TXXX): pass; The starting key is class TKE(TKEY): pass; and so forth. The data reading code in Frame handles the headers, but then the payload is shared. Finally all 2.2 frames are "upgraded" to their 2.4 equivalents by executing:
if len(type(frame).__name__) == 3: # if it's a 2.2 frame, frame = type(frame).__base__(frame) # use the copy-validate from above!
The first versions of ID3, ID3v1 and 1.1 aren't going away any time soon. As I mentioned in the first of this series, they're extremely limited with only 128 bytes of storage including the header TAG, and thus can only encode a very small set of predefined set of values. Since I prefer rich and flexible metadata, I detest ID3v1. But many portable devices only understand ID3v1. Of those that understand some form of ID3v2, many do not understand the most recent. Despite my distaste for ID3v1, Mutagen cannot ignore it.
Mutagen implements this with a completely separate read and write loop, and translates to and from ID3v2.4 tags internally to shield the client code from too many differences. There are definite downsides to this if ID3v1 is important to you user, as frame value length limits are not enforced locally; instead the data is truncated to fit when it is written. Oh well; that's a small sacrifice for incorporating a legacy format into this modern library.
Remember: the second half of Mutagen awaits next entry.
Yesterday I described a lot of the complexities of the ID3v2 frame structures. By grouping frames into text frames, URL frames, and other frames, and then subgrouping appropriately, it's possible to avoid some code redundancy. But considering the number of remaining frames, it's really not enough. It would also be a royal pain to test all the copy-pasted implementations. So what can we do?
Mutagen solves this with data-driven metaprogramming on a _framespec. (Don't bother looking for this in the ID3 specifications; it's a Mutagen construct). This member stores a list of specifiers (specs). Each spec is an instance of a class derived from Spec. Although each spec knows only four things, the four things are enough to provide really rich functionality. It knows:
To tie this down let's look at how to implement a text frame class. First we'll examine the following framespec, which corresponds directly to the text frame structure of one byte specifying encoding followed by a payload of text data:
_framespec = [ EncodingSpec('encoding'), EncodedTextSpec('text'), ]
The EncodingSpec knows how to read a byte off the stream, and return its interpreted integer value. The EncodedTextSpec knows how to read the text data, decode it through the stored encoding, and return it. To load the value of a single frame, the base class Frame exposes a factory function: give it some information about the tag and the frame, and also pass it the payload data of the frame, and it will return a instance filled with the data from the payload bytestream. This factory function then iterates across all its framespecs, calls its read method, and stores its return value on the frame instance as a member data value with the spec's name.
Since the spec receives the frame whose data it is reading or writing, you might wonder why it doesn't store the value directly. The reason: flexibility. The framespec above isn't sufficient. While most text frames may have a single value, the ID3v2.4 spec allows for multiple strings separated by NULLs in most cases. In other scenarios it allows for multiple pairs of such strings. Rather than understand the NULLs in client code, or build list behavior into each relevant spec, the indirection allows us to put all the logic in one spec.
MultiSpec is a meta-spec. It doesn't know how to read from or write to a bytestream, though it does have a name. It is the one spec that stores multiple values as a list. When it's asked to read a bytestream, it iterates over its sub-specs calling their read methods as if it were a Frame, as many times as is necessary to exhaust the available data. So using MultiSpec, the real TextFrame._framespec is:
_framespec = [ EncodingSpec('encoding'), MultiSpec('text', EncodedTextSpec('text'), sep='\u0000'), ]
Now when SomeTextFrame.fromData is called to read a frame, the process will be as follows:
Writing a frame has symmetrical steps:
Doing things this way makes it really easy to add additional frames to Mutagen. If the frame structure is composed of already specified piece, just reuse the spec. If there's a new one, implement it as a new spec. List out the specs as they can appear in the frame, and it not only just works, it's practically documented. For instance the PairedTextFrames (TIPL and TMCL) have
_framespec = [ EncodingSpec('encoding'), MultiSpec('people', EncodedTextSpec('involvement'), EncodedTextSpec('person)) ]
TXXX has
_framespec = [ EncodingSpec('encoding'), EncodedTextSpec('desc'), MultiSpec('text', EncodedTextSpec('text'), sep=u'\u0000') ]
URL frames (other than WXXX which has an encoded text description) have just
_framespec = [ Latin1TextSpec('url') ]
Even APIC is almost understandable just by reading its framespec.
_framespec = [ EncodingSpec('encoding'), Latin1TextSpec('mime'), ByteSpec('type'), EncodedTextSpec('desc'), BinaryDataSpec('data') ]
Next entry I'll cover some more benefits of this structure; data driven metaprogramming is an exceptional fit for Mutagen.
Last entry I discussed a very high level view of how ID3v2.4 metadata is stored as a tag which is composed of several frames. Now I'm going to switch gears and discuss frames from the bottom up, and how it influenced Mutagen's design. You will probably want to refer to the ID3v2.4 frame list several times during this discussion.
All frames start with ten bytes of header with the following semantic, and are immediately followed by their payload:
FRAM : four bytes of the frame ID 00 : two bytes of binary flags 1234 : four bytes containing the size of the frame payload
The vast plurality of predefined frames are text frames. All of these text frames have payloads of the form:
But some have higher level semantic requirements. The frames TIPL and TMCL, for example, require pairs of strings listing functions or instruments paired with artist or person. Frames TRCK and TPOS are supposed to be numeric, and may be either one number or two separated by a slash. Frame TKEY is supposed to be one of A-G followed optionally by either b or # and then optionally by m, all in order to indicate A through G sharp, natural, or flat major or minor; or a single o for offkey. Frames TFLT and TMED have similarly complicated schemes to represent specific information chosen from various partly orthogonal sets. Other frames store dates in a specific format, or numbers that aren't allowed the slash. Finally TXXX stores a name before its main payload.
Semantically there are only three types of URL frame. They are all like a text frame with a predefined latin-1 encoding for their text (URLs are expected to be ASCII-clean while artist names are not). The first two differ by how many are allowed in a tag - either one, or one with a given URL. The third, WXXX, is analagous to TXXX, and stores a URL with a user-defined description.
There are a slew of other frames. They all have different layouts. For instance APIC has an encoding byte, a null-terminated name in Latin-1, a byte indicating the type of picture, a null-terminated description in the indicated encoding, and a binary blob. On the other end of the complexity spectrum, PCNT stores just an encoded number.
How does Mutagen handle this? Primarily by inheritance. Mutagen defines a Frame type with the underpinnings, and then for the text related frames there are five intermediate classes: TextFrame, NumericTextFrame, NumericPartTextFrame, TimeStampTextFrame, and PairedTextFrame. Finally for each frame ID, Mutagen defines a subclass of the appropriate TextFrame class. Most of these subclasses are one line classes, with a description stored in their docstring. Occasionally a frame—such as the genre frame TCON—requires unique processing. Mutagen handles URL frames with two intermediate classes—UrlFrame and UrlFrameU—along with WXXX for the final unique URL frame type. The other frames tend to share little or none of their payload structure, so they generally inherit directly from Frame or FrameOpt (a variant of frame I'll describe next entry).
With what I've described so far, you should have a decent picture of the arbitrary complexity of the ID3 frames. The obvious implementation of the Frame class hierarchy I've described saves a lot of the work of writing many individual handlers, or huge condition trees. There's still a lot of redundancy: many frame types store an encoding and a list of thus-encoded strings, many frames store numbers—some in binary and some as text—and many store arbitrary type indicators. Except within the intermediate classes above, these are uniform in neither positioning nor meaning. Mutagen needs to be able to read and write each frame, and also to present a usable programmer's interface. How can we satisfy these needs while not going crazy with cut and paste?
I've got one big trick up my sleeve to reduce the redundancy, which I'll talk about in my next entry.
Mutagen is a pure Python ID3v2 library I developed to improve Quod Libet's MP3/ID3 support. It's objectively one of the better things I've written. There are several parts of it I feel particularly good about and will talk about over this series. But first I want to cover three pieces of terminology.
When people started adding metadata to their music files, pretty soon
everyone called it "tagging MP3s," and then went on to refer to the
artist tag
or title tag.
This makes a lot of sense,
as the process of tagging a file with an artist and a title seems a lot
like attaching a series of price tags to the song. But that's not the
way I'll use the term tag here.
ID3 tags are a set of informal standards—there's no formal specification submitted to any standards body—and it has specific meanings to the words tag and frame. The frame is the atomic unit that corresponds to the price tag above. It's the name-value pair. The tag is a collection of frames. There can be more than one tag per file, but in practice there is rarely more than one (two if you count the ID3v1 tag). In contrast, while there can be any positive number of frames in a tag, there are rarely fewer than a standard complement of artist, title, album, tracknumber, and genre.
For this series, I will use the term tag when I mean a structured collection of frames, and frame when I mean a structured and encoded name-value pair.
ID3v1 came with a strict 128 byte block from which you could poll a small predefined set of information. The location in the block would tell you what name implicitly goes with it. Since this was insufficient—both the limited names and the limited space for values—ID3v2 was designed to use name-value pairs. Unlike VorbisComments or APEv2 tags which have free-form descriptive names and free-form binary or unicode string values, ID3v2 was designed to be extremely space-conscious and uses a short predefined character "names" (frame IDs) with variable- or fixed-length value fields.
This leads to frame IDs like TPE2 for text-frame performer 2 (the band, orchestra, or accompaniment), APIC for attached picture (cover or other images), and WCOP for URL to copyright information. There's over 130 defined frame IDs, of which over 50 are three-letter ID3v2.2 IDs, and over 80 are four-letter ID3v2.3 or ID3v2.4 IDs.
Next entry I will talk about the frame structure, and how that influenced Mutagen's design.