Armed with a text editor

mu's views on program and recipe! design

July 2007

Editing Appreciated Posted 2007.07.09 18:23 PDT

Nis Martensen, who has undertaken to translate my CairoTutorial into C, found several typos and a source code misalignment. (The latter possibly introduced when I fixed the real error Mark spotted.) I fixed the typos and updated my code display macro to work with tagged lines of code so it should be much less brittle against future code updates. Thanks, Nis!

(0 Comments ) (0 Trackbacks) cairo python

April 2007

ReReplayGain Posted 2007.04.07 20:16 PDT

After ending up with too many MP3 files without proper ReplayGain information, and knowing that mp3gain doesn't suit my needs quite right, I went jonesing around in GStreamer once again. Today was my lucky day, though; René Stadler had created a GStreamer element which calculates ReplayGain.

In what's starting to feel usual, Debian doesn't have up to date versions of the software I want to use in either testing or unstable, but this hasn't prevented the maintainer from packaging what I want to use and mirroring it in experimental. After downloading and installing it, I was good to go. First I put together a command-line example of the process, to make sure I knew what was required. Then I munged it into a plugin for Quod Libet.

There's still something fishy about it, as MP3s which I have previously found peak values for above 1.0 are now being reported capped to 1.0; Ogg Vorbis files, on the other hand, correctly exceed this cap and run presumably all the way to 2.0. I hope to clarify my understanding of this probable GStreamer bug shortly, as other than that all values match what I've seen before. I'm quite pleased with the results of tonight's work.

(4 Comments ) (0 Trackbacks) gstreamer python quodlibet

March 2007

Script your build, no really! Posted 2007.03.20 20:05 PDT

Thanks to Mark Edgington for finding and alerting me to an error in the code behind my Cairo Tutorial. Somewhere between when I originally wrote the code and used it to generate the tutorial, and when I uploaded it, I introduced an ordering problem that caused the Mask diagram to fail to draw. The trivial fix just required moving the call to super(Mask, self).__init__(*args) to the end of Mask.__init__. For anyone reading the tutorial: don't fear; no informative parts have changed.
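For readers wondering how the position of a super().__init__ call can break drawing, here is a minimal sketch of that kind of ordering problem. It is not the tutorial's actual code: the Diagram base class, the draw() method, and the pattern attribute are all hypothetical, and the sketch assumes a base initializer that renders during construction.

```python
class Diagram:
    """Hypothetical base class that renders as part of construction."""

    def __init__(self, *args):
        self.args = args
        self.output = self.draw()  # dispatches to the subclass's draw()

    def draw(self):
        return "blank"


class BrokenMask(Diagram):
    def __init__(self, *args):
        # The base initializer calls draw() before self.pattern exists.
        super().__init__(*args)
        self.pattern = "radial"

    def draw(self):
        return "mask with %s pattern" % self.pattern


class Mask(Diagram):
    def __init__(self, *args):
        self.pattern = "radial"
        # Calling the base initializer last means draw() can see self.pattern.
        super().__init__(*args)

    def draw(self):
        return "mask with %s pattern" % self.pattern


def try_build(cls):
    """Return the rendered output, or the name of the exception raised."""
    try:
        return cls().output
    except AttributeError as exc:
        return type(exc).__name__
```

Under these assumptions, try_build(BrokenMask) reports an AttributeError while try_build(Mask) renders normally.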

This simple error has been sitting there since I originally wrote the code some nine months ago. Even a small project like this tutorial could have benefited from continuous integration.

(0 Comments ) (0 Trackbacks) cairo python

October 2006

Plugging in to Epiphany Posted 2006.10.21 13:13 PDT

I recently decided to try out Epiphany for real after the umpteenth time Firefox went haywire when my gtk theme changed. As a base browser, the only things it has going for it are the ability to drag tabs around, and the tag-based bookmarks system.

On the downside, using that tag-based bookmarks system means starting over when you have hundreds of poorly categorized bookmarks like I did. I'm still not sure if I like it, but it's definitely neat.

On the plus side, Epiphany can be tweaked with Python extensions. In particular Stefan Stuhr's Only One Close Button is a blessing to those of us who don't like scrolling through tabs.

With this in mind, I set out to fix a problem I've been experiencing. One of the sites I visit takes user submitted content, including a link and a comment. Lots of submissions include a link in the comment field. However in order to preserve their table layout without having it overflow, the site embeds U+200B Zero-Width-Space characters into the shown text. When I select the text and paste it to a new tab, the url will contain a '%E2%80%8B' substring (URL-escaped UTF-8 encoded U+200B) which invalidates the destination, and tends to yield either a 404 or a redirection.
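To make the problem concrete, here is a small sketch of the cleanup step on its own, written for current Python 3 and independent of Epiphany. The function name strip_zero_width_spaces is made up for illustration, and this is not the code of the extension itself.

```python
import re

# Matches a raw U+200B as well as its URL-escaped UTF-8 form.
# Percent-escapes are case-insensitive, hence re.IGNORECASE.
_ZWSP_RE = re.compile("\u200b|%E2%80%8B", re.IGNORECASE)


def strip_zero_width_spaces(url):
    """Return url with every zero-width space removed (hypothetical helper)."""
    return _ZWSP_RE.sub("", url)
```

For instance, strip_zero_width_spaces("http://example.com/some%E2%80%8Bpage") gives back "http://example.com/somepage".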

I wrote up a simple extension based on the tutorial, but debugging it was a real pain. My initial session of epiphany was running somewhere with no stdout or stderr. My first draft of my extension had several errors which resulted in thrown exceptions. A bug with Epiphany causes it to fail to exit properly when Python extensions throw exceptions. It was not only failing to reload my extension, but when I thought I was running Epiphany in a terminal, it was really resuming the existing session. Let this be a lesson to Epiphany Python extension developers: be willing to kill your running Epiphany process!

After figuring that out with the help of jfr on #epiphany, I was then able to complete the extension handily. It's not ideal: it causes a second automatic page load to happen, and going back to the page with the zero width space in the URL will again trigger the redirect. But it's good enough for me. If anyone else wants it, either to use or to fix, you can get it from EpiphanyExtensions.

(0 Comments ) (0 Trackbacks) epiphany python

Lament of the Python Developer Posted 2006.10.12 12:22 PDT

I love working in Python. When I was first coming from Perl everything felt weird. Now I can't imagine going back voluntarily. However there are a few laments that come up time and time again to trouble me, the Python using developer.

Cross-version compatibility

The developers of Python have a pretty rigorous backwards-compatible culture, but the actual implementation leaves something to be desired. While many modules, especially those outside of the standard library but maintained by those who develop Python, are able to be compatible across many versions, the recently released Python 2.5 managed to create two completely separate incompatibilities in Quod Libet and mutagen. As Martin v. Löwis said, someone always notices. If only they had put as much thought into the changes that hit us as they seem to be putting into whether floating point +0.0 and -0.0 can be distinguished.

Quality of bindings

Every binding of a library is implemented separately, and generally by different authors. This leads to python modules that feel wildly different, from naming conventions to support for pythonic methods. That support itself ranges from supporting iteration to having the right mixture of functions and objects with methods. Sometimes collections of similar modules group behind a common API such as the DB API.

When I first started using python, I happened to pick Pygame, a very easy to use binding. It was well documented. The parts you needed regularly were small enough to fit in your head. There were helpers in the right places. It accepted reasonable duck-typing in convenient places, such as accepting any 4-element list or tuple where a pygame.Rect was going to be used, implicitly upcasting it.
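The kind of duck-typing described above can be sketched without Pygame at all. Box, as_box, and area below are hypothetical names for illustration; they only mimic the idea of coercing any 4-element sequence into a rectangle and are not Pygame's implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for a rectangle type.
Box = namedtuple("Box", "x y width height")


def as_box(value):
    """Coerce a Box or any 4-element sequence into a Box."""
    if isinstance(value, Box):
        return value
    # Unpacking raises ValueError or TypeError if the shape is wrong.
    x, y, width, height = value
    return Box(x, y, width, height)


def area(rect):
    """Example API function that accepts anything as_box() understands."""
    box = as_box(rect)
    return box.width * box.height
```

A caller can then write area([0, 0, 3, 4]) or area((1, 1, 2, 5)) without ever constructing a Box by hand.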

By contrast, recently I've tried to find a good binding for working with subversion, and have had poor luck. The initial binding, a poorly documented SWIG wrapper, was very tied to a low level C interface. Nobody using the python bindings would want to deal with the APR directly. More recently, pysvn seemed to address this. The functions seemed to expose the right level to let me get my work done.

My Ideal PySvn

After working with it for a couple days, however, I've changed my mind: pysvn still has a long way to go. I'm going to present a couple steps that would bring pysvn closer to my idealized version, shamelessly ignoring that implementing it would require many API-incompatible changes.

While trying to write a simple subversion browser, there are two road blocks I keep bumping into.

  1. Revisions are way too hard to use.
  2. Dictionaries of attributes are filled with inconsistent names.

I'm not sure which of the above road blocks bothers me more. On the one hand, the Revision object reads like a straight translation of a C struct with a union inside it. This allows it to represent any of several kinds of revision. But I would contend that it should just accept simple python types.

For simplicity, the constructor should accept these, and any function that requires them should run them through the constructor as necessary. For an attempt at bonus points the constructor could also accept strings like '137' or '1986-01-28 17:39:13.620000' in place of integers or time objects, but it's probably better not to accept those. Better yet, these objects should always be used, and the Revision object should be dropped from the Python interface.
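As a rough sketch of what "just accept simple python types" could look like, here is a hypothetical normalize_revision helper. It does not use pysvn, and the choices it makes (None meaning the head revision, strings being rejected, the tuple it returns) are assumptions made only for this illustration.

```python
import datetime


def normalize_revision(value=None):
    """Map a plain Python value to a (kind, value) pair (hypothetical)."""
    if value is None:
        # Assumption: no value means "the latest revision".
        return ("head", None)
    if isinstance(value, int) and not isinstance(value, bool):
        if value < 0:
            raise ValueError("revision numbers are non-negative")
        return ("number", value)
    if isinstance(value, datetime.datetime):
        return ("date", value)
    # Strings such as '137' are deliberately rejected.
    raise TypeError("unsupported revision spec: %r" % (value,))
```

Any function taking a revision could run its argument through a helper like this, so callers would pass 137 or a datetime directly.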

On the other hand, inconsistent names in the properties dictionaries are really hard to keep straight. The info() call returns an Entry object, with attributes including commit_author, commit_revision, commit_time, kind, name, revision, and url. The info2() call returns items with attributes including last_changed_author, last_changed_rev, last_changed_date, kind, rev, and URL. Note that the commit prefix became last_changed, the time suffix became date, revision became rev, url became URL, and name disappeared. The log() and ls() calls suffer similar transformations, although the amount of information each contains is smaller.

I'm willing to assume that the actual subversion API is similarly fickle, and that the pysvn authors didn't create this mess, but I really want it to be cleaned up. I'd like to get an object, say a subclass of a mythical PySvnItem, which knows its url, name, and revision. When I access an attribute that the corresponding C API call did not return, the object should look the value up itself, so I don't have to worry about which call provided the original PySvnItem. And the names of these attributes should be normalized so I don't have to worry about last vs last_changed vs commit prefixes.
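Here is a small sketch of that mythical item, working on plain dictionaries rather than real pysvn results. The class name SvnItem, the ALIASES table, and the lookup callback are all hypothetical; only the attribute names come from the info() and info2() lists above, and choosing the info() spellings as the normalized ones is arbitrary.

```python
# Map info2()-style names onto info()-style names.
ALIASES = {
    "last_changed_author": "commit_author",
    "last_changed_rev": "commit_revision",
    "last_changed_date": "commit_time",
    "rev": "revision",
    "URL": "url",
}


class SvnItem:
    """Hypothetical item with normalized, lazily completed attributes."""

    def __init__(self, raw, lookup=None):
        self._fields = {ALIASES.get(key, key): value
                        for key, value in raw.items()}
        self._lookup = lookup  # called as lookup(item, name) for missing data

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._fields:
            if self._lookup is None:
                raise AttributeError(name)
            self._fields[name] = self._lookup(self, name)
        return self._fields[name]
```

With something like this, code reading item.commit_author or item.name would not need to know which call produced the item.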

Bugs

No lament is complete without mention of at least the one latest bug that delayed some work. In this case it was pysvn's diff() method, which fails with a weird C++ error if you pass it revisions as positional arguments. Peter found that named arguments work fine, so it's no longer in my way.

(0 Comments ) (0 Trackbacks) pysvn python

August 2006

Bikesheds: Naming pygame-ctypes; Pronouncements Posted 2006.08.26 09:03 PDT

Sometimes there are just a few too many bikesheds out there just begging to be painted. While I've managed to not quite join the discussions themselves, I just had to share my viewpoint somewhere...

pygame vs ctypes

There have been a couple threads on what to name what may or may not become the new pygame. Most recently a cutesy name pistol was suggested. All because of some hopes to avoid confusion over the original pygame name.

I think the worry about confusion is a red herring, so long as things are properly namespaced. The ctypes implementation is split into two levels: a SDL wrapper, and a pygame compatibility layer. The first comes under the SDL namespace, with entries like SDL.image, SDL.mixer, etc. The pygame compatibility layer should just be another one of these: SDL.pygame.

Then existing code can switch from import pygame to from SDL import pygame if it wants to leverage this layer. No crazy names that nobody understands. No confusion over which is being used. We're done.

TurboGears vs Django

There's a lot of kerfuffle on various python blogs about what Guido van Rossum did or did not pronounce about either TurboGears or Django being the official BDFL choice, and so forth. I have yet to see a single link to text that came from Guido himself. Please stop mountainizing this rumor until there is an email or blog post or video from Guido that the rest of us can read or watch.

As for which I prefer? I'm completely a roll-my-own type, as I only do websites like this one.

(3 Comments ) (0 Trackbacks) bikeshed python

July 2006

Cankiri 0.1 Posted 2006.07.29 23:21 PDT

As if I didn't have enough other projects on my hands, I've just put enough finishing touches on Cankiri to release it into the wild licensed under the GPLv2. It came about after looking at Istanbul around version 1.2 and being disgusted with the limited features and overengineering in the code. Come on, you don't need two directories and at least five files for this functionality. Really. Since then both Istanbul and I have added screen area selectors and audio recording. I've got all my code in one 400 line file; Istanbul now spans many more files. Where this really shows: ls -l.

It's amazing how concise you can be with python once you know what you're doing. I hope you find Cankiri easy to use, as a single-file distribution leaves no room for documentation. Let me know what you think.

(All the jabs at Istanbul's code aside, I'm very grateful for the GStreamer plumbing I've been able to take and reapply. Since I've yet to really learn GStreamer, this has been critical for me.)

(15 Comments ) (2 Trackbacks) cankiri gstreamer python

June 2006

Painful Debugging Posted 2006.06.03 15:32 PDT

The last couple days I went through one of the more painful debugging experiences of my life. The symptom was that one of our OggFlac unit tests in Mutagen was failing. But not on x86; that would be too easy. Instead it was only failing on AMD64 systems, which is reminiscent of a bug I covered before. Thankfully Pete was able to provide direct network access to an AMD64 machine for me to test on; otherwise we'd undoubtedly still be fighting this bug. As it was, we spent at least four hours each on Thursday and Friday.

So knowing that this was a painful bug, how do we debug such symptoms?

Tighten our tests

First things first, try to isolate the problem as much as possible. The failing test did a small sequence of things, starting from a nearly blank file, adding some stuff, deleting it, adding other stuff, and then failing the test on line 39. So pare it down. As it turns out all we need are the three lines 37–39: add, save, test.

Examine the code in question

Now that we know which methods reveal the bug, we can look into the code behind them. This can help find many kinds of errors. I think it helped us tighten our code a bit. But after many hours of staring at it and trying various things it was still no help.

Compare to passing code

We had a passing version of the same test on x86. Was the problem in our code, or in the crosscheck we were performing? Taking the output produced on x86 and testing it with flac on x64 (and vice versa) showed it was a problem in our code, and not the tool. Drat. But hey, the file is the same up until that last step. That confirms everything else so far is good.

Since looking at the file gave us no good clues, other than the broken file in question looked like garbage, it was time to delve deeper. I coded up a tracing wrapper, which injects tracing primitives into python code. I invoked this from the OggFLAC tests, and traced ogg, oggvorbis, and oggflac modules under mutagen.

Note: the linked version is the final version. Earlier versions lacked tracefile, function return values, and indentation.
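For a feel of what such tracing produces, here is a much simpler stand-in: a decorator-based tracer rather than something that injects tracing into whole modules. The Tracer class and its log format are invented for this sketch and are not the linked wrapper; it records calls, return values, and indentation by call depth.

```python
import functools


class Tracer:
    """Collects an indented log of calls and returns (illustrative only)."""

    def __init__(self):
        self.lines = []
        self.depth = 0

    def wrap(self, func):
        @functools.wraps(func)
        def traced(*args, **kwargs):
            indent = "  " * self.depth
            # Only positional arguments are logged, to keep the sketch short.
            self.lines.append("%scall %s%r" % (indent, func.__name__, args))
            self.depth += 1
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                self.lines.append("%sraise %s from %s"
                                  % (indent, type(exc).__name__, func.__name__))
                raise
            finally:
                self.depth -= 1
            self.lines.append("%sreturn %r from %s"
                              % (indent, result, func.__name__))
            return result
        return traced
```

Running the same test under the tracer on two machines and diffing the two lists of lines is the comparison described in the following sections.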

Confuse yourself over and over

The only differences for the longest time, after filtering out inconsequential differences, were at the failure spots themselves. The passing code would continue; the failing code would report an exception. I would add more tracing. The same. What else can I trace? There has to be a difference somewhere. So I added tracefile.

The dreaded heisenbug

And the size of the tracelog exploded. I spent time coping with that when I should have been realizing that there was no longer any difference. The good and bad traces were too different (because of our fallback code for the AMD64 mmap problems in python), so I made them more similar. Somehow the failing code started working. What did I do? I added tracefile. Huh? Great...now I can't even observe my bug without it morphing.

In concurrent programming, or code with extreme speed needs, heisenbugs can be a real problem. Fortunately Mutagen is neither.

The final push

The thing about the final push is you never know when it will hit. When you first start debugging you're sure it's right around the corner. Then the more you work at it without solving it, the more convinced you become that something insane is wrong with your tools. So here's a reconstructed stream of consciousness from when I first saw the heisenbug.

Take all tracing out, but leave in tracefile. Okay, it's still working. Take out tracefile. Okay, it fails. Hey, it fails on x86 too. Hallelujah! Wait. Huh? Pare down tracefile. Use it and take out #read. It fails. Put #read back and take out #write. It works. Leave in #read but simplify it to just recurse the call. It fails. What's left?

Eureka

I fiddle with it and find the only piece I need is self.tell() before calling self.read. So I mention my bewilderment to Joe and he finds a gem: this bug has been reported to Python before. And denied. Apparently rightfully so. Six month old tested code is to blame. I fix it up by using explicit seek calls, and everything passes.

Lessons learned

There's always another dark corner of your toolchain left to be learned, often the hard way. In our case it was the fact that interleaved read and write operations on a file pointer, without explicitly resetting the stream position, are undefined. They work most of the time on the implementations we run on, but they fail sometimes. I feel we did an okay job of challenging and proving our assumptions right or wrong, but it was really hard to make the leap that let us trust the new barely tested code that was revealing the failure, and stop trusting the old tested code that was causing it. I'd like to do better about that in the future.
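The explicit-seek habit can be sketched as follows. This is not Mutagen's code; overwrite_after_header is a made-up helper. The underlying rule comes from C stdio, which Python 2 file objects wrapped: on a stream opened for update, a positioning call is required when switching between reading and writing. As far as I know, Python 3's io layer handles the switch itself, so there the seeks are merely harmless.

```python
import os


def overwrite_after_header(path, header_size, payload):
    """Read a header, overwrite what follows it, then re-read the file."""
    with open(path, "r+b") as fileobj:
        header = fileobj.read(header_size)
        # Reposition explicitly before switching from reading to writing.
        fileobj.seek(header_size, os.SEEK_SET)
        fileobj.write(payload)
        # And again before switching from writing back to reading.
        fileobj.seek(0, os.SEEK_SET)
        return header, fileobj.read()
```

The seeks look redundant, since the file position is already where it needs to be, but they are what makes the read/write switch well defined.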

In the meantime, Mutagen 1.4, with corrected file handling code on 64-bit systems, here we come!

(0 Comments ) (0 Trackbacks) debug mutagen python

May 2006

Puzzling Answers Posted 2006.05.18 07:02 PDT

So what does ''.join(chr(sum(((val >> i) & 1) << (7-i) for i in range(8))) for val in range(256)) do? If you run it on python2.4 you get an ugly string. Specifically it's a string of 256 bytes, each with its bits swapped in order of significance: 0x01 becomes 0x80 and so forth.
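Spelled out for current Python 3, where the result has to be a bytes object rather than a str, the same expression can be sketched like this. The names reverse_bits and BITSWAP are only for illustration.

```python
def reverse_bits(val):
    """Reverse the order of the 8 bits in a byte value (0..255)."""
    # Bit i of the input moves to bit 7-i of the output.
    return sum(((val >> i) & 1) << (7 - i) for i in range(8))


# One entry per possible byte value, usable with bytes.translate().
BITSWAP = bytes(reverse_bits(val) for val in range(256))
```

So BITSWAP[0x01] is 0x80, and b"\x01\x02".translate(BITSWAP) yields b"\x80\x40".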

But why is this useful? Thanks to Joe's work it's now part of Mutagen, and is being used as part of a scheme to calculate the kind of CRC32 that the Ogg container (of Ogg Vorbis) requires. Why a scheme? Speed. As an interpreted language, python isn't well suited to small bit calculations on large sets of numbers. A standard reduce(lambda x, y: table[(x>>24) ^ y] ^ (x << 8), data, 0) scheme is too slow for comfort. However the table used in the existing C implementations zlib.crc32 and binascii.crc32 is from the bitwise reversed generator polynomial of the one used for Ogg.

Peter Johnson came to the rescue and figured out we could get the same effect by bitswapping each byte of the data and then bitswapping the final 32-bit result. Thanks to str.translate, we hoist most of the work into C, and the above puzzler code runs once at module import to generate the translation table.
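Here is a sketch of the trick in current Python 3, checked against a slow bit-by-bit reference. It assumes the Ogg checksum uses generator polynomial 0x04C11DB7 with a zero initial value and no final inversion; the function names are made up, and this is a reconstruction of the idea rather than Mutagen's actual code. Because zlib.crc32 inverts its register on the way in and out, the sketch passes a start value of 0xFFFFFFFF and inverts the result to cancel both.

```python
import zlib

# Translation table that reverses the bits within each byte.
BITSWAP = bytes(sum(((v >> i) & 1) << (7 - i) for i in range(8))
                for v in range(256))


def reverse32(value):
    """Reverse all 32 bits: swap bits within each byte, then byte order."""
    return int.from_bytes(value.to_bytes(4, "big").translate(BITSWAP),
                          "little")


def ogg_crc_slow(data):
    """Bit-by-bit reference: MSB-first CRC, init 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def ogg_crc_fast(data):
    """Same checksum, with the per-byte work done in C by zlib."""
    swapped = data.translate(BITSWAP)
    # Start value and final inversion cancel zlib's built-in inversions.
    raw = ~zlib.crc32(swapped, 0xFFFFFFFF) & 0xFFFFFFFF
    return reverse32(raw)
```

The two functions should agree on any input; only the fast one is practical on large files.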

And yes, Joe, it's good for confusing you at 1AM. :)

(0 Comments ) (0 Trackbacks) puzzle python

Puzzlers Posted 2006.05.17 06:51 PDT

What is the following useful for?

''.join(chr(sum(((val >> i) & 1) << (7-i) for i in range(8))) for val in range(256))
(1 Comments ) (0 Trackbacks) puzzle python
