After finding Python's mmap implementation fails on 64-bit systems due to a simple format string problem, I was distressed. This is code that works on the systems it was developed on. It's code that compiles without error on systems it fails on. It's due to a subtle format string inaccuracy that doesn't even receive the help of the compiler checking for invalid association. Yet the consequences are anything but subtle.
Unit tests are a great way to catch these sorts of bugs. If the mmap module had been developed with Test First Programming, chances are very good that this behavior would have been covered with a test and caught on the first supported 64-bit platform with mmap. I hope when Python fixes the mmap bug that my sample reproduction code is turned into a unit test. Then if things blow up on some future 128-bit system we'll find out sooner than a bug report.
But that's not enough. I want my programs to be bullet-proof. I want something I write in Python and test on 32-bit systems to work on 64-bit systems. Always. Or at least to blow up early at import. To proactively ensure this I crafted and now present one ugly program to check format strings in PyArg_ParseTuple, PyArg_ParseTupleAndKeywords, or (poorly) Py_BuildValue. The results (below) are surprisingly comforting.
There are very few errors that are obviously in this 32-bit/64-bit
clash. There are a lot of unclear cases. Is a size_t the size of an
int or a long?
Is it always that size, or should it be
transferred through a variable of known size such as an unsigned
long?
There are also plenty of false mismatches from my format
checker, but since the numbers aren't insanely large it's good to err in
that direction.
What is the best way to approach Python parsing format strings in the future? I offer two alternatives, and seek others. Do you think the current code needs to change?
When both unknown and unsigned-mismatches are filtered, these are the results for trunk/Python and trunk/Modules as of r39818. Note bltinmodule is the only pure long vs int conflict outside of mmapmodule (although the size_t vs int/long goes one way in mmapmodule and another in bsddbmodule). The conflict in bltinmodule isn't currently a problem, as the parsed value is not used—it appears to exist solely for cleaner error reporting. Anything too weird is a limitation of my checker.
mactoolboxglue.c:143 `err' type `OSErr' >< i: int pythonrun.c:864 `filename' type `PyObject' >< z: char pythonrun.c:864 `text' type `int' >< z: char bltinmodule.c:1910 `reverse' type `long' >< i: int socketmodule.c:1590 `buflen' type `socklen_t' >< i: int socketmodule.c:3071 `fd' type `SOCKET_T' >< i: int mmapmodule.c:465 `size' type `size_t' >< l: long mmapmodule.c:538 `dest' type `unsigned long' >< i: int mmapmodule.c:538 `src' type `unsigned long' >< i: int mmapmodule.c:538 `count' type `unsigned long' >< i: int mmapmodule.c:867 `access' type `access_mode' >< i: int mmapmodule.c:955 `access' type `access_mode' >< i: int xxmodule.c:207 `b' type `long' >< #: int _bsddb.c:2583 `flags' type `u_int32_t' >< i: int _bsddb.c:3751 `flags' type `u_int32_t' >< i: int _bsddb.c:3778 `flags' type `u_int32_t' >< i: int _bsddb.c:3802 `flags' type `u_int32_t' >< i: int _bsddb.c:3825 `timeout' type `u_int32_t' >< i: int _bsddb.c:3825 `flags' type `u_int32_t' >< i: int _bsddb.c:4256 `flags' type `u_int32_t' >< i: int _bsddb.c:4371 `flags' type `u_int32_t' >< i: int operator.c:23 `a1' type `PyObject *a1; int a2;' >< O: PyObject operator.c:23 `a2' type `PyObject *a1; int a2;' >< i: int zlibmodule.c:126 `input' type `Byte' >< s: char zlibmodule.c:203 `input' type `Byte' >< s: char zlibmodule.c:399 `input' type `Byte' >< s: char zlibmodule.c:469 `input' type `Byte' >< s: char zlibmodule.c:781 `buf' type `Byte' >< s: char zlibmodule.c:781 `adler32val' type `uLong' >< l: long zlibmodule.c:799 `buf' type `Byte' >< s: char zlibmodule.c:799 `crc32val' type `uLong' >< l: long bsddbmodule.c:473 `recno' type `recno_t' >< i: int bsddbmodule.c:512 `recno' type `recno_t' >< i: int bsddbmodule.c:794 `reclen' type `size_t' >< i: int clmodule.c:886 `x' type `int x;' >< i: int _cursesmodule.c:379 `attr' type `attr_t' >< l: long _cursesmodule.c:388 `attr' type `attr_t' >< l: long _cursesmodule.c:426 `attr' type `attr_t' >< l: long _cursesmodule.c:436 `attr' type `attr_t' >< l: long _cursesmodule.c:472 `attr' type `attr_t' >< l: long _cursesmodule.c:482 `attr' type `attr_t' >< l: long _cursesmodule.c:517 `attr' type `attr_t' >< l: long _cursesmodule.c:546 `attr' type `attr_t' >< l: long _cursesmodule.c:603 `ch1' type `chtype' >< l: long _cursesmodule.c:603 `ch2' type `chtype' >< l: long _cursesmodule.c:692 `attr' type `attr_t' >< l: long _cursesmodule.c:863 `attr' type `attr_t' >< l: long _cursesmodule.c:872 `attr' type `attr_t' >< l: long _cursesmodule.c:907 `attr' type `attr_t' >< l: long _cursesmodule.c:916 `attr' type `attr_t' >< l: long _cursesmodule.c:1010 `attr' type `attr_t' >< l: long _cursesmodule.c:1020 `attr' type `attr_t' >< l: long _cursesmodule.c:1056 `attr' type `attr_t' >< l: long _cursesmodule.c:1066 `attr' type `attr_t' >< l: long _cursesmodule.c:1390 `attr' type `attr_t' >< l: long _cursesmodule.c:1399 `attr' type `attr_t' >< l: long _cursesmodule.c:1677 `(int *) &event.bstate' type `int' >< l: long fcntlmodule.c:174 `arg' type `char' >< i: int skipping ../python/trunk/Modules/_hashopenssl.c:405 pop from empty list
See also unfiltered ParseTuple and unfiltered BuildValue logs.