Following in the tradition of while doing nothing useful I should ensure others are distracted too (aka. “maintaining a weblog”), here’s a slightly long-winded update on py-lmdb…
The test suite has progressed glacially in the recent months, today it is woefully messy and incomplete. I discovered huge behavioural differences between the CFFI and CPython versions while writing tests for lmdb.Environment, so there are almost certainly more lurking in Transaction and Cursor.
Looking to practice your unit testing skills, and contribute to a useful project? Please write some tests for the library! I always seem to find ways to avoid it…
I finally bit the bullet and translated the CPython extension from C99 to C89, so that it would build under Microsoft Visual Studio. This was more invasive than expected, and required many additional hacks, since MSVC lacks standard headers like stdint.h and suchlike.
The Windows build is now in reasonable shape, the meagre test suite passes, and basic operations seem to work as expected. Version 0.81 has binary distributions on Cheese Shop for all recent Python versions on Windows, currently as both .whl and .egg since I couldn’t decide if solely distributing .whl impacted compatibility with older pip/easy_install.
All that’s left is padding out the tests to include Windows-specific scenarios, and automating the build. That’s already started in windows_setup.py and windows_build.py.
Read-only Transaction Freelist
I’ve been planning to add a rather tricky feature to the binding almost for as long as it has existed – the ability to transparently cache LMDB read transactions within the binding, in order to avoid a costly step currently necessary to ensure safe operation. Actually this feature has been implemented and reverted twice before already, so it’s worth examining why it is valuable in the first place…
Normally LMDB makes use of Thread-Local Storage (TLS) to cache a reader slot for use by the calling thread. This slot is simply a reference into a shared memory array that is used by writers to determine the oldest version of the database that is still in use. When a thread attempts its first read-only transaction, this cached slot won’t exist, and so a more expensive process is required find a free slot for use by the thread, which involves taking an environment-global lock.
Once the slot is acquired, it will be reused by all read transactions occurring on the same thread, and the global lock need never be taken again for the lifetime of the thread. LMDB arranges for a TLS destructor callback to ensure the slot is deallocated when the thread exits. At read-only transaction start, a reference to the cached slot is copied to the transaction.
Due to this and some other aspects of LMDB’s design, normally there is an association between a read-only transaction and the thread it was created on, and therefore also a requirement to ensure the transaction is completed before its initiating thread exits.
This approach is great for C programs with long-lived threads, and probably also for short-lived threads, where the cost of acquiring the reader may be dwarfed by the cost of creating the thread, and since it is somewhat simpler from a C program to ensure the lifetime of the transaction does not exceed that of the thread, however for Python programs the situation is less than ideal.
To ensure safety, py-lmdb would need to track and cross-check every attempted transaction access against the current thread ID, which introduces considerable overhead to every operation, especially in CFFI land, in addition to requiring OS-specific treachery in the C extension.
This is since a user could create a Transaction on one thread, then pass it to another thread whose TLS does not reference the same reader slot. On exit of the first thread, the slot would be marked deallocated. Now there is a situation where a Transaction exists on the wrong thread referencing an old version of the database, but has no lease on that version. Hilarity ensues.
The binding would also need to track whether the calling thread already has a read-only transaction, and if so throw an exception. If it didn’t do this, it might be possible for 2 transactions to share the reader slot, possibly causing one unpleasantly surprised transaction’s view of the database to be upgraded, or similar entertaining behaviour.
Even if the user is relied on to follow these decidedly non-noob-friendly rules, Python itself is a sufficiently unfriendly environment to ensure even a knowledgeable user may encounter problems. For example look at what happens when a crash occurs within a transaction, and a reference to the Transaction is grabbed by some debugger. Here the debugger might grab the frame (and by proxy the Transaction), potentially while the thread on which the crash occurred exits.
Another issue is that in the presence of deferred GC (such as in PyPy), it is almost guaranteed that should a crash occur within a read-only transaction, the Transaction’s destructor will be invoked from another thread (the next thread to cause enough memory pressure to invoke the GC), which introduces a race between its origin thread being destructed (freeing the slot), another thread acquiring the slot, and the transaction’s abort() marking the slot as unused. So by design, using TLS under PyPy is a guaranteed recipe for disaster unless the user practised unreasonable care.
Finally as it is commonplace in Python to use asynchronous IO for networking, allowing only one read-only transaction per thread would significantly decrease the utility of the binding, since e.g. a non-blocking server could not support streamy reads from a consistent view of the database without starting a thread for each client. If the database did not fit in RAM, and its reads may incur faults, it would also be impossible to pass a client’s transaction from the network thread to some IO thread to prevent blocking the entire program.
Following the issues encountered while developing py-lmdb and similar problems faced by other micro-threads environments (e.g. the Erlang binding), around April last year LMDB grew the ability to disable the use of TLS. In this mode, transactions are no longer tied to their originating thread, one thread may have many transactions, and each transaction exclusively allocates and deallocates its own private reader slot.
Great! Or so it seems. Unfortunately now there is no place to cache that expensive reader slot - and so each new read-only transaction must take an environment-global read lock!
For py-lmdb on OS X where there is no lightweight cross-process mutex, each read-only transaction now requires 2 system calls, resulting in ~4x slowdown compared to the cached-slot case. For a busy environment the outlook is even worse - simply locating a free slot when 2047 slots are already in use is ~21x slower, and this is before accounting for contention on the global lock.
Without extra work, disabling TLS negates some of the most appealing aspects of LMDB’s read-optimized design.
Light at the end of the tunnel
Joyfully, LMDB provides an additional API to temporarily reset (giving up the lease on the database version) or renew (acquiring a lease on the latest database version) a read-only transaction, without causing it to give up its associated reader slot. And so that is where Transaction freelists enter.
While it is possible to expose simple Transaction.reset() and Transaction.renew() methods to Python, this would require each library user to reimplement some kind of error-prone cache in slow Python code, and also prevent the binding itself from using spare read-only transactions managed by the user, as would be needed, say, to implement any kind of “transactionless get()” API.
Since aside from the licensing and portability benefits, one of the principle attractions to LMDB is its performance, implementing this cache directly in the binding is appealing. Unfortunately, producing a correct implementation has so far eluded me..
My first, second, and third attempt at implementing this freelist has ended in a tragic mess I can no longer analyse. The current master version represents my third attempt, which is silently corrupting memory in certain circumstances, and looking once more like it’s going to be reverted.
So far my attempts have tried to directly cache the entire Transaction object, which amongst other things, keeps a reference to the heap-allocated LMDB transaction structure, a strong reference to the Environment, a strong reference to the transaction’s default database, and participates in a DOM-like tree of linked lists used to track inter-object dependencies (yet another safety feature).
Transactions already had many ways they could become invalidated: through an error, Environment.close(), Transaction.abort()/commit(), or their last reference being discarded. Adding yet another linked list, and managing correct finalization of “immediately deletable”, “cacheable for reuse”, and “Environment.close() called, destroy previously cached transactions” has utterly exceeded my ability to reason about the code being written.
So it looks like this feature may take yet another attempt, assuming I can’t improve my confidence in the current code. It seems the next approach would be to split the LMDB transaction structure off into a separate binding-internal object that has fewer possible states.
Whatever the outcome, I’m not cutting another release until the problem is solved, since as it stands, the design of py-lmdb is reflecting negatively on the performance characteristics of LMDB itself.
Nothing’s ever simple
While writing this, I’m again reminded how even the simplest of projects (“it’s just a C wrapper!”) can trivially and unpredictably snowball into significant effort just to reach a robust, generally useful state.
I originally started hacking on the binding in the course of working on Acid, and so far py-lmdb has had almost as many commits made to it as its ‘grandfather’ project. Still, the binding is “almost complete” (famous last words), and so I’d like to push it through the final hurdles so at least something deliverable has come from my last year or so of hobbyist coding.
In the course of thinking about freelists, I have repeatedly been reminded just how incomprehensive many tests suites are. It is quite a simple matter to write thousands of lines of tests for a database binding without ever considering these obvious interactions with TLS, nor for that matter succeed in devising a reliable test to trigger them.
I’ve recently been considering a testing approach similar to fuzzing in order to gain some confidence in the binding. I’m not sure if I’ll ever write such a test (at some point research effort may exceed unit test monotony!), or even manage to devise a useful strategy for it, but the idea remains appealing. There is precedent for this kind of approach, with some powerful results, for example Dave Jones’ Trinity Linux system call fuzzer.
Until next time..