31st Aug 2008
Hallelujah! SANE ORM for threading in Python: Elixir+SQLAlchemy - no memleaks!
In one of my earlier posts I wrote about refactoring RarestNews’s bot onto Django’s ORM and the problem that followed: after the project was migrated and tested, I added parallelism to it with my threading wrapper and it started leaking memory at a very fast rate - gigabytes in a matter of minutes.
That led to wild accusations on Reddit that I can’t program. I still don’t know the culprit in that case, be it Django’s ORM or Python, but it led me to look at alternatives. And there was another disappointment!
I found Elixir: as simple as Django’s ORM, or even SIMPLER! It leaked memory too, but this time I found an almost elegant solution.
With Django, if I only want to use the ORM and not the URL mapper or templater (neither of which I need in a bot), I still have to write a lot of boilerplate code (inclusion path to Django’s settings file, lots of imports from lots of files, etc…). With Elixir, it’s “from elixir import *”. (BTW Elixir is a layer on top of SQLAlchemy’s ORM.) And the declaration is pretty simple:
from elixir import *
metadata.bind = "mysql://root:@127.0.0.1/rarest"
class Movie(Entity):
    title = Field(Unicode(30))
    year = Field(Integer)
    description = Field(UnicodeText)
setup_all()   # map the declared entities to tables
create_all()  # create any tables that don't exist yet
movie1 = Movie(title=u"Blade Runner", year=1982)
session.commit() # required, transactions are forced
Nothing lost here, very similar to Django. But…
This time I was smrrrrter: I did the parallel test before migrating a lot of code and WTF! Memleak. Again. If I add 10K objects to the DB, there are 10K more live objects in the process (according to len(gc.get_objects()))…
Ok, now that’s not funny. Does every ORM have a threading memleak? Forking is not an option (it doesn’t leak, but 20MB forked processes can’t be compared to a few MB threads, especially if you run 200 of them).
Well, I won’t bore you with the heapy and garbage collector witch-hunt for memleaks. The leaking part is the sqlalchemy.orm.identity.IdentityManagedState object, and there’s no documentation on how to “tiptoe around it” (a friendly jab at SQLAlchemy’s source code). The solution is here:
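For anyone who wants to run the same kind of check, here is a minimal sketch using only the standard library. It counts the objects the garbage collector tracks before and after a batch of work and reports which types grew. `FakeEntity` and `kept_alive` are hypothetical stand-ins for ORM objects and for whatever is holding on to them; this is not Elixir code and not the exact test from this post.

```python
import gc
from collections import Counter

def live_type_counts():
    """Count gc-tracked objects by type name, after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def growth_by_type(before, after, top=5):
    """Return the types whose live-object count grew the most."""
    diff = Counter(after)
    diff.subtract(before)
    return [(name, n) for name, n in diff.most_common(top) if n > 0]

class FakeEntity(object):
    """Hypothetical stand-in for an ORM object; not Elixir code."""
    def __init__(self, title):
        self.title = title

kept_alive = []  # hypothetical: whatever keeps references to the objects

before = live_type_counts()
for i in range(10000):
    kept_alive.append(FakeEntity("movie %d" % i))
after = live_type_counts()

# Same idea as comparing len(gc.get_objects()) before and after
total_growth = sum(after.values()) - sum(before.values())
print("total growth: %d" % total_growth)
print(growth_by_type(before, after))
```

In a real run the loop would be replaced by the ORM inserts, and the type at the top of the list is the first suspect.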
movie1=Movie(title=u"Blade Runner", year=1982)
movie1.save()
movie1.expunge()
session.commit() # required, transactions are forced
FINALLY! Okay, it’s a bit more labor to clean up every used object, but IT WORKS (the others just don’t).
Just in case you were going to recommend an easier way, I’ve tried these and they all failed:
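To cut down on the per-object labor, the three calls can be wrapped in one small helper. This is only a sketch: `save_and_release` is a hypothetical name, and the stub classes exist just so the example runs without Elixir or a database. It has not been tested against a real Elixir session.

```python
def save_and_release(entity, session):
    """Persist one entity, then detach it from the session.

    Uses the same call order as the snippet in this post:
    save(), expunge(), commit().
    """
    entity.save()
    entity.expunge()
    session.commit()


# Hypothetical stubs that only record the order of calls.
class StubSession(object):
    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")


class StubEntity(object):
    def __init__(self, session):
        self._session = session

    def save(self):
        self._session.calls.append("save")

    def expunge(self):
        self._session.calls.append("expunge")


stub_session = StubSession()
save_and_release(StubEntity(stub_session), stub_session)
print(stub_session.calls)
```

With real entities the call would presumably be save_and_release(movie1, session) in place of the three separate lines.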
clear_all()
sqlalchemy.orm.clear_mappers()
movie.expire()
session.flush()
session.close()
cleanup_entities(entities)
entities.clear()
P.S. There were no memleaks in my threading implementation.