Using pickle for game saving, 7 months on

This post isn’t much use for anyone but programmers: it’s about the save format I’ve chosen, and a big overview of how it’s been working out for me.

As I mentioned in an earlier post, rather than packing my data into a special format for saving, I just feed my game objects directly into Python’s pickle module. Most Python folks regard this with horror!

The drawbacks haven’t exactly been the ones that everyone warns about. Most people I’ve talked to assume that pickling game objects directly makes it infeasible to change the data format while retaining backward compatibility. In fact, it was pretty quick to set up a data migration system that overrides bits of pickle deserialisation and runs special migration scripts based on a version number stored inside each object.
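To make the idea concrete, here is a minimal sketch of a version-stamped migration hook. It is not the game’s actual code: the names Versioned, MIGRATIONS, Player and the two migration functions are made up for illustration, and it assumes each migration takes the saved attribute dict at version n and returns it at version n + 1.

```python
import pickle

# Hypothetical migrations: each upgrades a state dict by exactly one version.
def _v1_to_v2(state):
    state["hit_points"] = state.pop("hp")  # attribute was renamed in v2
    return state

def _v2_to_v3(state):
    state.setdefault("inventory", [])  # attribute was added in v3
    return state

class Versioned:
    """Base class (an assumption) that stamps a version into every pickle."""
    VERSION = 1
    MIGRATIONS = {}  # maps version n to the function upgrading n -> n + 1

    def __getstate__(self):
        state = dict(self.__dict__)
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        version = state.pop("_version", 1)
        # Always step one version at a time, however old the save is.
        while version < self.VERSION:
            state = self.MIGRATIONS[version](state)
            version += 1
        self.__dict__.update(state)

class Player(Versioned):
    VERSION = 3
    MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}

    def __init__(self, name):
        self.name = name
        self.hit_points = 10
        self.inventory = []
```

With something like this in place, pickle.loads on a version 1 blob runs both migrations in order before the object is handed back, so the rest of the game only ever sees the current layout.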

One of the initial problems with object graph serialisation is that it can traverse its way into objects you might not expect. I wrote a mixin for objects with constant members (things like definitions of transformations and such) that causes them to override serialisation and save themselves out as a string ID rather than a full object. Combined with the fact that a web app generally needs a good separation between the app and the session data, tagging all the constants with this mixin did a really good job of hemming in serialisation, and it hasn’t been a source of significant problems.
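A rough sketch of how such a mixin can work, using pickle’s __reduce__ hook. The registry CONSTANTS, the lookup_constant function and the example classes are all hypothetical names chosen for this illustration, and the real mixin may well be structured differently.

```python
import pickle

# Hypothetical registry: string ID -> the single live instance of each constant.
CONSTANTS = {}

def lookup_constant(const_id):
    """Called by pickle at load time to turn an ID back into the live object."""
    return CONSTANTS[const_id]

class ConstantMixin:
    def register(self, const_id):
        self.const_id = const_id
        CONSTANTS[const_id] = self
        return self

    def __reduce__(self):
        # Store only "call lookup_constant(const_id)" instead of the object's contents.
        return (lookup_constant, (self.const_id,))

class Transformation(ConstantMixin):
    def __init__(self, description, big_table):
        self.description = description
        self.big_table = big_table  # bulky data that should never reach a save

WINGS = Transformation("grow wings", list(range(10000))).register("transformation.wings")

class Character:
    def __init__(self):
        self.transformations = [WINGS]

blob = pickle.dumps(Character())
loaded = pickle.loads(blob)
```

After the round trip, loaded.transformations[0] is the very same WINGS object that lives in the registry, and the blob holds the ID string rather than the ten thousand table entries. Renaming or removing an ID makes old saves fail with a KeyError at load time.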

The Primary Drawback

The tricky bit is knowing when you have to write migration scripts for the stuff that is mutable. The whole benefit that drew me to a schemaless system was the ability to experiment with gameplay code without needing to modify load/save code every step of the way. When every change in the data format is implicit, knowing when to write migration scripts is very error-prone. I’ve been taking precautions, but they haven’t been enough to prevent disasters:

  • Upgrading is strictly a one-version-at-a-time process, and it shouldn’t behave differently whether it’s a single version bump or a walk through a whole series. Otherwise I’d need to test every combination of source and target version, which would be a nightmare.
  • I not only keep the data format version inside the objects, at the root of the object graph I also keep the SCM version number. On my dev machine, whenever I load a saved game and the SCM version number doesn’t match, I write the saved game I’ve just loaded to a special DB table so I have lots of specimens of old saved games, associated with known code revisions.
  • I have a debug UI on the load screen that shows me older versions of all of my saved games. Loading from there clones and upgrades them rather than upgrading in place. This makes testing fairly easy and repeatable – IF I know what I’m testing for.
  • One technique that works beautifully for the constant registry is that the game dumps a list of all the constant IDs into a file when it loads up. I’ve been putting this file in source control so that I notice when I’ve inadvertently removed or renamed an old ID that old saved games will still try and refer to. Scanning the diffs when it changes has been quick, easy and has caught a lot of problems early. Naturally checking in a generated file like this would result in a mess of conflicts in a workplace with multiple programmers, but I’m working solo so I haven’t worried about it.
  • I’d love to have the same kind of visibility about schema changes for the serialised objects too (though without having to update that schema myself!). I’m considering constructing an example game at start-up, walking the object graph and saving out all the attribute names so that I can do something analogous. Still not quite sure how to get the best coverage with this.
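The attribute-name dump from the last bullet could look something like the sketch below. The function names collect_schema and write_schema are invented for this example, and the walk makes simplifying assumptions: it only follows instance __dict__ attributes and the built-in containers, so it ignores __slots__ and anything hidden behind custom serialisation hooks.

```python
def collect_schema(root):
    """Return sorted 'module.Class.attr' names for every instance reachable from root."""
    seen = set()   # ids of objects already visited, to cope with cycles
    names = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__") and not isinstance(obj, type):
            cls = type(obj)
            for attr, value in vars(obj).items():
                names.add(f"{cls.__module__}.{cls.__qualname__}.{attr}")
                stack.append(value)
    return sorted(names)

def write_schema(root, path):
    """Write one attribute name per line so the file diffs cleanly in source control."""
    with open(path, "w") as f:
        for name in collect_schema(root):
            f.write(name + "\n")
```

Checked in next to the constant ID list, a file like this would show an added, removed or renamed attribute as a one-line diff. It only covers the attributes that the example game happens to populate, which is exactly the coverage problem mentioned above.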

Performance

One drawback that hadn’t occurred to me is that overriding the deserialiser in so many places really hurts performance, because pickle doesn’t get to use its C implementation anymore. This is a pretty big deal for a stateless web app: I’m loading and saving the saved game on every request, so the majority of the server’s time is spent in deserialisation. For Fleshcult, I’m seeing about 100ms of overhead from this alone. Bear in mind that a Fleshcult saved game at the endgame is around 40KB uncompressed, and that a Heroku dyno isn’t a CPU core: it’s an underprovisioned, virtualised sliver of a hardware core that usually runs pretty slowly and on occasion runs appallingly. Pickle’s not very fast to begin with; it’s around 10 times slower than JSON. Oh well, at least it’s a scalable kind of slow, which is better than bottlenecking on database queries.

Other drawbacks

Another drawback that I really should mention is that it’s not safe to load a pickle from an untrusted source. Unpickling calls constructors by name, using strings embedded in the pickle, so it’s trivial to take over the machine with a crafted one. I had an idea I might be able to save people’s games into a cookie, but that’s a gaping security hole with pickle, so I keep them on the server side.
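Here is a deliberately harmless illustration of the problem. The Exploit class and the restricted_loads helper are made up for this example; the allow-list approach is the one described in the Python documentation for restricting globals, and it is shown as a partial mitigation rather than as something that would make cookie storage safe.

```python
import io
import os
import pickle

class Exploit:
    """Stand-in for an attacker's payload: loading it calls a function of their choice."""
    def __reduce__(self):
        # Harmless here (os.getcwd), but it could be any importable callable.
        return (os.getcwd, ())

payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # runs os.getcwd() during loading

# Assumption for this sketch: nothing may be imported by name at all.
ALLOWED = set()

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every by-name lookup in the pickle stream passes through here.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Here restricted_loads(payload) raises pickle.UnpicklingError, while plain dicts, lists, strings and numbers still load. A real save built from game classes would need each of those classes added to ALLOWED, and every one of them becomes something an attacker can instantiate, so keeping saves on the server remains the simpler answer.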

Naturally a database full of blobs is no good for running queries on. The two cases where I’d need to do this would be analytics and multiplayer (not that it’s in my current plans). In both cases I’d rather just keep duplicates of the information I need in an RDBMS schema that’s well suited to it, if and when I find that it’s necessary.

All in all, pickling is working well enough for me to keep using it, but I still wonder whether it would’ve been less effort and more reliable to just take the hit and do things normally, writing load/save code for an explicit schema.