Close

Python Metaclasses

Today I’m going to talk about python metaclasses. Metaclasses are generally considered to be an advanced python topic, but I promise that it’s really not that bad.

The first step on the path to understanding metaclasses is to recognize that in python, even though you don’t need to specify types (as you do in many other programing languages), all python objects have a type. Essentially, the type of an object is the class that instantiated it.

Furthermore, we need to understand that everything in python is an object. Yes, everything. That means that if you can assign something to a variable in python the question “what is the type of this variable” is meaningful and has a concrete answer. Lets take a look at a few types we’ve all encountered. We do this by using the built-in function type()

my_string = 'spam'
print type(my_string)
<type 'str'>

Cool. As expected, my_string is of type 'str'.

In fact, we could have defined my_string by using the class constructor like this:

my_string = str('spam')
print type(my_string)
<type 'str'>

Ok how about some numbers:

print type(42)
print type(3.14159)
print type(1 + 6j)
<type 'int'>
<type 'float'>
<type 'complex'>

Hopefully nothing unexpected there.

Now let’s make our own type and inspect it:

class foo(object):
    answer = 42

f = foo()
print type(f)
<class '__main__.foo'>

Ok, that’s a bit more interesting. f is of type '__main__.foo'. That means that the type is foo which is in the namespace __main__. The __main__ part is just because we are in an interactive session. That would be replaced by the module (or package) if you were writing this in a file.

This is where things start to get wild.

What is the type of foo? Not an instance of foo, just foo. Let’s find out:

print type(foo)
<type 'type'>

Ok. That’s really weird. Apparently foo is an instance of the type type … whatever that means.

It turns out that this is unnecessarily confusing. type in python does double duty. If you call type with a single argument it is a function which returns the class that instantiated the argument. That’s what we’ve been doing so far.

The second (and more rare) use of type is as a constructor for the class type. Let that sink in for a moment.

That’s right. type is a class and foo is an instance of that class.

Remember, everything in python is an object, which means that everything is an instance of some class—even classes!

Ok, since foo is a class, and it’s an instance of type, that means that type is a special sort of class that instantiates other classes. As you may have already guessed, type is a ‘metaclass’.

A metaclass is any class that produces other classes.

Let’s get back to that second use of type as a constructor. Called as a constructor, type takes three arguments:
name’, ‘bases’, and ‘class_dict’. ‘name’ is the class name, ‘bases’ are the ordered list (actually a tuple) of super-classes (remember that python supports multiple-inheritance), and ‘class_dict’ is the dictionary that represents the class members (e.g. methods and class variables).

Let’s try and recreate foo using this weird syntax (we’ll call it foo2):

foo2 = type('foo2', (object,), {'answer': 42})
f2 = foo2()

print f
print f2
print 'f.answer:', f.answer
print 'f2.answer:', f2.answer
<__main__.foo object at 0x128be0b10>
<__main__.foo2 object at 0x128aa3b10>
f.answer: 42
f2.answer: 42

Fantastic! foo and foo2 are indistinguishable despite the dramatic syntactic difference in their class definition.

It turns out that using the class keyword is just syntactic sugar for the type syntax.

This leads to the key idea. What if we wanted to extend the behaviour of type for some of our classes? This turns out to be quite easy to do.

class my_meta(type):
    def __init__(cls, name, bases, cls_dict):
        cls.came_from_my_meta = 'Oh yeah!'

class foo3(object):
    __metaclass__ = my_meta

f3 = foo3()
print f3.came_from_my_meta
Oh yeah!

Note that the came_from_my_meta attribute was not specified in foo3; it came from the metaclass.

Now for the first metaclass superpower:

Metaclasses are forever. Any subclass of foo3 will have the same metaclass. As David Beazley said, metaclasses are like a genetic mutation. Subclasses automatically (and often without the author knowing about it) inherit the metaclass. This can be super powerful.

class foo3_sub(foo3):
    pass

f3sub = foo3_sub()
f3sub.came_from_my_meta
'Oh yeah!'

As you might expect, you can define more than just the __init__ function. You can also define other special methods (double-underscore methods) like __new__ etc. as well as ordinary methods and properties. Just keep in mind that these methods will only be on the resulting class and not on instances of the resulting class. Lets extend that last metaclass to illustrate:

class my_meta(type):
    def __init__(cls, name, bases, cls_dict):
        cls.came_from_my_meta = 'Oh yeah!'

    def foo(cls):
        return 'foo'

    @property
    def bar(cls):
        return 'bar'

class foo4(object):
    __metaclass__ = my_meta

f4 = foo4()
print f4.came_from_my_meta

print foo4.foo()
print foo4.bar
Oh yeah!
foo
bar

Note that we called foo and bar on the class, not on the instance. Watch what happens if we do:

f4.foo()
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-53-455f5e172045> in <module>()
----> 1 f4.foo()


AttributeError: 'foo4' object has no attribute 'foo'

In vanilla python, you can decorate a method with @classmethod and the decorated method can be called via the class. It turns out that you can also call such classmethods from an instance of the class. This is because of what is known as method resolution order (mro) which is a slightly complicated topic that I won’t bore you with now other than to say that in the case of f4.foo(), f4 looks for the definition of foo in foo4 and its superclasses. In this case, the only superclasses are object. Hence, the lookup fails as the definition is in my_meta (which is not in the mro).

As a result of this, if you want to have methods that are only callable from the class (and not an instance), metaclasses are the obvious way to achive that goal.


Some Motivation

Ok, I’ve shown you some of the mechanics of metaclasses but not much in the way of why you might want to actually use them.

In my last couple of projects I’ve had to implement libraries which deal with representing binary data messages that are sent over the network using a custom protocol. I had to write classes which expose the fields of these messages and methods to take a filled out instance of one of these classes and convert it into a sequence of bytes to be sent over the wire. Also, I needed to write methods to take raw bytes read off the wire and convert them into the appropriate class instance.

Metaclasses saved the day for me in two interesting ways.


The first use I made of them is actually a pretty common use case: registration of newly written classes in some global data-structure for lookup purposes. The challenge is that when you read a bunch of arbitrary bytes it’s not immediately obvious which class corresponds with the bytes. In my case, a ‘message ID’ was present near the beginning of the message so that was the bit of information I needed to decide which class to use.

The obvious answer is to have a dictionary somewhere in the package that maps the ID to the class. Unfortunately, this is yet another thing that needs to be maintained. If the spec changes or a new message gets added or if an ID changes, you had better update that dictionary or the program will fail.

In my case, there were hundreds of these classes and maintaining all of that seemed like a nightmare.

Lets take a look at the solution:

import sys

class MessageMeta(type):

    """
    A metaclass that will register all instances in the
    instance's module in a map called ID_MESSAGE_MAP
    """

    def __init__(cls, name, bases, clsdct):
        mod = sys.modules[cls.__module__]
        if not hasattr(mod, 'ID_MESSAGE_MAP'):
            mod.ID_MESSAGE_MAP = {}
        if hasattr(cls, 'message_id'):
            mod.ID_MESSAGE_MAP[cls.message_id] = cls

As you can see, the metaclass starts by determining the module that the class is in. If the module doesn’t already have an ID_MESSAGE_MAP it adds it and initializes it to an empty dictionary. Next, if the class has an attribute called message_id, it registers the class by making an entry in the ID_MESSAGE_MAP dictionary.

class MessageBase(object):
    __metaclass__ = MessageMeta

class MessageOne(MessageBase):
    message_id = 1

class MessageTwo(MessageBase):
    message_id = 2

print ID_MESSAGE_MAP
{1: <class '__main__.MessageOne'>, 2: <class '__main__.MessageTwo'>, 3: <class '__main__.MessageFour'>}

As you can see, ID_MESSAGE_MAP auto-magically got populated with an ID:class mapping. This completely takes care of the maintenance of this data-structure. Meta-programming for the win!


The second way I used (abused?) metaclasses is a little less orthodox.

I pretty much always use sphinx to generate documentation (if you’ve never used it, do yourself a favor and check it out).

Since I was documenting all the fields in the docstring, listing them in a fields attribute (needed for serialization since I need them in order), listing them in the __init__ method parameters, and in the __init__ body, I realized I had a bad case of DRY violation going on.

Take a look at a typical case:

class MessageThree(MessageBase):

    """
    Message Three is the third of hundreds of messages :(

    :param int first_field: the first field, represents the first thing
    :param int second_field: the second field, represents the second thing
    :param int third_field: the third field, represents the third thing
    """

    message_id = 3

    fields = [
        'first_field',
        'second_field',
        'third_field'
    ]

    def __init__(self, first_field=0, second_field=0, third_field=0):
        self.first_field = first_field
        self.second_field = second_field
        self.third_field = third_field

Nasty right? Unfortunately it gets even worse!

In this protocol, they decided to make everything ints on the wire even if the data was floating point. Enter the ugly scaled integer

The docstrings end up actually looking like this:

:param int first_field: the first field, represents the first thing, scaling: 1e8, units: radians

To deal with the scaling, I had to add that information to the class somehow. I thought about modifying the fields list and make the elements tuples which have the attribute name and the scaling factor. I could then add attributes like _first_field_scale = 1e8 to the class. Then I could make a bunch of properties that look like:

    @property
    def first_field(self):
        return self._first_field / self._first_field_scale

    @first_field.setter
    def first_field(self, val):
        self._first_field = val * self._first_field_scale

This would work, but things are starting to get a little out of hand. Maintaining this mess is starting to look pretty depressing.

I then had the following key insight: all the information in this class is present in the docstring! I realized that I could use the metaclass to write the class body for me!

As insane as this sounds, the implementation is actually pretty succinct:

class MessageMeta(type):

    """
    A metaclass that will register all instances in the
    instance's module in a map called ID_MESSAGE_MAP
    """

    def __init__(cls, name, bases, clsdct):
        mod = sys.modules[cls.__module__]
        if not hasattr(mod, 'ID_MESSAGE_MAP'):
            mod.ID_MESSAGE_MAP = {}
        if hasattr(cls, 'message_id'):
            mod.ID_MESSAGE_MAP[cls.message_id] = cls

        if not cls.__doc__:
            return

        fields = []
        types = []
        # we parse the docstring line by line looking for parameter information.
        for line in cls.__doc__.split('\n'):
            line = line.strip()
            # if the line starts with ':param' we know we the next two tokens
            # are the type and the field name
            if line.startswith(':param'):
                param = line.split()[2].strip(':')
                try:
                    param_type = eval(line.split()[1])
                except NameError:
                    return
                fields.append(param)
                types.append(param_type)
                scaling = 1
                if not hasattr(cls, param):
                    # if a field is a scaled integer, the word
                    # 'scaling' will be in the line
                    if 'scaling' in line:
                        scaling = float(line[line.index('scaling'):]
                                        .split(',')[0].split()[-1])
                        setattr(cls, '_' + param + '_scaling', scaling)
                        setattr(cls, '_' + param, param_type())

                        # we create a computed property with a getter/setter pair
                        # that handles the scaling factor for us.
                        def fget(self, param=param, scaling=scaling):
                            return getattr(self, '_' + param) / scaling

                        def fset(self, value, param=param, 
                                 scaling=scaling, param_type=param_type):
                                 
                            setattr(self, '_' + param,
                                    param_type(value * scaling))

                        setattr(cls, param, property(fget, fset))
                    else:
                        # otherwise we don't need a property,
                        # a simple attribute will do fine
                        setattr(cls, param, param_type())
                # if a field has units associated with it, we store
                # it for use with __str__
                if 'units' in line:
                    attr = line.strip().split()[2].strip(':')
                    units = line[line.index('units'):].split(',')[0].split()[-1]
                    setattr(cls, '_' + attr + '_units', units)
        cls.fields = fields

        # for classes with a non-empty `fields` property, we generate
        # an `__init__` method which has each field as a parameter
        # with a default value and initializes the field to the parameter value.
        if cls.fields:
            param_list = ', '.join(f + '=' + repr(t())
                                   for f, t in zip(fields, types))

            init_body = '\n    '.join('self.' + f + ' = ' + f for f in fields)
            exec(('def __init__(self, {}):\n' +
                  '    {}\n').format(param_list, init_body), clsdct)

            cls.__init__ = clsdct['__init__']

class MessageBase(object):
    __metaclass__ = MessageMeta

Let’s try it out:

class MessageFour(MessageBase):

    """
    Message Four is the fourth of hundreds of messages :(

    :param int first_field: the first field, represents the first thing, scaling: 1e2, units: radians
    :param int second_field: the second field, represents the second thing, scaling: 1e4, units: meters/second
    :param int third_field: the third field, represents the third thing, scaling: 1e3, units: radians/second
    """

    message_id = 3

four = MessageFour(20, 30, 40)
print "first_field:", four.first_field
print "_first_field:", four._first_field
print "_first_field_scaling:", four._first_field_scaling
first_field: 20.0
_first_field: 2000
_first_field_scaling: 100.0

Fantastic! The only thing that would make this better is a good __str__ and __repr__ function. I don’t know about you, but I generally think in degrees rather than radians, feet rather than meters, and Knots rather than meters/second. Lets write something that will give us a useful printout:

from math import pi

def radians_to_degrees(radians):

    """Converts radians to degrees"""

    return radians * 180 / pi


def meters_to_feet(meters):

    """Converts meters to feet"""

    return meters * 3.28084


def mps_to_kts(mps):

    """Converts meters / sec to Knots"""

    return mps / 0.51444444444


class MessageBase(object):
    __metaclass__ = MessageMeta

    def __repr__(self):
        return self.__class__.__name__ + '(' + ', '.join(
            [repr(getattr(self, f)) for f in self.fields]
        ) + ')'

    def __str__(self):

        printvals = []

        for field in self.fields:
            val = getattr(self, field)
            try:
                units = ' ' + getattr(self, '_' + field + '_units')
                if 'radians' in units:
                    units += (' (' + repr(radians_to_degrees(val)) +
                              units.replace('radians', 'degrees') + ')')

                if 'meters/second' in units:
                    units += (' (' + repr(mps_to_kts(val)) +
                              units.replace('meters/second', 'Kts') + ')')

                elif 'meters' in units:
                    units += (' (' + repr(meters_to_feet(val)) +
                              units.replace('meters', 'feet') + ')')
            except AttributeError:
                units = ''

            printvals.append(field + ': ' + repr(val)  + units)

        if not printvals:
            return repr(self)

        return self.__class__.__name__ +  ':\n    ' + '\n    '.join(printvals)

Lets kick the tires and light the fires:

class MessageFive(MessageBase):

    """
    Message Five is the fifth of hundreds of messages :(

    :param int first_field: the first field, represents the first thing, scaling: 1e2, units: radians
    :param int second_field: the second field, represents the second thing, scaling: 1e4, units: meters/second
    :param int third_field: the third field, represents the third thing, scaling: 1e3, units: radians/second
    """

five = MessageFive(50, 60, 70)
print repr(five)
print five
MessageFive(50.0, 60.0, 70.0)
MessageFive:
    first_field: 50.0 radians (2864.7889756541163 degrees)
    second_field: 60.0 meters/second (116.63066954744389 Kts)
    third_field: 70.0 radians/second (4010.7045659157625 degrees/second)

Thats all folks!

If you’ve used metaclasses in your projects, I’d love to hear about what you’ve done.

Leave a Reply

Your email address will not be published. Required fields are marked *