When I’m making a Python package, I usually write an
__init__.py which lifts the package’s modules into the package namespace.
Assume you have a package like this:
    my_package/
        __init__.py
        module1.py
        module2.py
        module3.py
If you leave the
__init__.py blank, you cannot do:
    import my_package
    my_package.module1
This will result in an error that looks like:
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-2-dc2aca417ab9> in <module>()
    ----> 1 my_package.module1

    AttributeError: module 'my_package' has no attribute 'module1'
Doing this also fails for the same reason (this time with a NameError, since the bare name is never bound):
    from my_package import *
    module1
This is because the
module1 symbol is not in the
my_package namespace. That is one of the main purposes of the
__init__.py. We can arbitrarily control the package namespace by importing symbols. Additionally, we can control what symbols are imported when someone does
from my_package import * by listing the symbols (as strings) in a variable called __all__.
Simply importing the sub-modules in the
__init__.py solves the problem:
    # my_package/__init__.py
    from . import module1
    from . import module2
    from . import module3
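To make the __all__ mechanism mentioned above concrete, here is a small sketch. It builds a throwaway package in a temporary directory (the name all_demo_pkg and its contents are made up for the demo) whose __init__.py imports two sub-modules but lists only one of them in __all__, then checks what a star import actually binds.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names here are hypothetical).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "all_demo_pkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "from . import module1\n"
        "from . import module2\n"
        "__all__ = ['module1']  # only module1 is exported by 'import *'\n"
    )
for name in ("module1", "module2"):
    with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
        f.write("NAME = %r\n" % name)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star import in a scratch namespace so we can inspect what it bound.
scratch = {}
exec("from all_demo_pkg import *", scratch)
star_names = sorted(k for k in scratch if not k.startswith("__"))

pkg = importlib.import_module("all_demo_pkg")
print(star_names)               # names bound by the star import
print(hasattr(pkg, "module2"))  # module2 is still an attribute of the package
```

The star import only binds the names listed in __all__, while attribute access on the package still sees everything that __init__.py imported.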
This is fine for small packages, but what if there are hundreds of sub-modules? Or, what if loading the modules has significant overhead? It may make sense to not lift the modules into the package namespace. You often see this in large packages. The rationale is easy to understand, but I got to wondering if there was a way to keep the benefit of having the symbols lifted while avoiding the overhead of actually importing the modules.
A natural solution would be to use properties. If you’ve never used these, here is a crash course:
    class Foo(object):
        @property
        def bar(self):
            return 42
If you make an instance of
Foo you can access
bar without function application syntax:
    f = Foo()
    print(f.bar)

This prints 42. Note that we access
bar without parentheses. In fact,
f.bar() results in an error because
int instances are not callable. The
@property mechanism is often used to implement computed properties that in other languages would need to be in getter/setter functions.
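Since setters come up again later in this post, here is a slightly fuller sketch of the getter/setter pattern. The Circle class is purely illustrative and not part of the package recipe.

```python
import math


class Circle(object):
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        # Assignment to c.radius is routed through this function.
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @property
    def area(self):
        # Computed on every access; no setter is defined, so it is read-only.
        return math.pi * self._radius ** 2


c = Circle(1.0)
c.radius = 2.0  # goes through the setter
print(c.area)   # computed from the current radius
```

A property without a setter rejects assignment with an AttributeError, which matters once other code starts assigning to the attribute.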
Back to the problem at hand. We would like to expose the symbols at the package level, but delay the actual import until the symbol is accessed. Using properties, this would look something like:
    @property
    def module1():
        from . import module1 as _module1
        return _module1
Unfortunately, this does not work. This is because
properties are descriptors: they only take effect when defined on a class and accessed through an instance of it, not as bare module-level names. When we try to access
module1, instead of the function running, we get this:
    In : module1
    Out: <property at 0x1058a8cc8>
We got a
property instance. The property needs to be bound to an instance in order to run.
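A small sketch of what "bound to an instance" means here. The Holder class and the _answer function are made up for illustration; the point is that the getter only runs when the property is looked up through an instance of a class that carries it.

```python
def _answer(self):
    return 42


prop = property(_answer)

# As a bare name, the property is just an object; nothing calls the getter.
bare = prop


class Holder(object):
    answer = prop


via_class = Holder.answer       # still the property object itself
via_instance = Holder().answer  # the getter runs here

# Roughly what attribute lookup on an instance does on our behalf.
manual = prop.__get__(Holder(), Holder)

print(via_class is prop, via_instance, manual)
```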
It’s starting to look like you can’t have properties on modules. 🙁
Except that you can! You just have to be willing to go a little off the beaten path.
As I’ve said in previous posts, everything in Python is an object. We will use this fact to bend Python to our will.
Specifically, we can extend the module object type to do what we want:
    from types import ModuleType

    class MyModule(ModuleType):
        @property
        def module1(self):
            from . import module1 as _module1
            return _module1

        @property
        def module2(self):
            from . import module2 as _module2
            return _module2

        @property
        def module3(self):
            from . import module3 as _module3
            return _module3
Now, if we make an instance of
MyModule, those properties should work as expected. Additionally, we will need to replace the existing module in the cache with an instance of MyModule:
    import sys
    sys.modules[__name__] = MyModule(__name__)
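To see the sys.modules trick in isolation, here is a sketch that does not need a package on disk. It registers a ModuleType subclass instance under a made-up name (lazy_demo) and uses the standard library's json module as a stand-in for an expensive sub-module.

```python
import sys
from types import ModuleType


class LazyDemo(ModuleType):
    @property
    def json(self):
        # The import is deferred until the attribute is first accessed.
        import json as _json
        return _json


# Register an instance under a hypothetical name; inside a real package's
# __init__.py the key would be __name__.
sys.modules["lazy_demo"] = LazyDemo("lazy_demo")

import lazy_demo  # found in sys.modules, so no file lookup happens

encoded = lazy_demo.json.dumps({"a": 1})
print(encoded)
```

Because the import statement consults sys.modules first, whatever object is stored there is what importers receive, properties included.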
This is very close to a working solution, but there are still some missing pieces.
This code touches on one of the areas that differs significantly between python2 and python3. In python3, much of the import machinery has been moved from C code in the interpreter to python code that you can inspect and interact with. This results in subtle differences in what happens when you do an import. Here is some code that, so far as I can tell, runs the same in both versions of the language:
    import sys
    import importlib
    from types import ModuleType

    class MyModule(ModuleType):
        @property
        def module1(self):
            if not self.__dict__.get('module1'):
                self.__dict__['module1'] = importlib.import_module('.module1', __package__)
            return self.__dict__['module1']

        @module1.setter
        def module1(self, mod):
            self.__dict__['module1'] = mod

        # and so on for all the modules

    old = sys.modules[__name__]
    new = MyModule(__name__)
    new.__path__ = old.__path__
    for k, v in list(old.__dict__.items()):
        new.__dict__[k] = v
    sys.modules[__name__] = new
It turns out that in python3, we need a setter because deep in the python code that performs the import it does a
setattr on the module, while in the python2 version it relies on the builtin
__import__ function which seems to manipulate the module __dict__ directly.
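The following sketch shows why a setattr on the module is a problem for a getter-only property. The class names ReadOnly and Writable are made up, and the setattr call stands in for what the import code is described as doing above; it is not a reproduction of the import machinery itself.

```python
from types import ModuleType


class ReadOnly(ModuleType):
    @property
    def child(self):
        return self.__dict__.get("child")


class Writable(ModuleType):
    @property
    def child(self):
        return self.__dict__.get("child")

    @child.setter
    def child(self, mod):
        # Store the value in the instance __dict__, as the recipe does.
        self.__dict__["child"] = mod


sentinel = ModuleType("child")

# Without a setter, assigning to the property name is rejected.
try:
    setattr(ReadOnly("pkg"), "child", sentinel)
    read_only_error = None
except AttributeError as exc:
    read_only_error = exc

# With a setter, the same assignment is accepted and stored.
writable = Writable("pkg")
setattr(writable, "child", sentinel)

print(type(read_only_error).__name__, writable.child is sentinel)
```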
There is one last element that could use some improvement.
When I’m developing, I spend a lot of time in the REPL (I prefer IPython) and I rely on tab completion. Since
@property methods live on the class and don’t end up in the module instance’s
__dict__, the sub-modules don’t auto-complete on tab. Let’s fix that.
    import sys
    import importlib
    from types import ModuleType

    class MyModule(ModuleType):
        def __init__(self, name):
            super(MyModule, self).__init__(name)
            # Placeholders so the names appear in the module __dict__.
            self.module1 = None
            self.module2 = None
            self.module3 = None

        def __getattribute__(self, attr):
            val = object.__getattribute__(self, attr)
            if val is None:
                # Fall back to the '_'-prefixed property, if there is one.
                try:
                    ret = object.__getattribute__(self, '_' + attr)
                except AttributeError:
                    return None
                setattr(self, attr, ret)
                return ret
            return val

        @property
        def _module1(self):
            if not self.__dict__.get('module1'):
                self.__dict__['module1'] = importlib.import_module('.module1', __package__)
            return self.__dict__['module1']

        # and so on for all the modules

    old = sys.modules[__name__]
    new = MyModule(__name__)
    new.__path__ = old.__path__
    for k, v in list(old.__dict__.items()):
        new.__dict__[k] = v
    sys.modules[__name__] = new
Note that since we renamed the
properties (to _module1 and so on), a setattr on module1 no longer collides with them, so we no longer need the setters.
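To check that the recipe behaves as described, here is an end-to-end sketch. It writes a throwaway package (the name lazy_recipe_pkg and the VALUE constant are made up) whose __init__.py contains the recipe above, trimmed to a single sub-module, and then inspects sys.modules before and after the first attribute access. I have only reasoned through this on python3.

```python
import importlib
import os
import sys
import tempfile

# The recipe from above, reduced to one sub-module for brevity.
INIT_SOURCE = '''
import sys
import importlib
from types import ModuleType

class MyModule(ModuleType):
    def __init__(self, name):
        super(MyModule, self).__init__(name)
        self.module1 = None  # placeholder so the name shows up in dir()

    def __getattribute__(self, attr):
        val = object.__getattribute__(self, attr)
        if val is None:
            try:
                ret = object.__getattribute__(self, '_' + attr)
            except AttributeError:
                return None
            setattr(self, attr, ret)
            return ret
        return val

    @property
    def _module1(self):
        if not self.__dict__.get('module1'):
            self.__dict__['module1'] = importlib.import_module('.module1', __package__)
        return self.__dict__['module1']

old = sys.modules[__name__]
new = MyModule(__name__)
new.__path__ = old.__path__
for k, v in list(old.__dict__.items()):
    new.__dict__[k] = v
sys.modules[__name__] = new
'''

# Write the hypothetical package to a temporary directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "lazy_recipe_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(INIT_SOURCE)
with open(os.path.join(pkg_dir, "module1.py"), "w") as f:
    f.write("VALUE = 'loaded'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

pkg = importlib.import_module("lazy_recipe_pkg")
listed_before = "module1" in dir(pkg)                       # visible to completion
loaded_before = "lazy_recipe_pkg.module1" in sys.modules    # not imported yet
value = pkg.module1.VALUE                                   # first access triggers the import
loaded_after = "lazy_recipe_pkg.module1" in sys.modules

print(listed_before, loaded_before, value, loaded_after)
```

The name is listed by dir() from the start, but the sub-module only lands in sys.modules once it is actually touched.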
This is a python2/3 compatible recipe for lazy module loading that maintains the semantics of eager loading (and symbol lifting) in a package.
Thank you so much for sharing. It’s an amazing read.
I was just looking for possibilities to implement lazy loading in python.
I came across this as well, https://medium.com/better-programming/how-to-create-lazy-attributes-to-improve-performance-in-python-b369fd72e1b6.
Based on reading both the blog posts, I was wondering, if even getattr should be overloaded in MyModule, to avoid repeated loading for modules.
Glad you liked it!