Let’s talk about the yield keyword in Python.
So what is yield all about? Let’s fire up an IPython Read-Evaluate-Print-Loop (REPL) and dive right in with an example:
In : def yielder():
...:     i = 5
...:     while i > 0:
...:         yield i
...:         i -= 1
...:
So what happens when we call yielder?
In : yielder()
Out: <generator object yielder at 0x10519b5f0>
Interesting. So calling yielder didn’t really do anything, but it returned a generator instance, whatever that is.
Ok, let’s assign this to a variable and take a look at what we got:
In : y = yielder()
Now, if we type y. and hit tab, we can see the public (i.e. doesn’t start with underscore) methods and attributes:
In : y.
y.close  y.gi_running  y.throw  y.gi_code  y.next  y.gi_frame  y.send
For today’s post, I’m just going to talk about send and next. You might not ever need to know about gi_running, gi_code, or gi_frame. The methods close and throw are more advanced coroutine topics that I’ll skip for now.
The next method is primarily used with generators.
Here’s what happens when you call next:
In : y.next()
Out: 5
Ok, cool! Now we’re getting somewhere. Let’s call that a few more times:
In : y.next()
Out: 4

In : y.next()
Out: 3

In : y.next()
Out: 2

In : y.next()
Out: 1

In : y.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-9-75a92ee8313a> in <module>()
----> 1 y.next()

StopIteration:
It turns out that the yield keyword (as in other languages) is deeply connected with iteration. Calling next on the generator ‘generates’ the next value until there aren’t any more values to generate. In Python, the StopIteration exception is raised to indicate that the generator has exhausted its values.
We can actually use the for syntax to drive our generator:
In : for i in yielder():
...:     print i
...:
5
4
3
2
1
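Here’s a rough sketch of what the for loop is doing on our behalf: it keeps calling next until StopIteration shows up, then quietly stops. The drive helper is a made-up name for illustration, and the code is written for Python 3, so it uses the builtin next() rather than the .next() method shown in the sessions above.

```python
def yielder():
    i = 5
    while i > 0:
        yield i
        i -= 1


def drive(gen):
    """Collect values roughly the way a for loop would (hypothetical helper)."""
    values = []
    it = iter(gen)  # a generator is its own iterator, so this returns gen itself
    while True:
        try:
            values.append(next(it))
        except StopIteration:
            # The generator is exhausted; a for loop swallows this exception too.
            break
    return values


print(drive(yielder()))
```

The for statement hides the try/except for you, which is why you never see StopIteration when you loop normally.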
You can have as many yield statements as you want in a generator function:
In : def sg1():
...:     yield 'Jack'
...:     yield 'Dan'
...:     yield 'Sam'
...:     yield "Teal'c"
...:
You can also convert a generator to a list:
In : list(sg1())
Out: ['Jack', 'Dan', 'Sam', "Teal'c"]
It’s possible to construct infinite ‘lists’ using generators:
In : def evens():
...:     i = 0
...:     while True:
...:         yield i
...:         i += 2
...:
Just don’t call list on it, because it will never finish.
Instead, you can implement some of the Haskell builtins 😉
Let’s just do take for now:
In : from itertools import islice

In : take = lambda n, gen: list(islice(gen(), 0, n))

In : take(10, evens)
Out: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
One of the most compelling reasons to use generators is that you can use them to process very large datasets in a constant amount of memory. If you don’t know about generators, you might be tempted to make a list, keep appending your data to it, and then do some post-processing. That workflow falls apart when you are dealing with terabytes of data. Generators can really save your bacon in these situations.
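To make the memory point concrete, here is a small sketch in Python 3. The names read_numbers, running_total, and fake_file are invented for this example, and the "file" is faked with a generator so the snippet runs anywhere. Only one line is held in memory at a time, however many lines flow through.

```python
def read_numbers(lines):
    """Yield one parsed number at a time instead of building a list."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield float(line)


def running_total(numbers):
    """Consume any iterable of numbers while keeping only a single total."""
    total = 0.0
    for n in numbers:
        total += n
    return total


# Stand-in for a huge file: lines are produced lazily, never stored all at once.
# An open file object could be passed here instead, since it also yields lines lazily.
fake_file = ("%d\n" % i for i in range(1000000))
print(running_total(read_numbers(fake_file)))
```

The list-and-append version would need room for every value at once; this one needs room for just the current line and the total.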
In fact, generators are so useful that Python provides an alternate list comprehension syntax, called a generator expression, that gives you a generator rather than a list. It’s just like a list comprehension, except you use parentheses!
Consider this list:
In : my_list = [x**2 for x in range(10)]

In : my_list
Out: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Now the generator version:
In : my_gen = (x**2 for x in range(10))

In : my_gen
Out: <generator object <genexpr> at 0x11d290be0>

In : for i in my_gen: print i
0
1
4
9
16
25
36
49
64
81
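One thing worth seeing for yourself is that a generator expression can only be consumed once, unlike the list. A quick Python 3 sketch, with variable names of my own choosing:

```python
squares_list = [x**2 for x in range(10)]
squares_gen = (x**2 for x in range(10))

first_pass = list(squares_gen)   # consumes every value
second_pass = list(squares_gen)  # nothing left: the generator is exhausted

print(first_pass == squares_list)
print(second_pass)

# A generator expression can be passed straight to a function like sum,
# so no intermediate list is ever built.
total = sum(x**2 for x in range(10))
print(total)
```

If you need to walk the values twice, either build the list or create a fresh generator.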
Generators have first-class support in Python. Take a look at the itertools module in the standard library.
Before we move on to coroutines, I want to make one final observation. When you call a generator function, nothing happens. What I mean by that is that none of the lines of code you write in the body of your function are run:
In : def yielder2():
...:     print "I'm a generator!"
...:     yield
...:

In : y = yielder2()

In : y.next()
I'm a generator!
Note that nothing happened until we called next. The takeaway is that if you include the keyword yield anywhere in your function (or method), your function won’t run when you call it! To get anything to run, you must call next. It gets even more wild when you consider that when you call next, the function will only run until the first yield. Subsequent calls to next will run your function to the next yield and so on.
You’ve converted your function into an entirely different beast. This seemed deeply strange to me at first, but it turns out to be very powerful. Basically, the yield statement pauses the execution of your function. It is resumed by calling next. This actually opens up a whole new programming paradigm called coroutines.
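If you want to watch the pausing and resuming happen, this Python 3 sketch records what has run at each step. The stepper function and the events list are made up for the illustration.

```python
events = []


def stepper():
    events.append("started")
    yield "first"
    events.append("resumed")
    yield "second"
    events.append("finished")


gen = stepper()
snapshot_after_call = list(events)    # still empty: the body has not run at all

value_1 = next(gen)                   # runs up to the first yield
snapshot_after_first = list(events)

value_2 = next(gen)                   # resumes after the first yield, stops at the second
snapshot_after_second = list(events)

try:
    next(gen)                         # runs the rest of the body, then raises
    exhausted = False
except StopIteration:
    exhausted = True

print(snapshot_after_call, snapshot_after_first, snapshot_after_second, events)
```

Each call to next picks up exactly where the previous one left off, local variables and all.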
While the most common use case for generators is iteration, you can also use the yield statement to pause and resume execution of your functions. This can allow you to build a state machine out of simple functions. You can think of yield as offering you a way to specify alternate entry and exit points for your program.
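As a tiny illustration of the state-machine idea, here is a Python 3 sketch where the position inside the function is the state. The traffic_light name is just something I picked for the example.

```python
from itertools import islice


def traffic_light():
    """Each yield is one state; resuming moves the machine to the next state."""
    while True:
        yield "green"
        yield "yellow"
        yield "red"


light = traffic_light()
states = list(islice(light, 5))  # take five transitions from the endless cycle
print(states)
```

There is no explicit state variable anywhere. The generator remembers which yield it is paused at, and that is the whole state.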
If we think of yield this way, one of the first things that may come to mind is that so far we have seen a way to return values, but we haven’t seen a way to pass values into our generators. Enter the send method:
In : def my_coro():
...:     print 'My first co-routine!'
...:     print 'Send me a value ...'
...:     while True:
...:         val = yield
...:         print val
...:

In : c = my_coro()

In : c
Out: <generator object my_coro at 0x11d290a00>

In : c.next()
My first co-routine!
Send me a value ...

In : c.send('spam')
spam

In : c.send('eggs')
eggs
A couple things to note:
As you can see, this is still a generator object. Furthermore, we still need to ‘warm it up’ by calling next. Only then can we start calling it with send (actually, we could have done send(None) instead of next).
Pretty cool right?
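A coroutine can hand a value back at the same yield where it receives one. Here is a sketch of a running-average coroutine in Python 3. The averager name and its variables are my own, not anything from a library.

```python
def averager():
    """Receive numbers via send() and yield back the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        # yield hands `average` to the caller, then pauses until send() supplies a value
        value = yield average
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)            # warm it up: run to the first yield (same as avg.send(None))
a1 = avg.send(10)
a2 = avg.send(20)
a3 = avg.send(30)
avg.close()          # tidy up the paused coroutine when we are done with it
print(a1, a2, a3)
```

Each send both delivers a value and returns whatever the coroutine yields next, so the conversation goes in both directions.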
So what should you do with generators and coroutines? Well, that’s completely up to you; the sky is the limit. However, one of the poster-child uses of coroutines is thread-less concurrency.
People usually use threads (and multi-processing) for one of two reasons: parallelism or concurrency.
Parallelism is using multiple CPU cores to speed up a computation, i.e. you split a problem into chunks that can be computed independently, compute them on separate threads (or processes or even separate machines), and finally combine the results of the computations into the final result.
It turns out that in CPython the Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously, so in Python, parallelism is typically done via multiprocessing.
Concurrency, on the other hand, is having multiple tasks in progress at the same time, whether or not any two of them ever execute at the same instant. The distinction is subtle but important. Consider the bad old days when your computer had only one CPU core. You could run multiple programs at once just fine, because the operating system switched between them. That’s concurrency. Note that parallelism is impossible on a single-core machine.
In python, the threading module is often used for concurrency. However, threading has some overhead associated with it. If you are running a server and you want it to handle thousands of simultaneous network connections, your application will suffer if you decide to implement it by spawning thousands of threads (or even worse, processes). This is where coroutines really shine.
It is possible to (and many frameworks do) use coroutines to implement a single-threaded cooperative multitasking environment by having an event loop which manages the execution of tasks (coroutines) by calling the next method on each coroutine that is in the “I’m ready to do work” bin. The task is then either done (StopIteration) or it puts itself back in the available queue by calling yield.
There are a couple gotchas when it comes to using coroutines for multitasking. First and foremost, if any of your coroutines blocks, your whole app comes to a grinding halt. So your app needs to use non-blocking functions everywhere. Second, this is cooperative multitasking, i.e. not preemptive, so there is no mechanism for the task scheduler to switch tasks unless the currently running task calls yield. If a task blocks, there isn’t a way to switch (see the first gotcha).
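To give a feel for how such an event loop might work, here is a deliberately tiny round-robin scheduler in Python 3. It is a toy under my own assumptions (the task and run names are invented, and there is no I/O or timers), not how any particular framework does it.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: do one unit of work, then yield control back."""
    for step in range(steps):
        log.append((name, step))
        yield  # hand control back to the scheduler


def run(tasks):
    """Round-robin over the ready queue until every task has finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task up to its next yield
        except StopIteration:
            continue               # task is done: do not requeue it
        ready.append(current)      # task yielded: back of the queue it goes


log = []
run([task("a", 2, log), task("b", 3, log)])
print(log)
```

The two tasks interleave on a single thread. If either one looped forever without yielding, the other would never get a turn, which is the cooperative gotcha in action.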
One final note:
I’ve been calling generator_instance.next() in this post. It turns out that since Python 2.6, there is a builtin next that you should call instead, e.g. next(generator_instance). This is to align the generator syntax with some of Python’s other protocols. In Python 3, there is no next method. Instead there is a __next__ method. Calling next(generator_instance) will always do the right thing and it’s the way you should be calling it.
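The builtin also accepts an optional default, which is returned instead of raising StopIteration once the generator runs dry. A short Python 3 sketch, with countdown being a name I made up:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
first = next(gen)
second = next(gen)
third = next(gen, None)   # exhausted: the default comes back instead of an exception

print(first, second, third)
print(hasattr(gen, "__next__"), hasattr(gen, "next"))
```

On Python 3 the last line shows that the generator has __next__ but no next method, which is exactly why the builtin is the portable choice.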
Hopefully this added another tool to your toolbox.