A Deep Dive into Generators

2018-05-16


How do generators work?

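Before dissecting generators, it helps to recall how regular Python functions work. Normally, when a Python function calls a subroutine, the subroutine keeps control of the CPU until it returns or raises an exception. Control then passes back to the caller, which carries on with its remaining work.
We can see this by looking at Python bytecode. First, define the following two functions: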

>>> def foo():
	a = 1
	bar()

>>> def bar():
	pass

Disassembling foo gives its bytecode (the exact opcodes and offsets vary between CPython versions):

>>> import dis
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (a)

  3           4 LOAD_GLOBAL              0 (bar)
              6 CALL_FUNCTION            0
              8 POP_TOP
             10 LOAD_CONST               0 (None)
             12 RETURN_VALUE

foo first loads bar onto its stack and calls it (CALL_FUNCTION), then pops bar's return value off the stack, pushes None onto the stack, and returns it.
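Because opcode names change between CPython releases, the listing above will not match every interpreter. The following is a minimal sketch, not part of the original article, that inspects the same two functions programmatically and only relies on the common "CALL" and "RETURN" name prefixes, which is an assumption about naming that holds for the CPython versions I am aware of.

```python
import dis


def bar():
    pass


def foo():
    a = 1
    bar()


# Collect the opcode names the compiler produced for foo.
opnames = [ins.opname for ins in dis.get_instructions(foo)]

# Names differ by version (for example CALL_FUNCTION in older releases,
# CALL in newer ones), so match on the shared prefixes only.
call_ops = [name for name in opnames if name.startswith("CALL")]
return_ops = [name for name in opnames if name.startswith("RETURN")]

print("call opcodes:", call_ops)
print("return opcodes:", return_ops)
```

Whatever the exact names, the shape is the same: foo calls bar, discards the result, and returns None.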
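One point is crucial: Python stack frames are actually allocated on the heap! The Python interpreter is a standard C program, so its own stack frames are ordinary stack frames (allocated on the C stack), but the Python stack frames it manipulates live in heap memory. This means a Python stack frame can outlive the function call that created it. The following session demonstrates this: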

>>> import inspect
>>> frame = None
>>> def foo():
	a = 1
	bar()

	
>>> def bar():
	global frame
	frame = inspect.currentframe()

	
>>> foo()
>>> # The frame was executing the code for 'bar'.
>>> frame.f_code.co_name
'bar'
>>> # Its back pointer refers to the frame for 'foo'.
>>> caller_frame = frame.f_back
>>> caller_frame.f_code.co_name
'foo'
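The same experiment can be run as a script. This sketch goes one step further than the session above and reads a local variable out of the saved frame after the call has returned. The names saved_frame and marker are hypothetical, chosen only for illustration.

```python
import inspect

saved_frame = None  # hypothetical module-level slot for the captured frame


def bar():
    global saved_frame
    marker = "still here"  # a local variable that lives in bar's frame
    saved_frame = inspect.currentframe()


def foo():
    bar()


foo()

# bar() has already returned, yet its frame object is still reachable
# because saved_frame holds a reference to it.
frame_name = saved_frame.f_code.co_name
local_marker = saved_frame.f_locals["marker"]
print(frame_name, local_marker)
```

Had the frame lived on the C stack, it would have been destroyed as soon as bar returned. Here it survives for as long as something refers to it.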

Figure: function calls

Now for generators. Here is a generator function:

>>> def gen_fn():
	result = yield 1
	print('result of yield: {}'.format(result))
	result2 = yield 2
	print('result of 2nd yield: {}'.format(result2))
	return 'done'

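Python's rule is that any function containing a yield statement is treated as a generator function. So how does the interpreter implement this?
When gen_fn is compiled to bytecode, the compiler sees the yield statement, recognizes gen_fn as a generator function, and sets a flag on its code object to record that fact (yes, Python marks a generator function with nothing more than a flag bit).
We can inspect the generator flag of gen_fn like this: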

>>> # The generator flag is bit position 5.
>>> generator_bit = 1 << 5
>>> bool(gen_fn.__code__.co_flags & generator_bit)
True
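Rather than hard-coding the bit position, the standard library exposes the flag as inspect.CO_GENERATOR and wraps the check in inspect.isgeneratorfunction. A short sketch comparing a generator function with a regular one (plain_fn is a hypothetical counterexample added for contrast):

```python
import inspect


def gen_fn():
    yield 1


def plain_fn():
    return 1


# inspect.CO_GENERATOR is the named constant for the generator flag bit.
flag_matches = inspect.CO_GENERATOR == 1 << 5

gen_flag = bool(gen_fn.__code__.co_flags & inspect.CO_GENERATOR)
plain_flag = bool(plain_fn.__code__.co_flags & inspect.CO_GENERATOR)

print(flag_matches, gen_flag, plain_flag)
print(inspect.isgeneratorfunction(gen_fn), inspect.isgeneratorfunction(plain_fn))
```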

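Every generator created by calling gen_fn points to the same code object, but each one has its own stack frame. These frames do not live on a real stack. They are the Python stack frames described earlier, allocated in heap memory.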
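A small sketch makes the "shared code, separate frames" point concrete. It assumes a shortened gen_fn and uses the gi_code and gi_frame attributes of generator objects:

```python
def gen_fn():
    result = yield 1
    yield 2


g1 = gen_fn()
g2 = gen_fn()

# Both generators run the very same code object...
same_code = g1.gi_code is g2.gi_code is gen_fn.__code__
# ...but each one carries its own frame.
separate_frames = g1.gi_frame is not g2.gi_frame

# Advancing g1 twice leaves g2 untouched at its starting point.
first = next(g1)
second = next(g1)
other = next(g2)

print(same_code, separate_frames, first, second, other)
```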

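As the accompanying figure shows, a generator's stack frame holds a "last instruction" pointer, f_lasti, which records the position in the bytecode of the most recently executed instruction. It starts at -1: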

>>> gen = gen_fn()
>>> gen.gi_frame.f_lasti
-1

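On the first call to send, the generator runs to its first yield and pauses, and send returns 1. f_lasti now points at the first yield, at offset 2 in the bytecode (the offsets shown here depend on the CPython version).
On the second call to send, the generator runs to its second yield and pauses, and send returns 2. f_lasti now points at the second yield, at offset 22 in the bytecode.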

>>> gen.send(None)
1
>>> gen.gi_frame.f_lasti
2
>>> gen.send(None)
result of yield: None
2
>>> gen.gi_frame.f_lasti
22
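The session above only sends None. As a sketch of what happens next, the script below sends real values into the same generator and runs it to completion. When gen_fn returns, the generator raises StopIteration, and the return value travels on the exception's value attribute.

```python
def gen_fn():
    result = yield 1
    print('result of yield: {}'.format(result))
    result2 = yield 2
    print('result of 2nd yield: {}'.format(result2))
    return 'done'


gen = gen_fn()

first = gen.send(None)      # run to the first yield
second = gen.send('hello')  # 'hello' becomes the value of the first yield

try:
    gen.send('goodbye')     # resumes after the second yield, then returns
except StopIteration as exc:
    return_value = exc.value

# Once the generator has finished, its frame is released.
finished_frame = gen.gi_frame

print(first, second, return_value, finished_frame)
```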

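As we can see, a generator can pause at any yield and later be resumed at any time, by any function, because its stack frame is not on a real stack: it is allocated on the heap.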
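To illustrate "by any function", here is a minimal sketch in which three different callers resume the same generator. The names counter, step_once, and step_twice are hypothetical helpers, not from the original article.

```python
def counter():
    n = 0
    while True:
        n += 1
        yield n  # pause here; n stays alive in the heap-allocated frame


def step_once(gen):
    return next(gen)


def step_twice(gen):
    return [next(gen), next(gen)]


gen = counter()

a = step_once(gen)   # resumed from inside step_once
b = step_twice(gen)  # resumed twice from inside step_twice
c = next(gen)        # resumed from module level

print(a, b, c)
```

The local variable n keeps its value across all three callers, since the generator's frame does not belong to any of their call stacks.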

Translated from: a-web-crawler-with-asyncio-coroutines