Exploring the Python AST
Whilst working on, and learning about, a python wrapper around a WebAPI it came to the point of writing some tests. Because I want to test my code, and not the underlying API I am learning about mocking. To mock the API (and for checking coverage of the API) it is desirable to have an exhaustive list of all the API endpoints. This list could be obtained from the documentation (scraping) or in this case by looking at the scripts that generate the documentation.
Since the sourcecode of the API is available, I figured it would be interesting to look at all actual endpoint definitions. Perhaps the documentation is incomplete, or certain endpoints are masked/hidden. Looking at the sources, the endpoints are all encoded as decorated functions. The decorator specifies the route as well as certain characteristics (description / status codes). Example:
When assuming a certain coding style, this could probably be done with a handful of lines or even a regex. This becomes problematic if youu want to be able to properly parse any and all (valid) python code. You’ll soon find yourself reinventing the (lexer-)wheel which is already available in Python itsself. Thanks to others, there is a built-in ast module which parses Python source code into an AST. The AST can then be inspected and modified, and even unparsed into source code. So the goal is to write a function that takes some source code (file) and given a decorator, returns a list of functions and their interesting properties which have that decorator applied.
The built-in ast
module allows us to parse arbitrary python code into a tree object as well as dump the tree. The layout of such dumps is unfortunately not very nice. A helpful package here is astunparse
which provides a dump()
method which is nicely indented and helps exploring. Additional benefit is that it also provides an unparse()
method which turns an AST back into source code. We will use this functionality later, et’s have a quick look at a simple example of basic parsing.
<_ast.Module object at 0x10ec9d668>
So the snippet is parsed as an ast.Module object which is the root node of our Abstract Syntax Tree. Let;s have a look at it by using astunparse.dump()
Module(body=[
FunctionDef(
name='my_function',
args=arguments(
args=[arg(
arg='my_argument',
annotation=None)],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]),
body=[
Expr(value=Str(s='My Docstring')),
Assign(
targets=[Name(
id='my_value',
ctx=Store())],
value=Num(n=420)),
Return(value=Name(
id='my_value',
ctx=Load()))],
decorator_list=[Name(
id='my_decorator',
ctx=Load())],
returns=None),
FunctionDef(
name='foo',
args=arguments(
args=[],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]),
body=[Pass()],
decorator_list=[],
returns=None),
FunctionDef(
name='bar',
args=arguments(
args=[],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]),
body=[Pass()],
decorator_list=[
Name(
id='Some_decorator',
ctx=Load()),
Name(
id='Another_decorator',
ctx=Load())],
returns=None),
FunctionDef(
name='baz',
args=arguments(
args=[],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]),
body=[Pass()],
decorator_list=[Attribute(
value=Attribute(
value=Name(
id='MyClass',
ctx=Load()),
attr='subpackage',
ctx=Load()),
attr='my_deco_function',
ctx=Load())],
returns=None)])
Quite a mouthful but thanks to the indentation easy to spot 4 top level FunctionDef
objects which correspond to the four function definitions in our snippet. This is the actual intermediate representation used by the python compiler to generate bytecode. The different types of node you see here are the actual internal building blocks of the Python language and the best reference I could find was in the source code. The tree of objects all inherit from ast.AST and the actual types and their properties can be found in the so called ASDL. The actual grammar of python as defined in the Zephyr Abstract Syntax Definition Language. The grammar file resides in the Python sources at Parser/python.asdl.
We want to look at function definitions which are aptly named FunctionDef in the ASDL and represented as FunctionDef objects in the tree. Looking at the ASDL we see the following deifnition for FunctionDef (reformatted):
FunctionDef(identifier name,
arguments args,
stmt* body,
expr* decorator_list,
expr? returns,
string? docstring)
Which seems to correspond to the structure of the object in the AST as shown in the astunparse dump above. There is some documentation at a place called Green Tree Snakes which explains the components of the FunctionDef object. After looking through it a couple of times, I found it easiest just to use the ASDL directly.
Traversing and inspecting the tree
There are two ways to work with the tree. The easiest is ast.walk(node)
which “Recursively yields all descendant nodes in the tree starting at node (including node itself), in no specified order.” and apparently does so breadth first. Alternatively you can subclass the ast.NodeVisitor
class. This class provides a visit()
method which does a depth first traversal. You can override visit_<Class_Name>
which are called whenever the traversal hits a node of type <Class_Name>
. To show the different order, let’s use them both and print the nodes.
For our purposes we should be able to use the walk method, I find it simpler to use for now and honestly I need to investigate this visitor class more to better understand how to use it. My initial issue is figuring out how to yeild values in a pythonic way, without the use of a global variable. Guessing I could stick the class inside a function but leave it for some other time.
Let’s see what happens if we use the ast.walk(tree)
method to grab those FunctionDef
objects and inspect them in the same way. Using the unparse()
method of astunparse
we can transform it back into source code for extra fun.
Nodetype: FunctionDef <_ast.FunctionDef object at 0x10ec9d6a0>
@my_decorator
def my_function(my_argument):
'My Docstring'
my_value = 420
return my_value
Nodetype: FunctionDef <_ast.FunctionDef object at 0x10ec9d908>
def foo():
pass
Nodetype: FunctionDef <_ast.FunctionDef object at 0x10ec9d9b0>
@Some_decorator
@Another_decorator
def bar():
pass
Nodetype: FunctionDef <_ast.FunctionDef object at 0x10ec9dac8>
@MyClass.subpackage.my_deco_function
def baz():
pass
We wanted to only grab functions who have a certain decorator, so we need to inspect the decorator_list
attribute of the FunctionDef
class and devise some way of filtering based on that attribute. First naive attempt would be:
my_function ['my_decorator']
foo []
bar ['Some_decorator', 'Another_decorator']
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-666517aaaecf> in <module>()
1 for node in ast.walk(tree):
2 if isinstance(node, ast.FunctionDef):
----> 3 decorators = [d.id for d in node.decorator_list]
4 print(node.name, decorators)
<ipython-input-7-666517aaaecf> in <listcomp>(.0)
1 for node in ast.walk(tree):
2 if isinstance(node, ast.FunctionDef):
----> 3 decorators = [d.id for d in node.decorator_list]
4 print(node.name, decorators)
AttributeError: 'Attribute' object has no attribute 'id'
So looking more closely there is a different representation in the AST for a single keyword (@function
) decorator as there is for a compound (@Class.method
). Compare the decorator in my_function
:
decorator_list=[Name(
id='my_decorator',
ctx=Load())]
against the compound decorator in baz
:
decorator_list=[Attribute(
value=Attribute(
value=Name(
id='MyClass',
ctx=Load()),
attr='subpackage',
ctx=Load()),
attr='my_deco_function',
ctx=Load())]
So we need to modify our treewalk to accomodate for this. When the top level element in the decorator_list is of type Name
, we grab the id and be done with it. If it is of type Attribute
we need to do some more extra work. From the ASDL we can see that Attribute is a nested element:
Attribute(expr value, identifier attr, expr_context ctx)
Assuming it’s nested ast.Attribute
s with a ast.Name
at the root we can define a flattening function as follows:
my_function ['my_decorator']
foo []
bar ['Some_decorator', 'Another_decorator']
baz ['MyClass.subpackage.my_deco_function']
Much better, however the actual sources I want to parse have an additional complication, the decorator functions have arguments passed into them. And I want to know what’s in them as well. So let’s switch to some actual source code and see how to do that. I have removed the body of the function as we are only interested in the decorator now.
Module(body=[FunctionDef(
name='list',
args=arguments(
args=[
arg(
arg='request',
annotation=None),
arg(
arg='response',
annotation=None)],
vararg=None,
kwonlyargs=[],
kw_defaults=[],
kwarg=None,
defaults=[]),
body=[Pass()],
decorator_list=[Call(
func=Attribute(
value=Name(
id='Route',
ctx=Load()),
attr='get',
ctx=Load()),
args=[Str(s='/projects/{project_id}/snapshots')],
keywords=[
keyword(
arg='description',
value=Str(s='List snapshots of a project')),
keyword(
arg='parameters',
value=Dict(
keys=[Str(s='project_id')],
values=[Str(s='Project UUID')])),
keyword(
arg='status_codes',
value=Dict(
keys=[
Num(n=200),
Num(n=404)],
values=[
Str(s='Snasphot list returned'),
Str(s="The project doesn't exist")]))])],
returns=None)])
We find the decorator_list to contain a ast.Call object rather than a Name or Attribute. This corresponds to the signature of the called decorator function. I am interested in the first positional argument as well as the keyword arguments. Let’s grab the [0]
element of the decorator list to simplify.
Call(
func=Attribute(
value=Name(
id='Route',
ctx=Load()),
attr='get',
ctx=Load()),
args=[Str(s='/projects/{project_id}/snapshots')],
keywords=[
keyword(
arg='description',
value=Str(s='List snapshots of a project')),
keyword(
arg='parameters',
value=Dict(
keys=[Str(s='project_id')],
values=[Str(s='Project UUID')])),
keyword(
arg='status_codes',
value=Dict(
keys=[
Num(n=200),
Num(n=404)],
values=[
Str(s='Snasphot list returned'),
Str(s="The project doesn't exist")]))])
Returning to the original task, let;s try and grab the data that is of actual interest to us:
Route.get /projects/{project_id}/snapshots
Parameters:
project_id: Project UUID
Status Codes:
200: Snasphot list returned
404: The project doesn't exist
That looks perfect, so now to bring it all together and write a function that takes a filename and a decorator as argument and spits out a list of tuples which hold the:
- filename in which the function was found
- Function name (str)
- The name of the decorator
- description for the given decorator (str)
- parameters for the decorator (dict)
- status codes for the decorator (dict)
for every function in the sourcefile which is decorated with that decorator. I have the current working version for parsing some files in the GNS3 source code:
Route.get /computes List of compute servers
Route.get /computes/{compute_id}/{emulator}/images Return the list of images available on compute and controller for this emulator type
Route.get /computes/{compute_id}/{emulator}/{action:.+} Forward call specific to compute node. Read the full compute API for available actions
Route.get /computes/{compute_id} Get a compute server information
Route.post /computes Register a compute server
Route.post /computes/{compute_id}/{emulator}/{action:.+} Forward call specific to compute node. Read the full compute API for available actions
Route.put /computes/{compute_id} Get a compute server information
Route.delete /computes/{compute_id} Delete a compute instance
Route.get /projects/{project_id}/drawings List drawings of a project
Route.post /projects/{project_id}/drawings Create a new drawing instance
Route.put /projects/{project_id}/drawings/{drawing_id} Create a new drawing instance
Route.delete /projects/{project_id}/drawings/{drawing_id} Delete a drawing instance
Route.get /gns3vm/engines Return the list of engines supported for the GNS3VM
Route.get /gns3vm/engines/{engine}/vms Get all the available VMs for a specific virtualization engine
Route.get /gns3vm Get GNS3 VM settings
Route.put /gns3vm Update GNS3 VM settings
Route.get /projects/{project_id}/links List links of a project
Route.get /projects/{project_id}/links/{link_id}/pcap Stream the pcap capture file
Route.post /projects/{project_id}/links Create a new link instance
Route.post /projects/{project_id}/links/{link_id}/start_capture Start capture on a link instance. By default we consider it as an Ethernet link
Route.post /projects/{project_id}/links/{link_id}/stop_capture Stop capture on a link instance
Route.put /projects/{project_id}/links/{link_id} Update a link instance
Route.delete /projects/{project_id}/links/{link_id} Delete a link instance
Route.get /projects/{project_id}/nodes/{node_id} Update a node instance
Route.get /projects/{project_id}/nodes List nodes of a project
Route.get /projects/{project_id}/nodes/{node_id}/dynamips/auto_idlepc Compute the IDLE PC for a Dynamips node
Route.get /projects/{project_id}/nodes/{node_id}/dynamips/idlepc_proposals Compute a list of potential idle PC for a node
Route.get /projects/{project_id}/nodes/{node_id}/files/{path:.+} Get a file in the node directory
Route.post /projects/{project_id}/nodes Create a new node instance
Route.post /projects/{project_id}/nodes/start Start all nodes belonging to the project
Route.post /projects/{project_id}/nodes/stop Stop all nodes belonging to the project
Route.post /projects/{project_id}/nodes/suspend Suspend all nodes belonging to the project
Route.post /projects/{project_id}/nodes/reload Reload all nodes belonging to the project
Route.post /projects/{project_id}/nodes/{node_id}/start Start a node instance
Route.post /projects/{project_id}/nodes/{node_id}/stop Stop a node instance
Route.post /projects/{project_id}/nodes/{node_id}/suspend Suspend a node instance
Route.post /projects/{project_id}/nodes/{node_id}/reload Reload a node instance
Route.post /projects/{project_id}/nodes/{node_id}/files/{path:.+} Write a file in the node directory
Route.put /projects/{project_id}/nodes/{node_id} Update a node instance
Route.delete /projects/{project_id}/nodes/{node_id} Delete a node instance
Route.get /projects List projects
Route.get /projects/{project_id} Get a project
Route.get /projects/{project_id}/notifications Receive notifications about projects
Route.get /projects/{project_id}/notifications/ws Receive notifications about projects from a Websocket
Route.get /projects/{project_id}/export Export a project as a portable archive
Route.get /projects/{project_id}/files/{path:.+} Get a file from a project. Beware you have warranty to be able to access only to file global to the project (for example README.txt)
Route.post /projects Create a new project on the server
Route.post /projects/{project_id}/close Close a project
Route.post /projects/{project_id}/open Open a project
Route.post /projects/load Open a project (only local server)
Route.post /projects/{project_id}/import Import a project from a portable archive
Route.post /projects/{project_id}/duplicate Duplicate a project
Route.post /projects/{project_id}/files/{path:.+} Write a file to a project
Route.put /projects/{project_id} Update a project instance
Route.delete /projects/{project_id} Delete a project from disk
Route.get /version Retrieve the server version number
Route.get /settings Retrieve gui settings from the server. Temporary will we removed in later release
Route.post /shutdown Shutdown the local server
Route.post /version Check if version is the same as the server
Route.post /settings Write gui settings on the server. Temporary will we removed in later releas
Route.post /debug Dump debug informations to disk (debug directory in config directory). Work only for local server
Route.get /symbols List of symbols
Route.get /symbols/{symbol_id:.+}/raw Get the symbol file
Route.post /symbols/{symbol_id:.+}/raw Write the symbol file
This can be used as input for generating the mocked API calls, checking coverage etc etc. And more in general, I think this is a very powerful tool to analyze python code / do some meta-programming and now at least I know a little bit about how to use it.