Hacking Python: Adding A New Keyword
March 28, 2011 | View Comments
As you may know Python and Ruby are very similar languages of course there are huge differences but they're influenced from the same ideas but every language has unique keywords such as Python's pass.
For example, in Ruby, there are some keywords which aren't in Python such as unless and end. I thought it'd be fun to implement these keywords to Python (I'm not suggesting that such keywords should exist in Python). I've decided to implement the unless keyword of Ruby to Python.
How "unless" works
Here's a simple example how unless works in Ruby;
def test(n) return n unless n < 6 end
Ruby has a very flexible syntax so this one is also valid;
def test(n) unless n < 6 return n end end
Let's Start Hacking Python
Here we go. We're gonna implement Ruby's unless keyword to Python by following the instructions of PEP-0339^3.
- Go to python.org and download the Python 2.7.1 source code.
- Extract it to somewhere and go into that folder.
- ????
- Profit.
Python's Parser Tree is an LL Parser^1 which puts the source code to a parse tree to define the grammar rules. The reason behind this action is so compiler can traverse the tree. The grammar file of Python is in Grammar/Grammar file. In this file we see all the grammar rules.
For example;
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
can be a rule for;
for k, v in mydict.items(): print v else: print 'a message'
We need to define the rule for our unless keyword. After if_stmt rule, write the following;
unless_stmt: 'unless' test ':' suite ['else' ':' suite]
We need to put unless_stmt that we've just defined to compound_stmt list so put it. For example;
compound_stmt: if_stmt | unless_stmt | ....
Abstract Syntax Trees(AST)
ASTs are a tree structure to represent the abstract syntax structure of the source code of a programming language^2.
ASTs create a clean interface between the parser and the compiler and make it ease to compile. Java, Python, C# (technically all modern languages) take advantage of ASTs but legacy languages such as C couldn't take the advantage of them since memory was a big issue.
I'm finishing ASTs short so I'm not going to say anything about how to track the position, or doing the parsing and analysis of semantics, maybe in another post.
The definition of AST nodes for Python is found in Parser/Python.asdl^3. Open that file and find the part of for, while, if, with and write the following there;
| Unless(expr test, stmt* body, stmt* orelse)
The AST tree that is generated is in Python/ast.c. Open that file and search for ast_for_stmt function where you'll see the functions that are called according to their statement types. (Remember, our unless_stmt is in compound_stmt), and find the switch/case where the if_stmt, while_stmt... are defined(basically the statements of compound_stmt) and add the following line
case unless_stmt: return ast_for_unless_stmt(c, ch);
After that, we need to define the ast_for_unless_stmt function which is the following function;
static stmt_ty ast_for_unless_stmt(struct compiling *c, const node *n) { /* unless_stmt: 'unless' test ':' suite ['else' ':' suite] */ char *s; REQ(n, unless_stmt); s = STR(CHILD(n, 4)); if (NCH(n) == 4) { expr_ty expression; asdl_seq *suite_seq; expression = ast_for_expr(c, CHILD(n, 1)); if (!expression) return NULL; suite_seq = ast_for_suite(c, CHILD(n, 3)); if (!suite_seq) return NULL; return Unless(expression, suite_seq, NULL, LINENO(n), n->n_col_offset, c->c_arena); } if (s[2] == 's') { expr_ty expression; asdl_seq *seq1, *seq2; expression = ast_for_expr(c, CHILD(n, 1)); if (!expression) return NULL; seq1 = ast_for_suite(c, CHILD(n, 3)); if (!seq1) return NULL; seq2 = ast_for_suite(c, CHILD(n, 6)); if (!seq2) return NULL; return Unless(expression, seq1, seq2, LINENO(n), n->n_col_offset, c->c_arena); } PyErr_Format(PyExc_SystemError, "unexpected token in 'unless' statement: %s", s); return NULL; }
Once we've created the AST, the next step is to create the CFG (Control Flow Graph). Just read the 4th part, which I'm using to do this, to understand CFG.
To create the CFG, open the file Python/compile.c and add the following function somewhere (Just follow the structure of compound_stmt: to make the things easier).
static int compiler_unless(struct compiler *c, stmt_ty s) { basicblock *end, *next; int constant; assert(s->kind == Unless_kind); end = compiler_new_block(c); if (end == NULL) return 0; constant = expr_constant(s->v.Unless.test); /* constant = 0: "if 0" * constant = 1: "if 1", "if 2", ... * constant = -1: rest */ if(constant == 0) { VISIT_SEQ(c, stmt, s->v.Unless.orelse); } else if(constant == 1) { VISIT_SEQ(c, stmt, s->v.Unless.body); } else { if(s->v.Unless.orelse) { next = compiler_new_block(c); if(next == NULL) return 0; } else { next = end; } VISIT(c, expr, s->v.Unless.test); ADDOP_JABS(c, POP_JUMP_IF_TRUE, next); VISIT_SEQ(c, stmt, s->v.Unless.body); ADDOP_JREL(c, JUMP_FORWARD, end); if(s->v.Unless.orelse) { compiler_use_next_block(c, next); VISIT_SEQ(c, stmt, s->v.Unless.orelse); } } compiler_use_next_block(c, end); return 1; }
You'll notice this is very similar to compiler_if function but unless and if are very similar so I almost copy/pasted the compiler_if function :)
So, that's it. just run ./configure && make and you'll have a custom built Python.
Here's a sample usage of our unless keyword;
def test(n): unless n < 4: print n else: print 'a message'
References
- LL Parser, http://en.wikipedia.org/wiki/LL_parser
- AST, http://en.wikipedia.org/wiki/Abstract_syntax_tree
- PEP-0339, http://www.python.org/dev/peps/pep-0339/
- PEP-0339, http://www.python.org/dev/peps/pep-0339/#ast-to-cfg-to-bytecode