Faruk Akgul

Hacking Python: Adding A New Keyword

March 28, 2011 | View Comments

As you may know Python and Ruby are very similar languages of course there are huge differences but they're influenced from the same ideas but every language has unique keywords such as Python's pass.

For example, in Ruby, there are some keywords which aren't in Python such as unless and end. I thought it'd be fun to implement these keywords to Python (I'm not suggesting that such keywords should exist in Python). I've decided to implement the unless keyword of Ruby to Python.

How "unless" works

Here's a simple example how unless works in Ruby;

def test(n)
  return n unless n < 6
end

Ruby has a very flexible syntax so this one is also valid;

def test(n)
  unless n < 6
    return n
  end
end

Let's Start Hacking Python

Here we go. We're gonna implement Ruby's unless keyword to Python by following the instructions of PEP-0339^3.

  1. Go to python.org and download the Python 2.7.1 source code.
  2. Extract it to somewhere and go into that folder.
  3. ????
  4. Profit.

Python's Parser Tree is an LL Parser^1 which puts the source code to a parse tree to define the grammar rules. The reason behind this action is so compiler can traverse the tree. The grammar file of Python is in Grammar/Grammar file. In this file we see all the grammar rules.

For example; for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]

can be a rule for;

for k, v in mydict.items():
  print v
else:
  print 'a message'

We need to define the rule for our unless keyword. After if_stmt rule, write the following;

unless_stmt: 'unless' test ':' suite ['else' ':' suite]

We need to put unless_stmt that we've just defined to compound_stmt list so put it. For example;

compound_stmt: if_stmt | unless_stmt | ....

Abstract Syntax Trees(AST)

ASTs are a tree structure to represent the abstract syntax structure of the source code of a programming language^2.

ASTs create a clean interface between the parser and the compiler and make it ease to compile. Java, Python, C# (technically all modern languages) take advantage of ASTs but legacy languages such as C couldn't take the advantage of them since memory was a big issue.

I'm finishing ASTs short so I'm not going to say anything about how to track the position, or doing the parsing and analysis of semantics, maybe in another post.

The definition of AST nodes for Python is found in Parser/Python.asdl^3. Open that file and find the part of for, while, if, with and write the following there;

| Unless(expr test, stmt* body, stmt* orelse)

The AST tree that is generated is in Python/ast.c. Open that file and search for ast_for_stmt function where you'll see the functions that are called according to their statement types. (Remember, our unless_stmt is in compound_stmt), and find the switch/case where the if_stmt, while_stmt... are defined(basically the statements of compound_stmt) and add the following line

case unless_stmt:
    return ast_for_unless_stmt(c, ch);

After that, we need to define the ast_for_unless_stmt function which is the following function;

static stmt_ty
ast_for_unless_stmt(struct compiling *c, const node *n)
{
    /*
      unless_stmt: 'unless' test ':' suite ['else' ':' suite]
     */
    char *s;

    REQ(n, unless_stmt);

    s = STR(CHILD(n, 4));

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;

        return Unless(expression, suite_seq, NULL, LINENO(n), n->n_col_offset,
                  c->c_arena);
    }

    if (s[2] == 's') {
        expr_ty expression;
        asdl_seq *seq1, *seq2;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        seq1 = ast_for_suite(c, CHILD(n, 3));
        if (!seq1)
            return NULL;
        seq2 = ast_for_suite(c, CHILD(n, 6));
        if (!seq2)
            return NULL;

        return Unless(expression, seq1, seq2, LINENO(n), n->n_col_offset,
                  c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "unexpected token in 'unless' statement: %s", s);
    return NULL;
}

Once we've created the AST, the next step is to create the CFG (Control Flow Graph). Just read the 4th part, which I'm using to do this, to understand CFG.

To create the CFG, open the file Python/compile.c and add the following function somewhere (Just follow the structure of compound_stmt: to make the things easier).

static int
compiler_unless(struct compiler *c, stmt_ty s)
{
    basicblock *end, *next;
    int constant;
    assert(s->kind == Unless_kind);
    end = compiler_new_block(c);
    if (end == NULL)
        return 0;

    constant = expr_constant(s->v.Unless.test);
    /* constant = 0: "if 0"
     * constant = 1: "if 1", "if 2", ...
     * constant = -1: rest */
    if(constant == 0) {
      VISIT_SEQ(c, stmt, s->v.Unless.orelse);
    }
    else if(constant == 1) {
      VISIT_SEQ(c, stmt, s->v.Unless.body);
    } else {
      if(s->v.Unless.orelse) {
        next = compiler_new_block(c);
        if(next == NULL) return 0;
      } else {
        next = end;
      }
      VISIT(c, expr, s->v.Unless.test);
      ADDOP_JABS(c, POP_JUMP_IF_TRUE, next);
      VISIT_SEQ(c, stmt, s->v.Unless.body);
      ADDOP_JREL(c, JUMP_FORWARD, end);
      if(s->v.Unless.orelse) {
        compiler_use_next_block(c, next);
        VISIT_SEQ(c, stmt, s->v.Unless.orelse);
      }
    }
    compiler_use_next_block(c, end);
    return 1;
}

You'll notice this is very similar to compiler_if function but unless and if are very similar so I almost copy/pasted the compiler_if function :)

So, that's it. just run ./configure && make and you'll have a custom built Python.

Here's a sample usage of our unless keyword;

def test(n):
  unless n < 4:
    print n
  else:
    print 'a message'

References

  1. LL Parser, http://en.wikipedia.org/wiki/LL_parser
  2. AST, http://en.wikipedia.org/wiki/Abstract_syntax_tree
  3. PEP-0339, http://www.python.org/dev/peps/pep-0339/
  4. PEP-0339, http://www.python.org/dev/peps/pep-0339/#ast-to-cfg-to-bytecode
Share:Tweet · reddit

blog comments powered by Disqus