July 12, 2012
For the last couple of months, I was part of a small team (only 10 people) working on a project. This article summarizes what I learnt, what I still haven't learnt and what I should do in the future. Readers may pick up some ideas as well.
What we did
I can't say much about it, but the big picture was to analyze and visualize data sets. It wasn't some sort of blog engine, which is what 99.99% of Django applications seem to be about (this is where I salute Cal Henderson).
It was a Python application built with the following technologies:
- Mako templates
- and some C sauce
We couldn't choose Django, web2py or any other macro-framework, since they wouldn't suit the application we were building. It wasn't a web application; it was simply a web service to analyze and visualize data sets. For example:
- Django's ORM would literally cry if we ever tried to use it, and its forms module would be useless for us.
- I've never used web2py, so I didn't want to touch something I didn't know. It's not really small either, and I was afraid the learning curve might be too long.
Looking at these options, we had four alternatives on the table:
- web.py: The ultimate web framework that used to power reddit.
- Flask: A new Python micro framework that relies on Werkzeug.
- bottle: The web framework that fits in a single file.
- brubeck: The most interesting framework I've seen in quite a long time.
Why not Flask?
If you bother to read the rest of the article, you'll notice that I mention "strict regex rules" somewhere.
What I want: Regex based URLs
Flask provides converters such as int and float, but they're useless in our application. Our URLs contain neither integers nor names; we have UUIDs and a bunch of other things.
The Problem With Flask
Maybe it's my ignorance, but after searching the documentation and the web, it came to my attention that this is what I need to do if I want REGEX URLs in Flask.
First, I need to create a Converter class:
```python
from werkzeug.routing import BaseConverter


class Converter(BaseConverter):
    def __init__(self, url_map, *args):
        super(Converter, self).__init__(url_map)
        # The first converter argument is the regular expression itself.
        self.regex = args[0]
```
I've no clue why I needed this in the first place. I know Flask relies on Werkzeug, but at the end of the day, if I wanted to do something with Flask, I had to import a 3rd party library. This doesn't look right to me. As far as I know, other frameworks I've coded with, such as Django, Brubeck and Google App Engine's webapp, don't need this kind of thing. After defining this converter class, this is what I need to do:
```python
# Register the converter on the application's URL map under the name 'regex'.
app.url_map.converters['regex'] = Converter
```
After this, I could finally write REGEX URLs in my routes.
Maybe I looked at the wrong documentation, but I really shouldn't have needed to do all of this. From my point of view, this is a horror story. If you look at Django, web.py, Google App Engine's webapp or Brubeck, you just write your REGEX URLs.
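For comparison, this is roughly what "just write your REGEX URLs" means. The sketch below is a minimal, framework-free dispatcher in the Django/web.py spirit. It assumes a routing table of (pattern, handler) pairs with UUIDs in the URLs. The handlers, the `dispatch` helper and the URL layout are all hypothetical, and none of this is any framework's actual API.

```python
import re

# Hypothetical handlers: captured regex groups arrive as positional arguments.
def show_dataset(dataset_id):
    return "dataset " + dataset_id

def export_dataset(dataset_id, fmt):
    return fmt + " export of " + dataset_id

UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Django/web.py-style table: plain regexes mapped to handlers, no converter classes.
URLS = [
    (r"/datasets/(" + UUID + r")", show_dataset),
    (r"/datasets/(" + UUID + r")/export\.(csv|json)", export_dataset),
]

# Compile every pattern once, up front.
ROUTES = [(re.compile(pattern), handler) for pattern, handler in URLS]


def dispatch(path):
    """Call the first handler whose pattern matches the entire path."""
    for regex, handler in ROUTES:
        match = regex.fullmatch(path)
        if match:
            return handler(*match.groups())
    return None  # a real service would answer 404 here
```

The dispatcher uses `fullmatch` rather than `match`, so a path with trailing junk never reaches a handler.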
There's an extension for Flask that supports Mako, but I remember running into some problems with it (I ended up writing my own Mako integration). A quick search on Stack Overflow showed me that I wasn't the only one who had problems.
After this, I didn't look at the rest of the features Flask offers.
On the bright side, Flask has amazing documentation, and it's downloadable. Downloadable documentation is useful if you code somewhere without Internet access. I wish more libraries did that.
Still, I'd say Flask is good enough for creating small applications, but I wouldn't touch it for creating complex ones.
Why not bottle?
I wanted class-based controllers, and as far as I know bottle doesn't provide such functionality. If it does, please forgive my ignorance!
Why not Brubeck?
Brubeck relies on mongrel2 and some other cool technologies. However, we had to use Apache (we didn't have any alternatives, and it wasn't our call to make).
web.py gives you freedom and does what you tell it. What I like about it is its flexibility, which made it very suitable for us. However, it isn't perfect. There's one thing I don't like in web.py: URL definitions.
URLs in web.py are defined as a flat tuple in which patterns and handlers simply alternate. For example:
urls = (r'/hello', Hello, r'/world', World)
I don't think this is right. From my point of view, this looks more beautiful:
urls = [(r'/hello', Hello), (r'/world', World)]
Apparently I'm not the only one who complains about this.
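If you prefer writing pairs, a tiny helper can flatten them into the flat tuple web.py expects, and regroup them again. These are hypothetical helpers, not part of web.py. Handler names are given as strings here, as placeholders.

```python
from itertools import chain


def flatten_urls(pairs):
    """Turn [(pattern, handler), ...] into web.py's flat (pattern, handler, ...) tuple."""
    return tuple(chain.from_iterable(pairs))


def pair_urls(flat):
    """Inverse: regroup a flat web.py-style tuple into (pattern, handler) pairs."""
    if len(flat) % 2:
        raise ValueError("expected an even number of items (pattern/handler pairs)")
    return list(zip(flat[::2], flat[1::2]))


# Placeholder handler names as strings.
pairs = [(r'/hello', 'Hello'), (r'/world', 'World')]
urls = flatten_urls(pairs)  # ('/hello', 'Hello', '/world', 'World')
```

With web.py installed, the flattened tuple would then be passed as usual, e.g. `app = web.application(urls, globals())`.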
Nevertheless, web.py met all the mandatory requirements we had, so it was suitable for us. If you're interested in more about why we chose web.py, you may want to read the follow-up article.
The Big 3: Development, Source Control, Bug Tracking
The cycle of an application consists of development, source control and bug tracking.
Development is the most important part, since it's a continuous process. You need to select the right tools for the job. Some rules we followed:
- If code can't pass PEP8*, I don't want to see it in the codebase. I don't care if it works.
- If PyLint scores the code below 8/10, I don't want to see it. I don't care if it works.
- If there are no comments, that code has no place in the codebase.
- If you introduce a new method and don't write the necessary test for it, I don't want to see it.
- Don't forget clone detection and code execution order analysis.
*: We used a slightly modified PEP8.
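Rules like these are easiest to keep when a script enforces them. Here is one small example, assuming "comment" is read as "docstring". It uses Python's `ast` module to list the functions and classes that lack one. The helper name is hypothetical, and this is only a sketch of the idea; PEP8 and PyLint themselves are still run with their own tools.

```python
import ast


def missing_docstrings(source):
    """Return the names of functions and classes in `source` that have no docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing
```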
I'm an Emacs user, and I develop in Emacs when it comes to Python, C, C++, Cilk++, Erlang, etc. I use an IDE (IntelliJ IDEA) only for Java development.
However, I had always wanted to try PyCharm, so I used it for this project.
PyCharm is a Python IDE from JetBrains, and it's just great. I loved it even more when it did coverage automatically. My only criticism of PyCharm is its themes: there should be a way to import TextMate themes (if there is one, I don't know about it).
And the price is a bargain considering what it can do. There has been some criticism of PyCharm being slow, but I didn't have any problems.
The only problem I saw was that it takes a little more time to load a project with lots of files. Maybe it's my ignorance, but I couldn't find a way to stop it from loading the files in certain folders. It'd be nice if there were a setting to skip certain folders on startup. This is a serious issue if you have more than 280,000 log files in your log folder: PyCharm never finishes loading the project, and you give up and close it.
PyCharm - CoffeeScript Error Checking
Talk to me about Testing
Yes, all the cool kids use Selenium, which is good enough and integrates well with nose. It automates all the tests and takes screenshots when necessary. Screenshots are quite useful, I think, since they let you see what a user experiences. The visible error messages are very useful too.
For example, consider the following piece of Selenium test code:
- test_get_something: goes to the URL, finds the link text "My View" and clicks on it. It then waits up to 10 seconds for the next page to load and looks for the element with the id "view".
- test_screenshot: saves a screenshot of the given URL.
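```python
import time
import unittest

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait


class ViewTest(unittest.TestCase):
    def setUp(self):
        # Assumed setup: any WebDriver works; the URL is a placeholder.
        self.browser = webdriver.Firefox()
        self.url = 'http://localhost:8080'

    def tearDown(self):
        self.browser.quit()

    def test_get_something(self):
        self.browser.get(self.url)
        self.browser.find_element(By.LINK_TEXT, 'My View').click()
        # Wait up to 10 seconds for the next page to render the 'view' element.
        WebDriverWait(self.browser, 10).until(
            lambda driver: driver.find_element(By.ID, 'view'))
        view = self.browser.find_element(By.ID, 'view')
        # find_element raises instead of returning None, so check the element itself.
        self.assertTrue(view.is_displayed())

    def test_screenshot(self):
        self.browser.get(self.url + '/go_where')
        time.sleep(5)  # crude fixed wait for the page to finish loading
        # WebDriver screenshots are PNG data, so use a .png file name.
        self.browser.save_screenshot('test.png')
```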
It's all about testing the limits of your application.
ab (ApacheBench) is your friend. I was quite happy when I found out our web.py application was capable of handling 40,000 requests and 165 concurrent connections without any caching or anything else.
The ab command was:
ab -c 180 -n 40000 http://192.168.100.124/
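To make the two flags concrete: `-n` is the total number of requests, and `-c` is how many are in flight at once. Below is a rough Python illustration of that idea using a thread pool. It is no substitute for ab, and the function names are hypothetical.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Return the HTTP status of one GET request, or 0 if it failed."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            response.read()
            return response.status
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return 0


def load_test(url, requests=100, concurrency=10):
    """Send `requests` GETs with at most `concurrency` in flight (like ab -n / -c)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = list(pool.map(fetch, [url] * requests))
    elapsed = time.perf_counter() - start
    ok = sum(1 for status in statuses if status == 200)
    return {"ok": ok, "failed": requests - ok, "req_per_sec": requests / elapsed}
```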
Performance measurement of a web application has 2 parts:
- Client Side
- Server Side
Client Side Performance
This can be measured with YSlow, an extension for Firebug. Google has a performance measurement tool as well, but it's only available for 32-bit architectures.
Server Side Performance
As I said, ab is your friend. It does the job.
There are a lot of tools you could use for penetration testing, for example XSSer for XSS testing. The main thing is that we ("we" means me, by the way, since I was the only developer) had no choice but to use offline testing tools. Online testing tools are useless for various reasons:
- 99.99% of the time, your development servers don't have an Internet connection, and the only way to connect to them is via VPS.
- I'd like to run the tests on development code, not on production code.
- Let's say I found a security vulnerability in production code. I can't fix it immediately. I know, I know, real men make changes to production code using nano, but I'm a coward.
Online testing tools were useless in our case, since our application servers didn't have an Internet connection and we could only connect to them via VPS.
This part is important. You need to select something that won't come back to bite you or corrupt your files. Our choice was git, whereas the other team (there were two teams on the project, which designed different parts of it) decided to go with Subversion.
Before committing files to the repository, we went through the following steps:
- Do lint analysis
- Optimize imports
- PEP8 and PyLint analysis
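The PyLint step, together with the 8/10 rule from earlier, can be scripted before each commit. This minimal sketch assumes that `pylint` is on the PATH and that it prints its usual summary line ("Your code has been rated at X/10"). The helper names are hypothetical, and the PEP8 check would be a separate step.

```python
import re
import subprocess

# Matches PyLint's summary line, e.g. "Your code has been rated at 8.50/10".
SCORE_RE = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def pylint_score(output):
    """Extract the score from PyLint's text report, or None if it is absent."""
    match = SCORE_RE.search(output)
    return float(match.group(1)) if match else None


def passes_gate(output, threshold=8.0):
    """True only if the report contains a score of at least `threshold`."""
    score = pylint_score(output)
    return score is not None and score >= threshold


def check(paths, threshold=8.0):
    """Run pylint on `paths` (requires pylint installed) and apply the gate."""
    result = subprocess.run(["pylint", *paths], capture_output=True, text=True)
    return passes_gate(result.stdout, threshold)
```

A missing score counts as a failure, so a crashed or misconfigured PyLint run can't slip a commit through.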
There's a lot of bug tracking software out there, but the teams went with the following:
- YouTrack: An issue and bug tracker from JetBrains. It's awesome-o.
- Trac: It's good enough and was the other team's choice.
If you don't use a framework, or if you use a minimalist one, it's your responsibility to secure your application. Even if you use a framework such as Rails or Django, it's still your responsibility. I'm not much of a security guy, but I took care of every vulnerability I knew of and took the necessary steps to store passwords safely. No, passwords are not stored in plain text, nor are they hashed with SHA1 without any salt key (I'm looking at you, LinkedIn).
- XSS: web.py provides a way to protect against XSS; however, I used the markupsafe library.
- SQL injection: Since we used SQLAlchemy, whose queries use bound parameters, this wasn't really an issue.
- CSRF: web.py doesn't provide any protection against CSRF, and as far as I know there's no 3rd party library for it, so I had to code my own. The key point of CSRF protection is a token that's hard for an attacker to guess, so I generated random SHA256 keys for CSRF tokens.
- Cookie poisoning: The way I handled this is simple. You need to send a cookie to the user for obvious reasons, but the cookie the application sends is a SHA256 digest derived from the necessary information. An encrypt key, a validation key and a secret key are used to generate the cookie. These keys are stored only on the server side, and the cookie is stored in the database too. I don't want to go into too much detail about how these keys are used.
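Below is a minimal sketch of the CSRF and cookie items above, under stated assumptions. The CSRF part follows the description: a random SHA256 token is kept in the session and compared against the submitted form value. The multi-key cookie scheme isn't spelled out, so the cookie part swaps in a common, simpler technique instead: an HMAC-SHA256 signature that lets the server detect a tampered cookie. The session dict, the key handling and all function names are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical key; a real deployment would load a persistent server-side secret.
SECRET_KEY = os.urandom(32)


def new_csrf_token(session):
    """Create a random SHA256-hex CSRF token and remember it in the session."""
    token = hashlib.sha256(os.urandom(32)).hexdigest()
    session["csrf_token"] = token
    return token


def csrf_token_valid(session, submitted):
    """Compare the submitted form token with the session copy in constant time."""
    expected = session.get("csrf_token")
    return expected is not None and hmac.compare_digest(
        expected.encode(), submitted.encode())


def sign_cookie(value, key=SECRET_KEY):
    """Return 'value|mac' so the server can detect client-side tampering (HMAC stand-in)."""
    mac = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return value + "|" + mac


def read_cookie(cookie, key=SECRET_KEY):
    """Return the original value if the signature checks out, else None."""
    value, sep, mac = cookie.rpartition("|")
    if not sep:
        return None
    expected = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(expected.encode(), mac.encode()) else None
```

Both checks use `hmac.compare_digest` rather than `==`, so the comparison time doesn't reveal how many leading characters matched.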
Later on, you need to take care of active network attackers and a bunch of other things.
Application Specific Security
This part is about form validation and storing passwords. I can't go into too much detail, but we had lots of forms that needed their own validation and protection layers. This involved writing strict regex rules and strictly checking every single input.
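One way to read "strict regex rules" is a whitelist: every expected field has a pattern it must match completely, and anything unexpected is rejected. This is a minimal sketch with hypothetical field names and patterns, not the actual rules.

```python
import re

# Hypothetical whitelist: each field must match its pattern from start to end.
FIELD_RULES = {
    "dataset_id": re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),
    "label": re.compile(r"[A-Za-z0-9 _-]{1,64}"),
    "limit": re.compile(r"[1-9][0-9]{0,3}"),
}


def validate(form):
    """Return a dict of field -> error message; an empty dict means the form is valid."""
    errors = {}
    # Reject fields we don't know about, so nothing unchecked reaches the application.
    for name in form:
        if name not in FIELD_RULES:
            errors[name] = "unexpected field"
    for name, rule in FIELD_RULES.items():
        value = form.get(name)
        if value is None:
            errors[name] = "missing"
        elif not rule.fullmatch(value):
            errors[name] = "invalid"
    return errors
```

Using `fullmatch` instead of `re.match` with a trailing `$` avoids a classic pitfall: `$` also matches just before a final newline, so a value ending in "\n" can slip through.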
User passwords are hashed N times, where N is different for each user, using a global salt key plus a salt key that is different for each user.
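Here is a minimal sketch of that scheme under stated assumptions. The exact hash function isn't named above, so this swaps in PBKDF2-HMAC-SHA256 from the standard library as the "hash N times" step. The global salt value, the iteration range and the function names are all hypothetical.

```python
import hashlib
import hmac
import os
import secrets

# Hypothetical application-wide salt; in practice it would live in config, not in code.
GLOBAL_SALT = b"change-me-application-wide-salt"


def hash_password(password, iterations=None):
    """Hash `password` with a per-user salt and a per-user iteration count.

    Returns (user_salt, iterations, digest); all three are stored with the user.
    """
    user_salt = os.urandom(16)
    if iterations is None:
        # N differs per user: a random count within an assumed range.
        iterations = 100_000 + secrets.randbelow(50_000)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 GLOBAL_SALT + user_salt, iterations)
    return user_salt, iterations, digest


def verify_password(password, user_salt, iterations, digest):
    """Recompute with the stored salt and count, then compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    GLOBAL_SALT + user_salt, iterations)
    return hmac.compare_digest(candidate, digest)
```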
Deployment (aka Production vs. Development)
For deployment, I use my own tools. I'm not cool enough to use Fabric (no, there's no sarcasm or arrogance when I say that). For minifying JavaScript, the options I looked at were:
- Google Closure Compiler: My personal favorite.
- Dojo Compressor: Never used it. I have no clue.
- JSMin: Never used it. I have no clue.
JS Lint Checking
PyCharm provides lint checking out of the box.
For CSS, the options I looked at were:
- Yahoo Compressor: Good enough. Does the job.
- Less: LESS provides a minifier as well, though you could also use Yahoo Compressor with it.
- Icey's Compressor: This is an online tool, so there's no way to use it offline; that's the first disadvantage I noticed. I never really used it.
Gzip All The Things!
/usr/bin/gzip -cn9 main.js > small.js.gz
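If the build script happens to be written in Python, the standard library's gzip module can do the same job. Here `compresslevel=9` corresponds to `-9`, and `mtime=0` keeps the header timestamp fixed, roughly in the spirit of `-n`; `gzip.compress` doesn't store a file name anyway. The file names are placeholders, and `mtime` needs Python 3.8 or later.

```python
import gzip
from pathlib import Path


def gzip_asset(src, dest):
    """Write a maximally compressed .gz copy of `src` to `dest`; return (original, compressed) sizes."""
    data = Path(src).read_bytes()
    # mtime=0: the output doesn't change just because the build ran at a different time.
    compressed = gzip.compress(data, compresslevel=9, mtime=0)
    Path(dest).write_bytes(compressed)
    return len(data), len(compressed)
```

Usage: `gzip_asset("main.js", "small.js.gz")`.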
Every person has different points of view, skills and abilities. The most common problem is communication.
This part is huge. What I understand is different from what you understand. People have different levels of skills and abilities. If another team member doesn't have the same knowledge and technical background as you, the entire team is in trouble.
Another part is language. English is the second or third language for some people. Even if a person is a native English speaker, if they don't have the technical background, differences in knowledge end up causing miscommunication.
Telephone, SMS, Skype, meetings and email are the tools you could use for communication. You could also "shout at each other". It works, but we rarely used it; maybe only once.
The Thing That Should Not Be
Software methodologies: waterfall, agile, scrum, XP, you name it. I could write something about them, but the legendary Zed Shaw has already said it very well.
- Hubris will cause you problems.
- Never say "what can go wrong?". Take frequent backups of your files and make sure you commit everything. EVERYTHING.
- Make sure you commit and back up everything (I know I said this above). Seeing "node.right can't be null: ReferenceException" at the last minute isn't very pleasant.
- C can play with Python quite well.
- Even if it's a web application, there's nothing wrong with "putting the pedal to the metal". C is your friend where it suits.
- Asynchronous all the things!
- web.py is solid and capable of creating complex web services.
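To show how little glue it takes for Python to call C, here is a minimal ctypes sketch that calls `strlen` from the C standard library. It assumes Linux or macOS. ctypes is just one option; C extensions or Cython are others, and the post doesn't say which approach the project used. The `c_strlen` wrapper is a hypothetical name.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes Linux or macOS).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts the argument and result correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Length in bytes of `text` once UTF-8 encoded, computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

Declaring `argtypes` and `restype` up front matters. Without them, ctypes assumes an `int` return value, which can silently truncate results on 64-bit platforms.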