In the world of Python we have quite decent tools for static code analysis (SCA): pylint, flake8 and pep8, just to name a few. The rules they enforce are built on a solid foundation, PEP 8 (the Style Guide for Python Code). Besides style and convention issues, SCA tools can detect errors such as using the wrong variable, typos, etc. This helps a lot and leaves little room for ambiguity when it comes to coding conventions. However, there are situations in which no standard enforces any particular convention. As a consequence, there are a few possible ways of writing something down, and choosing the right one is up to the team that owns a particular project. From that moment on, people are responsible for enforcing the new law, which, of course, is error prone.

Why would anybody write a custom checker?

I found myself in such a situation a few weeks ago, after joining a new team. These are the team's rules for using import statements:

  • Obey the standard grouping of import statements: first the standard library, then 3rd party modules and finally imports from the project
  • All import statements go before from … import statements
  • Imports in each group have to be sorted alphabetically
  • Names imported in a from … import statement also have to be sorted alphabetically

Correct example:

Incorrect example:

These rules are not something extraordinary, yet it takes some time to get used to them. Even after a few months in the project, one can make a mistake and misplace imports. I made plenty of such errors at the beginning of my work on the project. It must have been irritating for the other team members to point them out over and over. So, to spare my teammates some nerves and time, I came up with the idea of writing a custom checker for pylint.

Pylint is a pluggable piece of software that can easily be extended with such custom rules, although they have to be programmed. There are two approaches to writing a custom checker: treat the code as an abstract syntax tree (AST) or as a raw string. For this particular application, AST analysis was much handier, so I went that way.
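To see why the AST route is handier for import rules, here is a small comparison on a parenthesised, multi-line from … import. It is a sketch using the standard library's re and ast modules, not pylint's raw-checker API; the regex stands in for a naive raw-string approach that looks at one physical line at a time.

```python
import ast
import re

SOURCE = """\
from os.path import (
    join,
    basename,
)
"""

# Raw-string approach: a naive regex applied to the first physical line only.
match = re.match(r"from\s+(\S+)\s+import\s+(.+)", SOURCE.splitlines()[0])
raw_names = [name.strip() for name in match.group(2).split(",")]

# AST approach: the parser has already dealt with parentheses and line breaks.
statement = ast.parse(SOURCE).body[0]
ast_names = [alias.name for alias in statement.names]

print(raw_names)  # ['(']
print(ast_names)  # ['join', 'basename']
```

A raw-string checker can of course be made smarter, but it would end up re-implementing part of the parser, which the AST gives us for free.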

What are ASTs, actually?

Abstract syntax trees are a way of representing code as a tree of objects that is easy to analyze programmatically. To grasp the general idea, consider the following example:

The AST is a structure that represents our code. The root element is a Module. Its body contains an Import and a FunctionDef. If we look further into the FunctionDef element, we discover that it also has a body, containing an Assign and a Return.
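The same structure can be reproduced with the standard library's ast module. The snippet below is made up, chosen only to have the shape described above (an Import and a FunctionDef whose body holds an Assign and a Return):

```python
import ast

# A made-up snippet with the shape described above.
SOURCE = """\
import os


def current_dir():
    path = os.getcwd()
    return path
"""

tree = ast.parse(SOURCE)
function = tree.body[1]

print(type(tree).__name__)                              # Module
print([type(node).__name__ for node in tree.body])      # ['Import', 'FunctionDef']
print([type(node).__name__ for node in function.body])  # ['Assign', 'Return']
print(ast.dump(tree, indent=2))  # the full nested structure (indent needs Python 3.9+)
```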

I use the astroid library, an enhanced version of the standard library's ast module. Pylint uses astroid extensively.

That's the basic idea: we get a nested structure of objects, each representing a part of our code, such as a single statement. For example, this piece of code:

is turned into this:

Of course, the AST also stores additional information, such as the position of each node in the source, so every node can be traced back to the exact place in the code it came from.
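A short sketch of this with astroid itself, assuming astroid is installed (it is a dependency of pylint). astroid.parse builds the tree without importing anything, and each node carries its line number and column offset:

```python
import astroid

# A made-up snippet; line numbers below refer to this string.
SOURCE = """\
import os


def current_dir():
    path = os.getcwd()
    return path
"""

module = astroid.parse(SOURCE)
function = module.body[1]

# (node type, line number, column offset) for the top-level and function-level statements.
positions = [
    (type(node).__name__, node.lineno, node.col_offset)
    for node in module.body + function.body
]
print(positions)
print(module.repr_tree())  # astroid's own textual dump of the tree
```

Here positions comes out as Import at line 1, FunctionDef at line 4, and Assign and Return at lines 5 and 6, both indented by four columns.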

Many of pylint's checkers work by visiting the nodes of a given AST. For each node type there are two callbacks one can implement: visit_<nodename> and leave_<nodename>. The first is invoked when the traversal reaches a particular node, and the second when the traversal comes back to it after all of its children have been visited. The tree is traversed depth-first. This makes sense, because it lets us gather all statements of a function and then, inside leave_functiondef, do something with them, such as checking the number of assignments.
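The mechanism can be imitated in a few lines with the standard library's ast module. This is a simplified sketch, not pylint's actual walker: MiniWalker and AssignmentCounter are hypothetical names, and the callback names are derived the same way (lower-cased class name, so FunctionDef becomes functiondef). The counter does exactly what the paragraph describes: it collects assignments while inside a function and reports them in leave_functiondef.

```python
import ast


class MiniWalker:
    """Depth-first walk calling visit_<nodename> and leave_<nodename> on a checker."""

    def __init__(self, checker):
        self.checker = checker

    def walk(self, node):
        name = type(node).__name__.lower()  # e.g. FunctionDef -> "functiondef"
        visit = getattr(self.checker, f"visit_{name}", None)
        if visit:
            visit(node)
        for child in ast.iter_child_nodes(node):
            self.walk(child)  # all children are finished before we leave this node
        leave = getattr(self.checker, f"leave_{name}", None)
        if leave:
            leave(node)


class AssignmentCounter:
    """Counts assignments per function; a stack keeps nested functions separate."""

    def __init__(self):
        self.stack = []
        self.report = {}

    def visit_functiondef(self, node):
        self.stack.append(0)

    def visit_assign(self, node):
        if self.stack:
            self.stack[-1] += 1

    def leave_functiondef(self, node):
        # Every statement of the function has been visited by now.
        self.report[node.name] = self.stack.pop()


SOURCE = """\
def f():
    a = 1
    b = 2
    return a + b


def g():
    return 0
"""

counter = AssignmentCounter()
MiniWalker(counter).walk(ast.parse(SOURCE))
print(counter.report)  # {'f': 2, 'g': 0}
```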

Let's start by implementing the first, simpler rule: all names imported in from … import should be sorted alphabetically. First, prepare an example file that violates the rule:
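A hypothetical file that breaks the rule could look like this (its contents are made up for illustration). The imported names are actually used, so the file should not also trigger pylint's unused-import warning:

```python
"""Hypothetical example that breaks the sorted from ... import rule."""
from os.path import join, basename  # "basename" should come before "join"


def base_of(*parts):
    """Return the last component of the joined path."""
    return basename(join(*parts))
```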

Running pylint shows no problems:

First, we need some checker boilerplate code:

The actual implementation is pretty easy:

Now we run pylint with our custom checker, and it complains as expected:

After fixing the order of the imported names:

Pylint checks pass:

The second rule is much trickier to implement correctly. By looking into the code of pylint's original imports checker module, I discovered that it can be reused: it already gathers all Import and ImportFrom statements and can even organize them into groups of standard library, 3rd party and local imports.

First, code that violates this rule:
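A hypothetical module that breaks two of the rules at once (contents made up for illustration): a from … import comes before plain imports, and json is placed after sys.

```python
"""Hypothetical module whose imports break the team's ordering rules."""
from collections import Counter  # should come after the plain imports
import sys
import json  # should come before sys


def word_counts_as_json(text):
    """Count the words in text and return the counts as a JSON string."""
    return json.dumps(Counter(text.split()), sort_keys=True)


def main():
    """Read text from standard input and print its word counts."""
    sys.stdout.write(word_counts_as_json(sys.stdin.read()) + "\n")
```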

Second, the implementation. All the interesting places are commented:

Our check obviously fails:

Fixing the code

makes Pylint happy again:

Conclusion

No software developer should spend time manually checking things that static code analysis should take care of. We have much better things to do. 😉 Let me know in the comments if you have ever come across a problem that custom Pylint checks could solve.

All code presented in this post is GPL-2 licensed (just like Pylint), so enjoy. Don't forget to read the Pylint source code, the most fundamental source of information for this blog post.