PROFILING FOR PERFORMANCE
You don't care about performance. Before you rant at me and include my mother in the conversation, answer the following questions:
Do you test for performance?
If you do, do you do it in an automatic and repeatable manner?
Do you track, analyze and categorize the results?
If you answered yes to most of these questions, you could stick around for part two, in which I play with some useful profiling tools like locust.io and django-silk, or maybe you are curious to see the special app that I built especially for this demonstration. For the rest of you, read on.
Profiling is a complex topic, so my take on this is to split it into two parts:
Part one will be a quick, general and fairly superficial introduction to profiling Python, no prior knowledge required [1]. I just want to make sure that we are on the same page.
Part two is where we get our hands dirty and simulate a workflow of profiling an application. This part will require some knowledge of more advanced topics, but fear not, it is still accessible to most readers out there.
Profiling for performance [2]
I've used the term profiling several times before but never explained anything about it.
According to Wikipedia:
Profiling (computer programming) is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.
In other words: profiling lets you see where you are spending your time, so you can optimize your code intelligently. The profiled code is almost exactly the same code that runs in production. You can easily [3] track entry and exit times for functions, which helps you build the bigger picture of the application.
How much of resource X [4] is used?
How exactly is this amount of X spent?
Looking for bottlenecks
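To make "tracking entry/exit times for functions" concrete, here is a minimal sketch using only the standard library. The decorator name timed and the workload slow_sum are hypothetical names made up for this illustration; real profilers do this bookkeeping for you and with far less intrusion.

```python
import functools
import time


def timed(func):
    """Record the wall-clock time between entering and leaving func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # timestamp on entry
        try:
            return func(*args, **kwargs)
        finally:
            # timestamp on exit minus timestamp on entry, in seconds
            wrapper.calls.append(time.perf_counter() - start)

    wrapper.calls = []  # one elapsed time per call
    return wrapper


@timed
def slow_sum(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


result = slow_sum(100_000)
print(f"slow_sum took {slow_sum.calls[0]:.6f}s")
```

This answers "how much time is used?" for one function at a time. It does not tell you how that time is spent inside the function, which is where the tools discussed later come in.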
Why do we profile Python?
Because (and it hurts my heart to say this) CPython is slow. Python is a beautiful language, but the rumor spread around by jealous Java programmers is sadly true.
So why not make Python faster?
Well, that is exactly what PyPy is trying to do.
PyPy applies automated, runtime, profile-driven optimization, a.k.a. a JIT compiler. Because PyPy tries to tackle a non-trivial problem, we are not quite there yet.
Even so, at the moment, computers still aren't powerful enough to restructure your entire program to use a better-suited algorithm for the job.
So until then we profile.
Why not optimize everything?
Fast code is expensive
Well, this is exactly what the wizards of yesterday did. They wrote code in which every CPU cycle and every piece of memory mattered; controlling everything was the name of the game. They had no other way.
But this level of care comes at a cost: fast code is expensive because it takes effort to write (good) fast code. It needs better algorithms, deep research into approaches, and diligent, focused effort.
Fast code is hard to maintain
Smart code introduces caches, lazy loading, parallelization, assumptions and requirements.
Much smarter code gets you to "meta-programming obscure", code where nothing you write is the code that runs.
Sure, it's fast, but it is extremely slow to write, and/or slow to maintain.
We need fast but maintainable code, so we give up, right?
No, we do "Intelligent optimization".
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. -Knuth
The trick here is to mess up only a small portion of the code base, while preserving the elegance of the whole.
Maintain nice flexibility and uglify just the important 3%.
How to find the 3%?
Why not guess?
I really understand my code!
Well, you probably do, and most of the time you probably do a good job, but you are taking a bet every time. Better to look first.
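"Looking first" can be as cheap as a few lines. The sketch below uses the standard-library cProfile and pstats modules on a made-up program; parse, transform and main are hypothetical functions invented for this example, and the point is only that the report, not your intuition, says which of them dominates.

```python
import cProfile
import io
import pstats


def parse(n):
    """Hypothetical first step: build a list of strings."""
    return [str(i) for i in range(n)]


def transform(items):
    """Hypothetical second step: glue the strings together."""
    out = ""
    for item in items:
        out += item
    return out


def main():
    return transform(parse(50_000))


# Profile one run of main() and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Render the ten entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Read the cumtime column before touching any code: whichever function sits at the top is your candidate for the 3%.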
What options do I have?
Before thinking about the tools, let's try to group them according to some general rules of thumb.
Overview of the available groups
What is being profiled?
I/O or network
How is it profiled?
Deterministic (every event is recorded, precise information, slow, generates a huge amount of data)
Statistical (looks at the stack from time to time, not as precise, fast, generates a manageable amount of data)
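The difference between the two approaches can be sketched with the standard library alone. This is a toy under stated assumptions, not how any of the tools below is implemented: work and helper are hypothetical workloads, the deterministic half relies on sys.setprofile, and the statistical half relies on sys._current_frames, which is a CPython-specific function.

```python
import collections
import sys
import threading
import time


def helper(i):
    return i * 2


def work(n):
    """Hypothetical workload made of many small calls."""
    total = 0
    for i in range(n):
        total += helper(i)
    return total


# Deterministic: the hook fires on every Python-level call, so counts are exact.
call_counts = collections.Counter()


def count_calls(frame, event, arg):
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


sys.setprofile(count_calls)
work(20_000)
sys.setprofile(None)

# Statistical: a background thread peeks at this thread's current frame
# at intervals, so the result is only an estimate.
samples = collections.Counter()
main_id = threading.get_ident()
stop = threading.Event()


def sampler(interval=0.001):
    while not stop.is_set():
        frame = sys._current_frames().get(main_id)  # CPython-specific
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)


thread = threading.Thread(target=sampler, daemon=True)
thread.start()
work(300_000)
stop.set()
thread.join()

print("exact call counts:", call_counts["work"], call_counts["helper"])
print("sampled frames:", dict(samples))
```

The deterministic counts are the same on every run, at the price of a hook invocation per call. The sampled counts vary from run to run and may miss short functions entirely, but the workload runs at close to full speed.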
How is it reported?
Run time, aggregate ("snippet-level") : timeit
Run time, method-level, deterministic : cProfile
Run time, method-level, statistical : statprof.py
Run time, line-level, deterministic : line_profiler
Memory, line-level, deterministic : memory_profiler
Memory, method-level, deterministic : pympler (Py2/3), guppy/heapy (Py2)
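As a taste of the first entry in the list, here is a small sketch with the standard-library timeit module. The two snippets being compared are made up for this illustration, and the numbers you get will depend on your machine and Python version.

```python
import timeit

# Shared setup executed once per timing run, outside the measured code.
setup = "data = list(range(1000))"

# Two hypothetical snippets that compute the same thing.
loop_stmt = """
total = 0
for x in data:
    total += x
"""
builtin_stmt = "total = sum(data)"

# Run each snippet 1000 times, repeat that 3 times, and keep the best run
# to reduce noise from whatever else the machine is doing.
loop_time = min(timeit.repeat(loop_stmt, setup=setup, repeat=3, number=1000))
builtin_time = min(timeit.repeat(builtin_stmt, setup=setup, repeat=3, number=1000))

print(f"explicit loop: {loop_time:.6f}s per 1000 runs")
print(f"built-in sum:  {builtin_time:.6f}s per 1000 runs")
```

timeit gives you one aggregate number per snippet, which is exactly what "snippet-level" means: great for comparing two alternatives, useless for finding out where a whole program spends its time.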
You can find other interesting articles on similar topics here.
Profile for performance: I stole a lot of information from this wonderful presentation.
Advance Python Profiling: Excellent presentation with tons of code examples.
[1] Well, that's not quite true: you still need to know about concepts such as functions, modules and the stack.
[2] Here we cheat and say that the system is performant if it satisfies some conditions, usually given in the code specifications.
[3] Well, usually it's not that easy, but you get the point.
[4] X can be CPU time, RAM, I/O or power.