Bamyx Technologies
7 min read · Aug 9, 2021


6 FANTASTIC PYTHON CODE PROFILING LIBRARIES
With these profiling utilities, you can get detailed insight into where your Python app is slow.


There are two kinds of speed in programming languages: development speed and execution speed. Python has always prioritized the first over the second. And while Python code is usually fast enough for the job, sometimes it isn’t. In those cases, you need to find out where and why it’s lagging, and do something about it.

“Measure, don’t guess” is a well-known saying in software development, and in engineering generally. It’s tempting to assume you know what’s wrong with software, but it’s never a smart idea. Statistics about real program performance are always the best first tool for making applications faster.

The good news is that Python offers a whole range of tools for profiling your apps and figuring out where they’re slowest. These range from simple one-liners bundled with the standard library to sophisticated frameworks for gathering stats from running programs. I’ll cover six of the most important ones here, all of which are cross-platform and available either on PyPI or in Python’s standard library.

time and timeit
Sometimes all you need is a stopwatch. If you’re just measuring the time between two pieces of code that take seconds or minutes to run, a stopwatch will suffice.

The Python standard library includes two stopwatch routines. The time module’s perf_counter function uses the operating system’s high-resolution timer to produce an arbitrary timestamp. Call time.perf_counter once before an action and once after, then take the difference. That gives you an unobtrusive, low-overhead, if unsophisticated, way to time code.
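
Here is a minimal sketch of that pattern (slow_work is just a stand-in for whatever you want to time):

    import time

    def slow_work():
        # Stand-in for the code you actually want to time
        return sum(i * i for i in range(1_000_000))

    start = time.perf_counter()
    slow_work()
    elapsed = time.perf_counter() - start
    print(f"slow_work took {elapsed:.4f} seconds")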

The timeit module simulates real-world benchmarking of Python code. Its timeit.timeit function takes a snippet of code and executes it many times (1 million passes by default) to calculate the total time required.

It’s best for determining how a single operation or function call performs in a tight loop, for example, if you want to see whether a list comprehension or conventional list construction is faster for something that will be repeated many times. (List comprehensions usually come out on top.)
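
A quick illustration, with the pass count lowered so the comparison finishes fast:

    import timeit

    # Compare a list comprehension with a conventional append loop
    comp_time = timeit.timeit("[x * 2 for x in range(100)]", number=100_000)
    loop_time = timeit.timeit(
        "result = []\nfor x in range(100): result.append(x * 2)",
        number=100_000,
    )
    print(f"comprehension: {comp_time:.3f}s  append loop: {loop_time:.3f}s")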

The time module’s drawback is that it’s nothing more than a stopwatch, while timeit’s core use case is microbenchmarking individual lines or blocks of code. These modules help only when you’re working with code in isolation. Neither suffices for whole-program analysis, which means finding out where, among the thousands of lines of code in your program, most of its time is spent.

cProfile

cProfile, a whole-program analysis profiler, is included in the Python standard library. When you run cProfile, it tracks every function call in your program and generates a list of which ones were used the most and for how long.
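
In the simplest case, a sketch like this profiles an entire run (main is a stand-in entry point):

    import cProfile

    def main():
        # Stand-in workload
        print(sum(i * i for i in range(1_000_000)))

    # Profile the call and sort the report by cumulative time
    cProfile.run("main()", sort="cumulative")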

cProfile has three major advantages. First, it’s part of the standard library, so it’s available even in a stock Python installation. Second, it gathers several different statistics about call behavior, such as the time spent in a function call’s own instructions versus the time spent in all the other calls the function invokes. This lets you determine whether a function is slow itself or merely calling other functions that are slow.

Third, and probably most importantly, you can constrain cProfile freely. You can sample the whole program’s run, or you can switch profiling on only when a selected function runs, so you can focus on what that function is doing and whom it’s calling. This approach works best once you’ve narrowed things down a bit, but it saves you the trouble of wading through the noise of a full profile trace.
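
Here is a rough sketch of constraining profiling to one region of code (hot_function is a placeholder for your own suspect code path):

    import cProfile
    import pstats

    def hot_function():
        # Placeholder for the code path you suspect is slow
        return sorted(range(100_000), key=lambda x: -x)

    profiler = cProfile.Profile()
    profiler.enable()   # profile only this region
    hot_function()
    profiler.disable()

    # Sort by cumulative time and show the top 10 entries
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)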

That brings us to the first of cProfile’s drawbacks: it generates a large number of statistics by default, and it can be hard to find the right needle in all that hay. cProfile’s execution model is another drawback: every function call is intercepted, which creates significant overhead. That makes cProfile unsuitable for profiling apps in production with live data, but ideal for profiling them during development.

Palanteer
Palanteer, a relatively new addition to the Python profiling arsenal, can profile both Python and C++ programs. That makes it extremely handy if you’re writing a Python application that wraps your own C++ libraries and you want the most detailed view of both halves of your app. Palanteer displays its results in a desktop GUI application that updates live as the program runs.

Instrumenting a Python application with Palanteer takes little more than running the app under Palanteer, much as you would with cProfile. It tracks function calls, exceptions, garbage collection, and OS-level memory allocations. The last two are especially useful if memory use or object allocations are the source of your app’s performance problems.
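
Assuming you have built Palanteer from source (see below), its documentation shows launching a script under the profiler via Python’s module runner; the exact flags may vary by version:

    python -m palanteer myscript.py
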
Palanteer’s one significant disadvantage, at least for now, is that it must be built entirely from source. No precompiled binaries are available as installable Python wheels, so get out your C++ compiler and a copy of CPython’s source code.

Pyinstrument
Pyinstrument, like cProfile, traces your program and generates reports about the code that takes up most of its time. But Pyinstrument has two significant advantages over cProfile that make it worth investigating.

First, Pyinstrument doesn’t attempt to hook every single function call. It samples your program’s call stack every millisecond, so it’s less intrusive while still sensitive enough to detect what’s eating up most of your application’s runtime.

Second, Pyinstrument’s reporting is far more concise. It shows you the top functions in your program that take up the most time, so you can focus on the biggest offenders, and it delivers those results quickly and without much ceremony.

Pyinstrument also offers many of cProfile’s conveniences. Instead of recording the behavior of the whole program, you can use the profiler as an object in your application and record the behavior of selected functions. The output can be rendered in a variety of formats, including HTML, and you can ask to see the entire call history.
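
A minimal sketch of that in-process approach (main is a stand-in workload):

    from pyinstrument import Profiler

    def main():
        # Stand-in workload
        return sum(i ** 2 for i in range(2_000_000))

    profiler = Profiler()
    profiler.start()
    main()
    profiler.stop()

    # Condensed call-tree report on the console; output_html()
    # renders the same report as an HTML page
    print(profiler.output_text(unicode=True, color=True))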

Two drawbacks come to mind as well. First, some programs that use C-compiled extensions, such as those created with Cython, may not work properly when launched from the command line with Pyinstrument. But they do work if Pyinstrument is used inside the program itself, for instance by wrapping a main() function in a Pyinstrument profiler call.

The second caveat is that Pyinstrument doesn’t handle multithreaded programs well. Py-spy, described below, may be a better option there.

Py-spy
Py-spy, like Pyinstrument, samples the state of a program’s call stack at regular intervals instead of trying to record every single call. Unlike Pyinstrument, Py-spy has its core components written in Rust (Pyinstrument uses a C extension) and runs in a separate process from the program being profiled, so it can be used safely with code in production.

Py-spy’s architecture makes it simple to profile multithreaded or subprocess-based Python applications, something many other profilers can’t do. Py-spy can profile C extensions as well, but they must be compiled with symbols to be useful. And in the case of Cython-compiled extensions, the generated C file must be present in order to collect proper trace information.

Py-spy gives you two basic ways to inspect an app. You can run the app with Py-spy’s record command, which produces a flame graph when the run concludes. Or you can use Py-spy’s top command, which presents a live-updated, interactive view of your Python app’s internals, in the same manner as the Unix top utility. Individual thread stacks can also be dumped out from the command line.
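
By way of illustration, the commands below correspond to those modes; myscript.py and the process ID are placeholders:

    # Record a flame graph while running a script
    py-spy record -o profile.svg -- python myscript.py

    # Attach a live, top-style view to a running process
    py-spy top --pid 12345

    # Dump the current call stack of every thread
    py-spy dump --pid 12345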

Py-spy has one major limitation: it’s designed to profile an entire program, or at least some of its components, from the outside. It doesn’t let you decorate and sample just a single function.

Yappi
Yappi (“Yet Another Python Profiler”) offers many of the best features of the other profilers discussed here, plus a few that none of them have. PyCharm installs Yappi by default as its profiler of choice, so users of that IDE already have it available.

To use Yappi, you add code to your program that invokes its profiling mechanisms: start the profiler, stop it, and generate reports. For measuring elapsed time, Yappi gives you the choice of “wall time” or “CPU time.” The former is merely a stopwatch; the latter uses system-native APIs to determine how long the CPU was actually engaged in executing code, leaving out pauses for I/O or thread sleeps.

CPU time provides the most accurate estimate of how long specific operations, such as numerical code execution, take.
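
A bare-bones sketch of that workflow (crunch is a stand-in for CPU-bound work):

    import yappi

    def crunch():
        # Stand-in for CPU-bound work
        return sum(i * i for i in range(1_000_000))

    # "cpu" counts only time the CPU spent executing code;
    # "wall" would measure elapsed time instead. Set this
    # before profiling starts.
    yappi.set_clock_type("cpu")

    yappi.start()
    crunch()
    yappi.stop()

    yappi.get_func_stats().print_all()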

Yappi’s approach to collecting metrics from threads has the advantage of not requiring you to decorate the threaded code. Its yappi.get_thread_stats() function retrieves statistics from any thread activity you record, which you can then parse separately. As with cProfile, statistics can be filtered and sorted at a fine level of granularity.
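
For example, a minimal sketch of pulling per-thread statistics (worker is a placeholder):

    import threading
    import yappi

    def worker():
        sum(i * i for i in range(500_000))

    yappi.start()
    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    yappi.stop()

    # One row of statistics per thread that ran while profiling was active
    yappi.get_thread_stats().print_all()
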
Finally, Yappi can profile greenlets and coroutines, something many other profilers can’t handle well, if at all. Given Python’s growing use of async idioms, the ability to profile concurrent code is an important tool to have.
