Evgenii Legotckoi, Aug. 3, 2018, 4:05 a.m.

Profiling memory usage on Linux with Qt Creator 4.7

Performance Analyzer

You may have heard of the Performance Analyzer (named “CPU Usage Analyzer” in Qt Creator 4.6 and earlier). It profiles applications using the excellent "perf" tool available on Linux. You can use it locally on a Linux desktop system or on various embedded devices. "perf" can record a variety of events that occur in your application. These include cache misses, memory loads, context switches, or, most commonly, CPU cycles: with this event, perf periodically records a stack sample after a certain number of CPU cycles have passed. The resulting profile shows which functions in your application take up the most CPU cycles. This has been the most prominent use of the Performance Analyzer, at least so far.
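To make the idea of a sampled CPU profile concrete, here is a minimal C++ sketch of how periodic stack samples can be turned into per-function costs. The StackSample and FunctionCost types and the aggregate() function are hypothetical illustrations, not part of perf or Qt Creator, and each sample is assumed to list the innermost frame first.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One stack sample, innermost frame first (hypothetical representation).
using StackSample = std::vector<std::string>;

struct FunctionCost {
    int self = 0;       // samples in which the function itself was executing
    int inclusive = 0;  // samples in which the function was anywhere on the stack
};

// Aggregate periodic stack samples into per-function costs. With a
// cycle-based event, every sample stands for roughly the same number of
// CPU cycles, so sample counts approximate where the CPU time went.
std::map<std::string, FunctionCost> aggregate(const std::vector<StackSample>& samples)
{
    std::map<std::string, FunctionCost> costs;
    for (const StackSample& stack : samples) {
        if (stack.empty())
            continue;
        costs[stack.front()].self += 1;
        // Count each function once per sample, even if it recurses.
        const std::set<std::string> onStack(stack.begin(), stack.end());
        for (const std::string& function : onStack)
            costs[function].inclusive += 1;
    }
    return costs;
}
```

A high self count marks the function where the CPU actually spent its cycles, while a high inclusive count marks a caller whose callees are expensive. This is the usual way to read a sampled profile.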


Creating tracepoints

With Qt Creator 4.7 you can also record events for tracepoints, and if your tracepoints follow a certain naming convention, Qt Creator will know that they represent the allocation or release of a resource. Therefore, by setting tracepoints on malloc, free, and friends, you can analyze your application's memory usage in many ways. To help you set up tracepoints for this use case, Qt Creator ships a shell script that you can execute and offers to run it for you. First open your project and select the run configuration you want to examine. Then select the "Generate Tracepoints..." button in the analyzer's title bar, and Qt Creator will prompt you to run the script.

How it works

In order for unprivileged users to use tracepoints, the script has to make the debug and tracing file systems (debugfs and tracefs) accessible to all users on the system. You should only do this in controlled environments. The script generally works on 32-bit ARM and 64-bit x86 systems. 64-bit ARM systems only accept the tracepoints if they run Linux kernel version 4.10 or later. To set tracepoints on 32-bit x86 systems, you need debug symbols for your standard C library. The script tries to create tracepoints for any binary called libc.so.6 that it finds in /lib. If you have a 64-bit system with additional 32-bit libraries installed, it will try to create tracepoints for both sub-architectures. It may only succeed for one of them. This is not a problem as long as your application targets the sub-architecture for which the script was able to set tracepoints.
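The /lib search described above can be sketched in C++17 with std::filesystem. This only illustrates the idea, assuming a recursive search by file name; the actual Qt Creator helper is a shell script and may search differently, and findLibcCandidates() is a hypothetical name.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collect every directory entry named libc.so.6 below
// `root`. On many systems libc.so.6 is itself a symlink to the actual
// library file, so entries are matched by name only. Unreadable
// directories are skipped instead of aborting the search.
std::vector<fs::path> findLibcCandidates(const fs::path& root)
{
    std::vector<fs::path> found;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec);
    for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        if (it->path().filename() == "libc.so.6")
            found.push_back(it->path());
    }
    return found;
}
```

On a 64-bit multiarch system with 32-bit libraries installed, a search like this can return more than one libc.so.6, which is exactly the situation in which only one set of tracepoints may succeed.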

Troubleshooting

If the tracepoint script fails, check that your kernel was compiled with the CONFIG_UPROBE_EVENT option enabled. Without this option, the kernel does not support user-space tracepoints. All 32-bit ARM images shipped with Qt for Device Creation have had this option enabled since version 5.11. Most Linux distributions intended for desktop use enable CONFIG_UPROBE_EVENT by default.
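Checking for the option can be automated. The following C++ sketch scans the text of a kernel configuration, such as /boot/config-&lt;release&gt; on many distributions or the decompressed contents of /proc/config.gz where the kernel provides it. Newer kernels spell the option CONFIG_UPROBE_EVENTS (it was renamed around Linux 4.11), so the sketch accepts both spellings. hasUprobeEvents() is a hypothetical helper, not part of Qt Creator.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: return true if the given kernel configuration text
// enables user-space probe events. Older kernels call the option
// CONFIG_UPROBE_EVENT, newer ones CONFIG_UPROBE_EVENTS.
bool hasUprobeEvents(const std::string& configText)
{
    std::istringstream lines(configText);
    std::string line;
    while (std::getline(lines, line)) {
        // A disabled option appears as "# CONFIG_... is not set", so only an
        // exact "=y" assignment counts as enabled.
        if (line == "CONFIG_UPROBE_EVENT=y" || line == "CONFIG_UPROBE_EVENTS=y")
            return true;
    }
    return false;
}
```

To use it, read the configuration file for the running kernel into a string (for example with std::ifstream) and pass it in.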

Using tracepoints for profiling

After creating the tracepoints, you need to tell Qt Creator to use them for profiling. For this, there is a convenient shortcut in the Performance Analyzer settings. You can access the settings either for your specific project in the "Run" settings in Projects mode, or globally via "Options" in the "Tools" menu. Just select "Use trace points". Qt Creator will then replace your current event setup with any tracepoints it finds on the target system, and make sure that a sample is recorded each time a tracepoint is hit.
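Conceptually, each tracepoint registered with the kernel becomes one perf event. The kernel lists registered user-space probes in the uprobe_events file of the tracing file system (for example /sys/kernel/debug/tracing/uprobe_events or /sys/kernel/tracing/uprobe_events), one per line in the form p:GROUP/EVENT PATH:OFFSET ..., and perf refers to such an event as GROUP:EVENT. The C++ sketch below performs that translation. It is an illustration under these assumptions, not Qt Creator's code, and the probe names used (probe_libc/malloc) are hypothetical; the script may name its tracepoints differently.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: convert one line of the kernel's uprobe_events file,
// e.g. "p:probe_libc/malloc /lib/x86_64-linux-gnu/libc.so.6:0x84130",
// into the GROUP:EVENT name perf uses ("probe_libc:malloc").
// The probe names are examples only.
std::optional<std::string> perfEventName(const std::string& line)
{
    // "p:" marks an entry probe, "r:" a return probe.
    if (line.size() < 2 || (line[0] != 'p' && line[0] != 'r') || line[1] != ':')
        return std::nullopt;
    const std::size_t space = line.find(' ', 2);
    const std::string groupAndEvent =
        line.substr(2, space == std::string::npos ? std::string::npos : space - 2);
    const std::size_t slash = groupAndEvent.find('/');
    if (slash == std::string::npos || slash == 0 || slash + 1 == groupAndEvent.size())
        return std::nullopt;
    return groupAndEvent.substr(0, slash) + ":" + groupAndEvent.substr(slash + 1);
}
```

The resulting names are what you would pass to perf record -e if you wanted to record the same tracepoints by hand.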

After that, click the start button in the profiler toolbar to profile the application. When the application exits, Qt Creator collects the profile data and displays it.
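If you want to try the memory profiler before pointing it at a real application, a small allocation-heavy program is enough. The following C++ sketch (runWorkload() is a hypothetical name) makes one large, long-lived allocation and many small, short-lived ones, which is the kind of mix the section on tracing overhead discusses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical test workload: one large, long-lived block plus many small,
// short-lived ones. Returns a checksum so the loop has an observable result.
// Note: optimizing compilers may remove malloc()/free() pairs they can see
// through, so build this without aggressive optimization.
std::size_t runWorkload()
{
    const std::size_t bigSize = 16u * 1024u * 1024u; // 16 MiB
    char* big = static_cast<char*>(std::malloc(bigSize));
    if (big == nullptr)
        return 0;
    std::memset(big, 1, bigSize); // touch the pages so they are really used

    std::size_t checksum = 0;
    for (int i = 0; i < 100000; ++i) {
        // High-frequency small allocations, released immediately.
        unsigned char* small = static_cast<unsigned char*>(std::malloc(32));
        if (small == nullptr)
            break;
        small[0] = static_cast<unsigned char>(i % 256);
        checksum += small[0];
        std::free(small);
    }
    std::free(big);
    return checksum;
}
```

Call runWorkload() from main() in a plain console application, then start it from the Performance Analyzer as described above. A Debug build is the safer choice here, since it keeps every malloc() and free() call in place.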

Data interpretation

The easiest way to find out which parts of your code use a lot of memory is to look at the flame graph. To get the most meaningful results, select the “Peak Usage” mode in the top right corner. The flame graph is then sorted by the accumulated amount of memory allocated by each call chain. Consider the following example.

Findings

Here, Qt Creator was profiled while loading a large QML trace into the QML Profiler. The QML Profiler uses a lot of memory when displaying large traces, and this profile reveals some of the details. Among other things, the flame graph tells us that:

  • The models for the Timeline, Statistics and Flame Graph views consume about 43% of the peak memory. TimelineTraceManager::appendEvent(...) distributes the events to the various models and causes the allocations.
  • Of these, the largest share, 18.9%, goes to the Timeline range models. The JavaScript, Bindings, and Signal Handling categories are range models. They store a vector of additional data, with one record for each range. You can see QArrayData::allocate(...) allocating the memory for these vectors.
  • Timeline rendering consumes most of the memory not allocated by the base models. In particular, Timeline::NodeUpdater::run() appears in all of the remaining stack traces. This function is responsible for filling in the geometry used to display the Timeline categories, so QSGGeometry::allocate(...) is the direct cause of these allocations. This also explains why the QML Profiler needs a graphics card with several gigabytes of memory to display such traces.
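Shares such as "43% of peak memory" only make sense relative to a single peak. One plausible way to compute them is sketched below in C++: replay the allocation and release events, find the moment when total live memory was highest, and report how much each call chain held at that moment. This is an illustrative interpretation, not Qt Creator's implementation, and the exact definition behind its "Peak Usage" mode may differ. MemoryEvent and usageAtPeak() are hypothetical, and for simplicity each release event is assumed to already carry the call chain of the allocation it frees; a real tool has to match free() to malloc() by address.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical event representation: a positive delta is an allocation,
// a negative delta a release. Each release is assumed to carry the call
// chain of the allocation it frees.
struct MemoryEvent {
    std::string callChain; // e.g. "main;loadTrace;appendEvent"
    long long delta;       // bytes
};

// Replay the events, find the point where total live memory was highest,
// and return how many bytes each call chain was holding at that point.
std::map<std::string, long long> usageAtPeak(const std::vector<MemoryEvent>& events)
{
    std::map<std::string, long long> live;   // live bytes per call chain
    std::map<std::string, long long> atPeak; // snapshot of `live` at the peak
    long long total = 0;
    long long peak = 0;
    for (const MemoryEvent& event : events) {
        live[event.callChain] += event.delta;
        total += event.delta;
        if (total > peak) {
            peak = total;
            atPeak = live; // copying the whole map is fine for a sketch
        }
    }
    return atPeak;
}
```

Dividing each chain's value by the sum over all chains gives percentages of peak memory. A chain can allocate more in total than it holds at the peak, which is why peak-based shares differ from plain allocation totals.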

Possible optimizations

From here it is easy to come up with ideas for optimizing the offending functions. We could reconsider whether we really need all the data stored in the various models, or temporarily save it to disk until it is needed. The overwhelming amount of geometry shown here also suggests that the threshold for merging adjacent events may be too low. Finally, we could release the geometry in main memory as soon as it has been uploaded to the GPU.
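The idea of merging adjacent events can be illustrated with a short C++ sketch: ranges whose gap to the previous range is at most a threshold are combined into one. Range and mergeAdjacent() are hypothetical and do not reflect how the QML Profiler's timeline actually builds its geometry; the point is only that a larger threshold yields fewer ranges, and therefore less geometry to allocate.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical time range, e.g. in nanoseconds.
struct Range {
    long long start;
    long long end;
};

// Merge ranges whose gap to the previous (already merged) range is at most
// `threshold`. Raising the threshold trades visual detail for memory.
std::vector<Range> mergeAdjacent(std::vector<Range> ranges, long long threshold)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });
    std::vector<Range> merged;
    for (const Range& range : ranges) {
        if (!merged.empty() && range.start - merged.back().end <= threshold)
            merged.back().end = std::max(merged.back().end, range.end);
        else
            merged.push_back(range);
    }
    return merged;
}
```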

Tracing overhead

Profiling every call to malloc() and free() in an application causes significant overhead. The kernel will most likely not be able to keep up and will therefore drop some samples. Depending on your specific workload, the resulting profile can still give you valuable insights. For example, if your application allocates a huge amount of memory in only a few calls to malloc() while also allocating and releasing small amounts at a high frequency, you may miss the malloc() calls you are interested in because the kernel discards them. However, if the problematic malloc() calls make up a larger percentage of the total calls, you will probably catch at least some of them. Either way, Qt Creator presents absolute numbers for allocations, releases, and peak memory usage. These numbers are based only on the perf samples that were actually recorded and are therefore not entirely accurate. Other tools will report different numbers.
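The effect of dropped samples on such totals can be shown with a small C++ sketch. It computes allocated bytes, released bytes, and peak usage from a stream of recorded samples (positive for an allocation, negative for a release). Totals and summarize() are hypothetical names, and Qt Creator's own bookkeeping may differ.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of recorded memory samples.
struct Totals {
    long long allocated = 0; // bytes seen in allocation samples
    long long released = 0;  // bytes seen in release samples
    long long peak = 0;      // highest live total reached
};

// Positive samples are allocations, negative samples releases. Samples the
// kernel dropped never reach this function, so a missing allocation lowers
// the numbers and a missing release inflates the live total and the peak.
Totals summarize(const std::vector<long long>& samples)
{
    Totals totals;
    long long live = 0;
    for (long long delta : samples) {
        if (delta >= 0)
            totals.allocated += delta;
        else
            totals.released -= delta;
        live += delta;
        totals.peak = std::max(totals.peak, live);
    }
    return totals;
}
```

For instance, the samples 100, -100, 100, -100 give a peak of 100 bytes. If the kernel drops the first release, the recorded samples 100, 100, -100 report a peak of 200 bytes and 100 bytes that were apparently never released.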

Special memory allocation functions

There are also memory allocation functions that you cannot profile this way. In particular, posix_memalign() does not return the resulting pointer on the stack or in a register; it writes it to a location passed in by the caller. Therefore, we cannot record it with a tracepoint. Custom memory allocators that your application may use are not handled by the default tracepoints either. For example, the JavaScript heap allocator used by QML will not show up in the profile, although for this specific case you can use the QML Profiler. There are also various replacements for the standard C allocation functions, such as jemalloc or tcmalloc. If you want to track them, you need to define custom tracepoints.
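The difference between malloc() and posix_memalign() is visible in their calling conventions. malloc() hands the new pointer back as its return value, which is where a tracepoint on the function's return can read it. posix_memalign() returns only a status code and stores the pointer through its first argument. The hypothetical wrapper allocateAligned() below shows this in C++.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper. posix_memalign() returns only a status code
// (0 on success, or an error number such as EINVAL or ENOMEM) and stores
// the new pointer through its first argument, i.e. in memory owned by the
// caller, rather than in the return-value register.
void* allocateAligned(std::size_t alignment, std::size_t size)
{
    void* pointer = nullptr;
    const int status = posix_memalign(&pointer, alignment, size);
    return status == 0 ? pointer : nullptr;
}
```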

Conclusion

Memory usage profiling with the Qt Creator Performance Analyzer is a quick and easy way to gain important insights into your application's memory usage. It works out of the box for all Linux targets supported by Qt Creator. You can immediately browse the resulting profile data in an accessible GUI, without any further processing or data transfer. Other tools may provide more accurate data. However, for a quick overview of your application's memory usage, the Performance Analyzer is often the best tool.
