Evgenii Legotckoi, Aug. 3, 2018, 4:05 a.m.

Profiling memory usage on Linux with Qt Creator 4.7

Performance Analyzer

You may have heard of the Performance Analyzer (named “CPU Usage Analyzer” in Qt Creator 4.6 and earlier). It profiles applications using the excellent "perf" tool on Linux. You can use it locally on Linux desktop systems or on various embedded devices. "perf" can record a variety of events that occur in your application. These include cache misses, memory loads, context switches, and, most commonly, CPU cycles, where a stack sample is recorded periodically after a certain number of CPU cycles have passed. The resulting profile shows which functions in your application take up the most CPU cycles. This is the most prominent use of the Performance Analyzer, at least so far.


Create trace points

With Qt Creator 4.7 you can also record events for tracepoints, and if your tracepoints follow a certain naming convention, Qt Creator will know that they signify the allocation or release of resources. Therefore, by setting tracepoints on malloc, free, and friends, you can monitor your application's memory usage. To help you set up tracepoints for this use case, Qt Creator ships a shell script and prompts you to run it. First open your project and select the run configuration you want to investigate. Then select the "Generate Tracepoints..." button in the analyzer's title bar to bring up the prompt.

How does it work?

In order for unprivileged users to use tracepoints, the script must make the kernel debug and tracing file systems available to all users on the system. You should only do this in controlled environments. The script usually works for 32-bit ARM systems and 64-bit x86 systems. 64-bit ARM systems can only accept tracepoints if you are running Linux kernel version 4.10 or higher. To set tracepoints on 32-bit x86 systems, you need to have the debug symbols for your standard C library. The script will try to create tracepoints for any binary called libc.so.6 found in /lib. If you have a 64-bit system with additional 32-bit libraries installed, it will try to create tracepoints for both sub-architectures. It may succeed for only one of them. This is not a problem if your application targets a sub-architecture for which the script was able to set tracepoints.

Troubleshooting

If the tracepoint script fails, check that your kernel was compiled with the CONFIG_UPROBE_EVENT option enabled. Without this option, the kernel does not support user-space tracepoints. All 32-bit ARM images shipped with Qt for Device Creation have this option enabled since version 5.11. Most Linux distributions designed for desktop use enable CONFIG_UPROBE_EVENT by default.
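If you have your kernel's configuration available as text, the check described above can be automated. The following is a minimal sketch, not part of Qt Creator or its script. The function name hasUprobeEvents is hypothetical, and the "CONFIG_NAME=y" line format is an assumption about how your system exposes its kernel configuration. Some newer kernels use the plural spelling CONFIG_UPROBE_EVENTS, so the sketch accepts both.

```cpp
#include <sstream>
#include <string>

// Returns true if the given kernel configuration text enables uprobe events.
// Assumption: the text uses one "CONFIG_NAME=y" entry per line. Where your
// system keeps its kernel configuration is not covered by this article.
bool hasUprobeEvents(const std::string &kernelConfig)
{
    std::istringstream lines(kernelConfig);
    std::string line;
    while (std::getline(lines, line)) {
        // CONFIG_UPROBE_EVENT is the name used in this article. Some newer
        // kernels spell the option CONFIG_UPROBE_EVENTS, so accept both.
        if (line == "CONFIG_UPROBE_EVENT=y" || line == "CONFIG_UPROBE_EVENTS=y")
            return true;
    }
    return false;
}
```

Read the configuration file into a std::string and pass it to the function. A result of false means the option is either disabled or listed in a format this sketch does not recognize.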

Using tracepoints for profiling

After creating tracepoints, you need to tell Qt Creator to use them for profiling. There is a convenient shortcut for this in the Performance Analyzer settings. You can access the settings either for your specific project in the "Run" settings in Projects mode, or globally from "Options" in the "Tools" menu. Just select "Use trace points". Qt Creator will then replace your current event setup with any tracepoints it finds on the target system, and make sure that a sample is recorded each time a tracepoint is hit.

After that, click the start button on the profiler toolbar to profile your application. When the application exits, Qt Creator collects the profile data and displays it.

Data interpretation

The easiest way to figure out which pieces of code are using a lot of memory is to look at the flame graph. To get the most meaningful results, select the “Peak Usage” mode in the top right corner. This shows the flame graph sorted by the accumulated amount of memory allocated by the given call chains. Consider the following example.

Findings

This example is a profile of Qt Creator loading a large QML trace into the QML Profiler. The QML Profiler uses a lot of memory when displaying large traces, and this profile tells us some details about that usage. Among other things, the flame graph tells us that:

  • The models for the Timeline, Statistics, and Flame Graph views consume about 43% of the peak memory usage. TimelineTraceManager::appendEvent(...) dispatches the events to the various models and causes the allocations.
  • Of these, the largest share, 18.9%, goes to the Timeline range models. The JavaScript, Bindings, and Signal Handling categories are range models. They keep a vector of extra data, with an entry for each such range. You can see QArrayData::allocate(...), which allocates memory for these vectors.
  • Rendering the Timeline consumes most of the memory not allocated for the models. In particular, Timeline::NodeUpdater::run() appears in all the other stack traces. This function is responsible for filling in the geometry used to display the Timeline categories. So QSGGeometry::allocate(...) is what we see as the direct cause of the allocations. It also tells us why the QML Profiler needs a graphics card with several gigabytes of memory to display such traces.
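The percentages in the list above are shares of bytes accumulated per call chain. The sketch below shows the general idea behind that kind of accumulation in a flame graph. It is not Qt Creator's implementation: FlameNode, addSample, and sharePercent are hypothetical names, and the sketch only sums allocated bytes, ignoring releases, so it does not reproduce whatever the "Peak Usage" mode computes.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One frame in a flame graph (hypothetical type, for illustration only).
struct FlameNode {
    std::uint64_t bytes = 0; // bytes allocated by this frame and its callees
    std::map<std::string, std::unique_ptr<FlameNode>> children;
};

// Adds one allocation sample. callChain is ordered from the outermost caller
// to the allocating function, for example {"main", "load", "malloc"}.
void addSample(FlameNode &root, const std::vector<std::string> &callChain,
               std::uint64_t bytes)
{
    FlameNode *node = &root;
    node->bytes += bytes;
    for (const std::string &frame : callChain) {
        std::unique_ptr<FlameNode> &child = node->children[frame];
        if (!child)
            child = std::make_unique<FlameNode>();
        node = child.get();
        node->bytes += bytes; // every frame on the chain accumulates the sample
    }
}

// Share of parent's bytes that the direct child named frame accounts for,
// in percent. Returns 0 if there is no such child.
double sharePercent(const FlameNode &parent, const std::string &frame)
{
    const auto it = parent.children.find(frame);
    if (it == parent.children.end() || parent.bytes == 0)
        return 0.0;
    return 100.0 * static_cast<double>(it->second->bytes)
           / static_cast<double>(parent.bytes);
}
```

Because every frame on a call chain receives the sample's bytes, a function that appears in many chains, such as Timeline::NodeUpdater::run() in the example, ends up with a correspondingly wide bar.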

Possible optimizations

From here it is easy to come up with ideas for optimizing the offending functions. We could reconsider whether we really need all the data stored in the various models, or we could temporarily save it to disk until we need it. The overwhelming amount of geometry allocated here also suggests that the threshold for merging adjacent events in a dense trace may be too low. Finally, we could release the geometry in main memory as soon as we have uploaded it to the GPU.

Tracing overhead

Profiling every call to malloc() and free() in an application results in significant overhead. The kernel will most likely not be able to keep up and will therefore drop some samples. However, depending on your specific workload, the resulting profile can still give you important information. In particular, if your application allocates a huge amount of memory in only a few calls to malloc(), while at the same time allocating and releasing small amounts at a high frequency, you may miss the malloc() calls you are interested in because the kernel may drop them. However, if the problematic malloc() calls make up a larger percentage of the total calls, you will probably catch at least some of them. Either way, Qt Creator will present you with absolute numbers for allocation, release, and peak memory usage. These numbers refer to the perf samples that were actually reported and are therefore not entirely accurate. Other tools will report different numbers.
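To see why dropped samples skew the absolute numbers, consider this minimal sketch. It is only an illustration under stated assumptions, not how Qt Creator or perf process their data: AllocEvent, MemoryTotals, and summarize are hypothetical names, and each event is assumed to carry the address and size of the block.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// A recorded allocator event (hypothetical layout, for illustration only).
struct AllocEvent {
    bool isAllocation;     // true for malloc(), false for free()
    std::uint64_t address; // pointer returned by malloc() or passed to free()
    std::uint64_t size;    // requested size, ignored for free()
};

struct MemoryTotals {
    std::uint64_t allocated = 0;
    std::uint64_t released = 0;
    std::uint64_t peak = 0;
};

// Computes totals from the events that were actually recorded.
MemoryTotals summarize(const std::vector<AllocEvent> &events)
{
    MemoryTotals totals;
    std::unordered_map<std::uint64_t, std::uint64_t> live; // address -> size
    std::uint64_t current = 0;
    for (const AllocEvent &event : events) {
        auto it = live.find(event.address);
        if (it != live.end()) {
            // Either this is the free() for a known block, or the address is
            // being reused and the free() in between was dropped.
            totals.released += it->second;
            current -= it->second;
            live.erase(it);
        }
        if (!event.isAllocation)
            continue; // a free() without a known malloc() cannot be sized
        live[event.address] = event.size;
        totals.allocated += event.size;
        current += event.size;
        totals.peak = std::max(totals.peak, current);
    }
    return totals;
}
```

If the one sample for a large malloc() is missing from the input, the peak computed here shrinks by the size of that block, and the later free() for it is silently ignored. This is the situation the paragraph above warns about.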

Special memory allocation functions

There are also memory allocation functions that you cannot profile in this way. In particular, posix_memalign() does not return the resulting pointer on the stack or in a register. Therefore, we cannot record it with a tracepoint. Custom memory allocators that you may use in your application are not handled by the default tracepoints either. For example, the JavaScript heap allocator used by QML will not show up in the profile, though for this specific case you can use the QML Profiler. There are also various replacements for the standard C allocation functions, such as jemalloc or tcmalloc. If you want to track them, you need to define custom tracepoints.
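The posix_memalign() case is easier to see in code. The function's return value is only an error code, and the address of the new block is written to memory through an out-parameter. The sketch below wraps the call in a hypothetical helper, allocateAligned, which hands the pointer back as an ordinary return value, the way malloc() does. Whether a custom tracepoint on such a wrapper fits your setup is something you would have to verify yourself.

```cpp
#include <cstddef>
#include <stdlib.h> // posix_memalign(), free()

// Allocates size bytes aligned to alignment, or returns nullptr on failure.
// alignment must be a power of two and a multiple of sizeof(void *).
// allocateAligned is a hypothetical helper, not part of any library.
void *allocateAligned(std::size_t alignment, std::size_t size)
{
    void *block = nullptr;
    // The int return value is only an error code (0 on success). The address
    // of the new block is stored through the first argument instead.
    const int error = posix_memalign(&block, alignment, size);
    return error == 0 ? block : nullptr;
}
```

Memory obtained this way is released with free(), just like memory from malloc().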

Conclusion

Memory usage profiling with the Qt Creator Performance Analyzer is a quick and easy way to get important information about your application's memory usage. It works out of the box for any Linux target supported by Qt Creator. You can immediately browse the resulting profile data in an easily accessible graphical user interface (GUI) without any further processing or data transfer. Other tools may provide more accurate data. However, for a quick overview of your application's memory usage, the Performance Analyzer is often the best tool.
