Evgenii Legotckoi, Aug. 3, 2018, 4:05 a.m.

Profiling memory usage on Linux with Qt Creator 4.7

Performance Analyzer

You may have heard of the Performance Analyzer (named “CPU Usage Analyzer” in Qt Creator 4.6 and earlier). It profiles applications using the excellent "perf" tool on Linux. You can use it locally on a Linux desktop system or on various embedded devices. "perf" can record a variety of events that occur in your application. These include cache misses, memory loads, context switches, and, most commonly, CPU cycles, where perf records a stack sample each time a certain number of CPU cycles have passed. The resulting profile shows which functions in your application take up the most CPU cycles. This is the most prominent use of the Performance Analyzer, at least so far.


Create trace points

With Qt Creator 4.7 you can also record events for tracepoints, and if your tracepoints follow a certain naming convention, Qt Creator will know that they signify the allocation or release of resources. Therefore, by setting tracepoints on malloc, free, and friends, you can track your application's heap usage. To help you set up tracepoints for this use case, Qt Creator generates a shell script and asks you to run it. First open your project and select the run configuration you want to examine. Then select the "Generate Tracepoints..." button in the analyzer's title bar and run the script when prompted.

How does it work?

For unprivileged users to be able to use tracepoints, the script has to make the kernel's debug and tracing file systems available to all users on the system. You should only do this in controlled environments. The script usually works for 32-bit ARM systems and 64-bit x86 systems. 64-bit ARM systems can only accept tracepoints if they run Linux kernel version 4.10 or later. To set tracepoints on 32-bit x86 systems, you need the debug symbols for your standard C library. The script tries to create tracepoints for every binary called libc.so.6 that it finds in /lib. If you have a 64-bit system with additional 32-bit libraries installed, it tries to create tracepoints for both architectures, and it may succeed for only one of them. This is not a problem as long as your application targets the architecture for which the script was able to set tracepoints.

Troubleshooting

If the tracepoint script fails, check whether your kernel was compiled with the CONFIG_UPROBE_EVENT option enabled. Without this option, the kernel does not support user-space tracepoints. All 32-bit ARM images shipped with Qt for Device Creation have this option enabled since version 5.11. Most Linux distributions intended for desktop use enable CONFIG_UPROBE_EVENT by default.
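The check described above can be sketched in a few lines of C++. This is only an illustration: the function name hasUprobeEvents is hypothetical, the caller is assumed to have already read the kernel configuration into a string (where that file lives depends on the distribution), and the sketch also accepts the spelling CONFIG_UPROBE_EVENTS on the assumption that newer kernels use it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the given kernel configuration text
// enables user-space probe events. Lines such as
// "# CONFIG_UPROBE_EVENT is not set" do not match and count as disabled.
bool hasUprobeEvents(const std::string &configText)
{
    std::istringstream in(configText);
    std::string line;
    while (std::getline(in, line)) {
        // CONFIG_UPROBE_EVENT is the name used in this article; the trailing
        // "S" variant is an assumption about newer kernels.
        if (line == "CONFIG_UPROBE_EVENT=y" || line == "CONFIG_UPROBE_EVENTS=y")
            return true;
    }
    return false;
}
```

To use it, read the configuration file of the running kernel into a std::string and pass it to hasUprobeEvents(). A false result suggests that the tracepoint script cannot succeed on that kernel.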

Using tracepoints for profiling

After creating tracepoints, you need to tell Qt Creator to use them for profiling. There is a convenient shortcut for this in the Performance Analyzer settings. You can access the settings either for your specific project in the "Run" settings in Projects mode, or globally from "Options" in the "Tools" menu. Just select "Use trace points". Qt Creator will then replace your current event setup with any tracepoints it finds on the target system, and make sure a sample is recorded each time a tracepoint is hit.

After that, you only need to click the start button on the profiler toolbar to profile the application. After the application exits, Qt Creator collects the profile data and displays it.

Data interpretation

The easiest way to figure out which pieces of code use a lot of memory is to look at the flame graph. To get the most meaningful results, select the “Peak Usage” mode in the top right corner. This shows the flame graph weighted by the accumulated amount of memory allocated by each call chain. Consider this example.

Findings

What you see here is a profile of Qt Creator loading a large QML trace into the QML Profiler. The QML Profiler uses a lot of memory when displaying large traces, and this profile tells us some details about that usage. Among other things, this flame graph tells us that:

  • The models for the Timeline, Statistics, and Flame Graph views consume about 43% of the peak memory. TimelineTraceManager::appendEvent(...) dispatches events to the various models and triggers the allocations.
  • Of these, the largest share, 18.9%, goes to the Timeline range models. The JavaScript, Bindings, and Signal Handling categories are range models. They store a vector of additional data with one record for each range. You can see QArrayData::allocate(...), which allocates the memory for these vectors.
  • Rendering the Timeline consumes most of the memory not allocated to the basic models. In particular, Timeline::NodeUpdater::run() appears in all the other stack traces. This function is responsible for filling in the geometry used to display the Timeline categories, so QSGGeometry::allocate(...) is what we see as the direct cause of the allocations. It also tells us why the QML Profiler needs a graphics card with several gigabytes of memory to display such traces.
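Shares like the ones listed above come from attributing each allocation to every function on its call chain. The following sketch shows that idea in its simplest form. It is not Qt Creator's implementation and it ignores the peak-usage bookkeeping. The type AllocationSample and the function inclusiveBytes are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sample type: one recorded allocation with its call chain.
struct AllocationSample {
    std::vector<std::string> stack; // outermost caller first
    std::int64_t bytes;             // size of the allocation
};

// Sums the allocated bytes per function, counting a sample for every function
// that appears on its call chain (the "inclusive" cost a flame graph shows).
std::map<std::string, std::int64_t>
inclusiveBytes(const std::vector<AllocationSample> &samples)
{
    std::map<std::string, std::int64_t> result;
    for (const AllocationSample &sample : samples) {
        // Deduplicate so that recursive calls are not counted twice.
        const std::set<std::string> distinct(sample.stack.begin(),
                                             sample.stack.end());
        for (const std::string &function : distinct)
            result[function] += sample.bytes;
    }
    return result;
}
```

Dividing a function's inclusive bytes by the total for the outermost frame gives its share, which corresponds to the relative width of its bar in the flame graph.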

Possible optimizations

From here it is easy to come up with ideas for optimizing the offending functions. We could reconsider whether we really need all the data stored in the various models, or we could temporarily save it to disk until we need it. The overwhelming amount of geometry allocated here also suggests that the threshold for merging adjacent events in dense traces may be too low. Finally, we could release the geometry in main memory as soon as we have uploaded it to the GPU.

Tracing overhead

Profiling every call to malloc() and free() in an application results in significant overhead. The kernel will most likely be unable to keep up and will therefore drop some samples. However, depending on your specific workload, the resulting profile can still give you important information. In other words: if your application allocates a huge amount of memory in only a few calls to malloc(), while also allocating and releasing small amounts at a high frequency, you may miss the malloc() calls you are interested in because the kernel may drop them. However, if the problematic malloc() calls make up a larger percentage of the total calls, you will probably catch at least some of them. Either way, Qt Creator will present you with absolute numbers for allocations, releases, and peak memory usage. These numbers only cover the perf samples that were actually reported and are therefore not entirely accurate. Other tools will report different numbers.
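To see why dropped samples skew the absolute numbers, consider the following sketch, which derives the allocated, released, and peak figures from a stream of events. It only illustrates the bookkeeping and does not show how Qt Creator computes its numbers. MemEvent, MemSummary, and summarize are hypothetical names. The sketch assumes that a release can only be accounted for if the matching allocation was seen.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical event type: one reported malloc()-like or free()-like sample.
struct MemEvent {
    enum Kind { Alloc, Free };
    Kind kind;
    std::uint64_t address; // pointer returned by the allocation or passed to free
    std::int64_t bytes;    // only meaningful for Alloc events
};

struct MemSummary {
    std::int64_t allocated;
    std::int64_t released;
    std::int64_t peak;
};

// Replays the reported events and tracks the total and peak usage.
MemSummary summarize(const std::vector<MemEvent> &events)
{
    MemSummary summary{0, 0, 0};
    std::int64_t current = 0;
    std::unordered_map<std::uint64_t, std::int64_t> live; // address -> size
    for (const MemEvent &event : events) {
        if (event.kind == MemEvent::Alloc) {
            live[event.address] = event.bytes;
            summary.allocated += event.bytes;
            current += event.bytes;
            if (current > summary.peak)
                summary.peak = current;
        } else {
            const auto it = live.find(event.address);
            if (it == live.end())
                continue; // the matching allocation sample was dropped
            summary.released += it->second;
            current -= it->second;
            live.erase(it);
        }
    }
    return summary;
}
```

If the sample for a single large allocation is dropped, the allocated total and the peak shrink by that amount, and the later free() of the same pointer cannot be attributed to anything. This is one reason why the reported numbers are lower bounds rather than exact values.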

Special memory allocation functions

There are also memory allocation functions that you cannot profile this way. In particular, posix_memalign() does not return the resulting pointer on the stack or in a register. Therefore, we cannot record it with a tracepoint. Custom memory allocators that your application may use are not handled by the default tracepoints either. For example, the JavaScript heap allocator used by QML will not show up in the profile, though for this specific case you can use the QML Profiler. There are also various replacements for the standard C allocation functions, such as jemalloc or tcmalloc. If you want to track them, you need to define custom tracepoints.
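The difference between malloc() and posix_memalign() is easiest to see side by side. In the sketch below, the wrapper names allocateWithMalloc and allocateAligned are hypothetical. The calls to malloc() and posix_memalign() are the standard C and POSIX functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h> // malloc, free, posix_memalign (POSIX)

// malloc() hands the new pointer back as its return value, which is where a
// tracepoint on the function's return can pick it up.
void *allocateWithMalloc(std::size_t size)
{
    return malloc(size);
}

// posix_memalign() returns only an error code. The new pointer is written
// through the first argument, so it never appears as the return value.
void *allocateAligned(std::size_t alignment, std::size_t size)
{
    void *pointer = nullptr;
    const int error = posix_memalign(&pointer, alignment, size);
    return error == 0 ? pointer : nullptr;
}
```

Memory obtained from either wrapper is released with free(). The free() call is still visible to a tracepoint, but for posix_memalign() the matching allocation is not.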

Conclusion

Memory usage profiling with the Qt Creator Performance Analyzer is a quick and easy way to get important information about your application's memory usage. It works out of the box for any Linux target supported by Qt Creator. You can immediately view the resulting profile data in an easily accessible graphical user interface (GUI) without further processing or data transfer. Other tools may provide more accurate data. However, for a quick overview of your application's memory usage, the Performance Analyzer is often the best tool.
