mafulechka
mafulechkaNov. 20, 2019, 4:05 a.m.

Efficient QString Concatenation with Folding C ++ 17 Template Parameters

It's common in C++ to have operator+to perform string concatenation, whether you're using the standard library (or STL) or Qt. This allows you to write things like the following snippet:

QString statement{"I'm not"};
QString number{"a number"};
QString space{" "};
QString period{". "};
QString result = statement + space + number + period;

But this has a big disadvantage - unnecessary creation of temporary intermediate results. Namely, in the previous example, there is one temporary string to store the result of the + operator and the empty part of the expression, then this string is concatenated with a number that returns another temporary string. Then the second temporary line is connected to the point, which gives the final result, after which the temporary ones are destroyed.

This means that there are almost as many unnecessary allocations and detachments as there are calls to operator+. In addition, the same data is copied multiple times. For example, the contents of a statement string are first copied into the first temporary object, then copied from the first temporary object to the second, and then from the second temporary object to the final result.


Temporary variables

This can be done in a much more efficient way by creating a string instance that pre-allocates the memory needed to store the final result, and then calls the QString::appendmember function in sequence to add each of the strings we want to concatenate one by one:

QString result;
result.reserve(statement.length() + number.length() + space.length() + period.length();
result.append(statement);
result.append(number);
result.append(space);
result.append(period);

Temporary variables

Alternatively, one could use QString::resizeinstead of QString::reserve and then use std::copy(or std::memcpy) to copy the data into it (later we will see how to use std::copy to string concatenation). This will probably improve performance a bit (depends on compiler optimizations) because QString::append has to check if the capacity of the string is large enough to hold the resulting string. std::copyalgorithm doesn't have that unnecessary extra check that might give it a slight advantage.

Both of these approaches are significantly more efficient than using operator+ , but it would be annoying to write such code every time you want to concatenate multiple lines.

Algorithm std::accumulate

Before continuing on how Qt solves this problem now, and a possible way to improve it in Qt 6 with the great features that C++17 got, let's look at one of the most important and powerful algorithms from the standard library, the std algorithm: :accumulate .

Imagine given a sequence (eg aQVector) of strings that we want to concatenate instead of having them in separate variables.

With std::accumulate, string concatenation would look like this:

QVector<QString> strings{ . . . };
std::accumulate(strings.cbegin(), strings.cend(), QString{});

The algorithm does what you would expect in this example - it starts with an empty QString and adds each string from the vector to it, thus creating a concatenated string.

This would be just as inefficient as the original example of using operator+ for concatenation, since std::accumulate uses operator+ internally by default.

To optimize this implementation, as in the previous section, you can simply use std::accumulate to calculate the size of the resulting string instead of doing a full concatenation with it:

QVector<QString> strings{ . . . };
QString result;
result.resize(
    std::accumulate(strings.cbegin(), strings.cend(),
                    0, [] (int acc, const QString& s) {
                        return s.length();
                    }));

This time, std::accumulate starts at an initial value of 0 and for each string in the vector of strings, it adds the length to that initial value and finally returns the sum of the lengths of all strings in the vector.

This is what std::accumulate means to most people - an accumulation of some kind . But this is a rather simplistic view.

In the first example, indeed all strings in the vector were summed up . But the second example is a little different. In fact, the elements of a vector are not summed. The vector contains QStrings and integers are added.

The power of std::accumulate is that you can pass a custom operation to it. The operation takes a previously accumulated value and one element from the original collection, and generates a new accumulated value. The first time std::accumulate calls this operation, it will pass it the initial value as the accumulator and the first element of the source collection. It will take the result and pass it to the next invocation of the operation, along with the second element of the original collection. This will be repeated until the entire original collection has been processed and the algorithm returns the result of the last call to the operation.

As you can see from the previous code snippet, the accumulator does not have to be the same type as the values in the vector. There was a vector of strings, and the accumulator was an integer.

The above std::copy algorithm takes a sequence of elements to be copied (as an input iterator pair) and a destination (as an output iterator) where the elements are to be copied. It returns an iterator pointing to the element after the last copied element into the destination collection.

This means that if we copy the data of one source row to the destination row using std::copy , we will get an iterator pointing to the exact location where we want to copy the data of the second row.

So there is a function that takes a string (as a pair of iterators) and one output iterator and subsequently gives a new output iterator. This is similar to what can be used as an operation on std::accumulate to implement efficient string concatenation:

QVector<QString> strings{ . . . };
QString result;
result.resize( . . . );

std::accumulate(strings.cbegin(), strings.cend(), result.begin(),
                [] (const auto& dest, const QString& s) {
                    return std::copy(s.cbegin(), s.cend(), dest);
                });

The first call to std::copy will copy the first line to the destination specified in result.begin(). It will return an iterator to the result string just after the last copied character, and that is where the second string from the vector will be copied. After that, the third line will be copied and so on.


Temporary variables

At the end we get the concatenated string.

Recursive expression templates

Now we can return to efficient string concatenation using operator+ in Qt.

QString result = statement + space + number + period;

You can see that the problem with string concatenation stems from the fact that C++ evaluates the previous expression in stages, calling operator+ multiple times, with each call returning a new QString instance.

While it is not possible to change the way this is evaluated in C++, a technique called expression templates can be used to delay the actual evaluation of the resulting string until the entire expression has been defined. This can be done by changing the return type of operator+ not to QString, but to some custom type that simply stores the strings to be concatenated without actually doing the concatenation.

In fact, this is exactly what Qt does with 4.6 if you enable fast string concatenation. Instead of returning a QString, operator+ will return an instance of a hidden class template called QStringBuilder. The QStringBuilderclass template is just a dummy type that contains references to the arguments passed to operator+.

A more complex version of the following:

template <typename Left, typename Right>
class QStringBuilder {
    const Left& _left;
    const Right& _right;
};

When you concatenate multiple strings, you end up with a more complex type where multiple QStringBuilders are nested inside each other. Something like that:

QStringBuilder<QString, QStringBuilder<QString, QStringBuilder<QString, QString>>>

This type is just a complicated way of saying "I'm holding four lines that need to be concatenated".

When requested to convert a QStringBuilder to a QString (for example, by assigning it to the result of a QString), it first calculates the total size of all contained strings, then allocates a QStringinstance of that size, and finally copies the strings one by one into the resulting string.

In essence, it will do the same thing that was done before, but it will be done automatically, without the need to "raise a finger".

Templates with a variable number of arguments

The problem with the current implementation of QStringBuilder is that it implements a container with an arbitrary number of strings via nesting. Each QStringBuilder instance can contain exactly two elements, whether they are strings or other QStringBuilder instances.

This means that all QStringBuilder instances are sort of a binary tree with QStrings as leaf nodes. Whenever it needs to do something with the contained strings, QStringBuilder must process its left subtree and then its right subtree recursively.

Instead of creating binary trees, you can use variadic templates (available since C++11, which was not available when QStringBuilder was created). Variable-argument templates allow you to create classes and functions with an arbitrary number of template arguments.

This means that with std::tuplewe you can create a QStringBuilder class template that contains as many strings as you like:

template <typename... Strings>
class QStringBuilder {
    std::tuple<Strings...> _strings;
};

When we have a new string to add to the QStringBuilder, we can simply append it to the tuple using std::tuple_cat which concatenates the two tuples (we'll use operator% instead of operator+ since this operator is also supported by QStringand QStringBuilder ):

template <typename... Strings>
class QStringBuilder {
    std::tuple<Strings...> _strings;

    template <typename String>
    auto operator%(String&& newString) &&
    {
        return QStringBuilder<Strings..., String>(
            std::tuple_cat(_strings, std::make_tuple(newString)));
    }
};

Contracting template parameters

That's all well and good, but the question is how the fold expression (the Strings part) handles packets.

C++17 introduced a new construct for handling parameter packets called template parameter folding.

The general form of template parameter folding is as follows (operator+ can be replaced with another binary operator such as *, %...):

(init + ... + pack)

or:

(pack + ... + init)

The first option is called left fold expression of the template parameters (left fold expression) , handles the operation as left-associative (left associative), and the second option is called right fold of the template parameters (right fold expression) because it handles the operation as right -associative (associative on the right).

If one wanted to concatenate strings in a template parameter package using template parameter folding, one could do it like this:

template <typename... Strings>
auto concatenate(Strings... strings)
{
    return (QString{} + ... + strings);
}

First, operator+ will be called on the initial value of QString{} and the first element of the parameter pack. It will then call operator+ on the result of the previous call and the second element of the parameter pack. And so on until all the elements have been processed.

Sounds familiar, doesn't it?

The same behavior has been seen with std::accumulate. The only difference is that the std::accumulate algorithm operates on sequences of data at run time (vectors, arrays, lists, etc.). Whereas template parameter folds work with compile-time sequences, i.e. packages of template parameters with a variable number of arguments.

You can follow the same steps to optimize the previous concatenation implementation that was used for std::accumulate. First you need to calculate the sum of all string lengths. It's pretty simple with template parameter folds:

template <typename... Strings>
auto concatenate(Strings... strings)
{
    const auto totalSize = (0 + ... + strings.length());
    . . .
}

When a template parameter fold expands the parameter package, it will receive the following expression:

0 + string1.length() + string2.length() + string3.length()

So, we got the size of the resulting string. Now we can move on to extracting a string large enough for the result and adding the original strings to it one by one.

As mentioned earlier, template parameter folding works with C++ binary operators. If we want to execute a function on every element in a parameter pack, we can use one of the weirdest operators in C and C++, the comma operator.

template <typename... Strings>
auto concatenate(Strings... strings)
{
    const auto totalSize = (0 + ... + strings.length());
    QString result;
    result.reserve(totalSize);

    (result.append(strings), ...);

    return result;
}

This will cause an append for each of the lines in the parameter pack, resulting in a concatenated line.

Custom operators with template parameter folds

The second approach, which was used with std::accumulate, was a bit more complex. It was necessary to provide a custom operation for accumulation. The accumulator (accumulator) was an iterator in the target collection that indicated where to copy the next row.

If we want to have a custom operation with template parameter folds, we need to create a binary operator. An operator like the lambda that was passed to std::accumulate needs to take one output iterator and one string, it needs to call std::copy to copy the string data into that iterator, and it needs to return a new iterator specifying to the element after the last copied character.

To do this, you can redefine the << operator:

template <typename Dest, typename String>
auto operator<< (Dest dest, const String& string)
{
    return std::copy(string.cbegin(), string.cend(), dest);
}

With this implemented template parameter folding that copies all lines to the destination buffer, this becomes fairly easy. The start value is the start iterator of the target buffer, and << each of the strings in the parameter pack to it:

template <typename... Strings>
auto concatenate(Strings... strings)
{
    const auto totalSize = (0 + ... + strings.length());
    QString result;
    result.resize(totalSize);

    (result.begin() << ... << strings);

    return result;
}

folding template parameters and tuples

Now we know how to effectively concatenate a collection of strings - be it a vector or a variadic set of parameters.

The problem is that QStringBuilder doesn't have that either. It stores strings inside a std::tuple, which is neither an iterable collection nor a parameter pack.

To work with template parameter rollup, you need parameter packages. Instead of a parameter pack containing strings, you can create a pack containing a list of indices from 0 to n-1, which can later be used with std::getto to access values within a tuple.

This package is easily created with the std::index_sequence which represents a list of integers at compile time. You can create a helper function that will accept std::index_sequence as argument and then use std::get (_strings) to access strings from a tuple one by one from template parameter folds.

template <typename... Strings>
class QStringBuilder {
    using Tuple = std::tuple<Strings...>;
    Tuple _strings;

    template <std::size_t... Idx>
    auto concatenateHelper(std::index_sequence<Idx...>) const
    {
        const auto totalSize = (std::get<Idx>(_strings).size() + ... + 0);

        QString result;
        result.resize(totalSize);

        (result.begin() << ... << std::get<Idx>(_strings));

        return result;
    }
};

You need to create a wrapper function that creates an index sequence for the tuple and call the concatenateHelper function:

template <typename... Strings>
class QStringBuilder {
    . . .

    auto concatenate() const
    {
        return concatenateHelper(
            std::index_sequence_for<Strings...>{});
    }
};

Conclusion

This article is only about the actual string concatenation. To apply this to a real QStringBuilder, a few more things are needed, and the implementation becomes a little overwhelming to read as a blog article.

You need to be careful with operator overloading. One would need to use std::enable_iflike as the current implementation of QStringBuilder so that it works with all Qt concatenable types and doesn't mess up the global space with these operators.

It would also be useful to be able to handle temporary values passed to string concatenation in a safer way, since QStringBuilder only stores string references, which in the case of temporary strings can easily become dangling references.

We recommend hosting TIMEWEB
We recommend hosting TIMEWEB
Stable hosting, on which the social network EVILEG is located. For projects on Django we recommend VDS hosting.

Do you like it? Share on social networks!

Александр Панюшкин
  • Nov. 20, 2019, 4:10 a.m.

Добрый день. Большое спасибо за статью.
А это перевод или авторская статья?

И да, как-то не задумывался над тем, как qt небрежно относится к памяти в таких довольно тривиальных случаях...

Evgenii Legotckoi
  • Nov. 20, 2019, 4:14 a.m.
  • (edited)

Добрый день. Это перевод, в конце статьи указан источник...
Я добавлю в ближайшее время лычку "перевод" к статьям.

Comments

Only authorized users can post comments.
Please, Log in or Sign up
AD

C ++ - Test 004. Pointers, Arrays and Loops

  • Result:50points,
  • Rating points-4
m

C ++ - Test 004. Pointers, Arrays and Loops

  • Result:80points,
  • Rating points4
m

C ++ - Test 004. Pointers, Arrays and Loops

  • Result:20points,
  • Rating points-10
Last comments
Evgenii Legotckoi
Evgenii LegotckoiNov. 1, 2024, 12:37 a.m.
Django - Lesson 064. How to write a Python Markdown extension Добрый день. Да, можно. Либо через такие же плагины, либо с постобработкой через python библиотеку Beautiful Soup
A
ALO1ZEOct. 19, 2024, 6:19 p.m.
Fb3 file reader on Qt Creator Подскажите как это запустить? Я не шарю в программировании и кодинге. Скачал и установаил Qt, но куча ошибок выдается и не запустить. А очень надо fb3 переконвертировать в html
ИМ
Игорь МаксимовOct. 5, 2024, 5:51 p.m.
Django - Lesson 064. How to write a Python Markdown extension Приветствую Евгений! У меня вопрос. Можно ли вставлять свои классы в разметку редактора markdown? Допустим имея стандартную разметку: <ul> <li></li> <li></l…
d
dblas5July 5, 2024, 9:02 p.m.
QML - Lesson 016. SQLite database and the working with it in QML Qt Здравствуйте, возникает такая проблема (я новичок): ApplicationWindow неизвестный элемент. (М300) для TextField и Button аналогично. Могу предположить, что из-за более новой верси…
k
kmssrFeb. 9, 2024, 5:43 a.m.
Qt Linux - Lesson 001. Autorun Qt application under Linux как сделать автозапуск для флэтпака, который не даёт создавать файлы в ~/.config - вот это вопрос ))
Now discuss on the forum
Evgenii Legotckoi
Evgenii LegotckoiJune 25, 2024, 1:11 a.m.
добавить qlineseries в функции Я тут. Работы оень много. Отправил его в бан.
t
tonypeachey1Nov. 15, 2024, 5:04 p.m.
google domain [url=https://google.com/]domain[/url] domain [http://www.example.com link title]
NSProject
NSProjectJune 4, 2022, 1:49 p.m.
Всё ещё разбираюсь с кешем. В следствии прочтения данной статьи. Я принял для себя решение сделать кеширование свойств менеджера модели LikeDislike. И так как установка evileg_core для меня не была возможна, ибо он писался…
9
9AnonimOct. 25, 2024, 7:10 p.m.
Машина тьюринга // Начальное состояние 0 0, ,<,1 // Переход в состояние 1 при пустом символе 0,0,>,0 // Остаемся в состоянии 0, двигаясь вправо при встрече 0 0,1,>…

Follow us in social networks