Django - Tutorial 049. Optimizing Django Performance with a Real Project

django-silk, performance, Django

Recently I have devoted a lot of time to optimizing this website, and now I would like to talk about it.
This article explains the select_related and prefetch_related methods of QuerySet and how they differ. I will also try to explain why Django is considered slow and why that reputation is mostly undeserved. Django is indeed slower than, say, Flask in many respects, but in most projects the problem is not Django itself but unoptimized database queries.

So let's optimize the forum page of the EVILEG website. The Django Silk package will help us with this: it measures the number of database queries a request makes and how long they take.

Install and configure Django Silk

Install Django Silk

pip install django-silk

Add it to INSTALLED_APPS

INSTALLED_APPS = [
    ...
    'silk'
]

and add its middleware to MIDDLEWARE

MIDDLEWARE = [
    ...
    'silk.middleware.SilkyMiddleware',
    ...
]

You also need to add the django-silk URLs so that you can view the request statistics.

from django.urls import path, include

urlpatterns = [
    path('silk/', include('silk.urls', namespace='silk'))
]

The last step is to apply the django-silk migrations

python manage.py migrate

A note on Django Silk

Do not use Django Silk on a production server, at least not with the settings shown in this article. If the site already has decent traffic, for example 1,400 visitors a day, Django Silk with these settings will simply eat all your resources. Experiment only on the development server.
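One way to follow this advice is to register Silk only when DEBUG is on. The fragment below is a minimal sketch of such a settings.py, not the actual EVILEG configuration: DEBUG is the standard Django setting, the other app and middleware entries are placeholders, and appending the middleware at the end of the list is an assumption that may need adjusting for your middleware order.

```python
# settings.py (fragment) - a sketch, not the actual EVILEG configuration
DEBUG = True  # assumption: True only on the development machine

INSTALLED_APPS = [
    "django.contrib.contenttypes",  # placeholder for the project's real apps
]

MIDDLEWARE = [
    "django.middleware.common.CommonMiddleware",  # placeholder entry
]

if DEBUG:
    # Profile requests only in development, never on the production server
    INSTALLED_APPS.append("silk")
    MIDDLEWARE.append("silk.middleware.SilkyMiddleware")
```

The silk/ URL pattern can be wrapped in the same check, so that the statistics pages do not exist at all in production.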

Optimization Work

First, let's see how bad things are with database queries on the forum's main page. For clarity, let's see how this page looks.

To do this, we just need to load the page we are interested in and look at the statistics for that request in Django Silk.

The first step in optimizing the EVILEG forum homepage

All measurements here are taken in debug mode.

The situation is depressing. Loading the main page of the forum takes:

  • 325 queries to the database
  • 155 ms spent on the queries
  • 568 ms for the full page

This is a lot of work, especially since every query is a separate round trip to the database, and the returned data still has to be loaded into objects.

The resource consumption is huge. I think this is one of the reasons why many people consider Django slow, when in fact they simply have not worked out how to optimize their database queries.

Carrying out the optimization

Let's see what the original QuerySet looks like for the main page of the forum.

def get_queryset(self):
    return ForumTopic.objects.all()

As you can see, nothing complicated. Queries like this are what people usually write when they start using the Django ORM, and only later do the questions about optimizing Django performance begin.

The point is that the initial queryset needed to display this page fetches only ForumTopic objects from the database and none of the objects referenced by the ForeignKey fields of the ForumTopic model. Django is therefore forced to load each related object automatically, with a separate query, at the moment it is first accessed. But the programmer knows what each page needs and can tell Django in advance which related objects to fetch in a single query. Let's do that with select_related.

select_related

This method fetches related objects from other tables in the same query as the main objects, using SQL joins. It combines many queries into one, which speeds up the selection and reduces the overhead of talking to the database, since the number of round trips drops sharply.

Let's try to select some data in one query using select_related. I know that for my ForumTopic model, the following fields can be selected as related:
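The difference can be illustrated without Django at all. The sketch below uses the standard sqlite3 module and a made-up two-table schema (topic and user are hypothetical stand-ins, not the EVILEG models) to compare lazy loading, where every topic triggers its own query for the author, with a single JOIN, which is roughly the kind of SQL that select_related generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT,
                        user_id INTEGER REFERENCES user(id));
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO topic VALUES (1, 'First', 1), (2, 'Second', 2), (3, 'Third', 1);
""")

query_count = 0


def run(sql, params=()):
    """Execute one SQL statement and count it as one round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Lazy loading: one query for the topics, then one more per topic for its author
query_count = 0
lazy = []
for topic_id, title, user_id in run("SELECT id, title, user_id FROM topic ORDER BY id"):
    (name,) = run("SELECT name FROM user WHERE id = ?", (user_id,))[0]
    lazy.append((title, name))
lazy_queries = query_count  # 1 query for topics + 3 queries for authors

# Roughly what select_related('user') does: a single query with a JOIN
query_count = 0
joined = run(
    "SELECT topic.title, user.name FROM topic "
    "JOIN user ON user.id = topic.user_id ORDER BY topic.id"
)
joined_queries = query_count  # 1 query in total

print(lazy_queries, joined_queries)
```

Both approaches return the same data; only the number of queries differs, and the gap grows with every row on the page.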

  • article - forum article
  • answer - reply, message that was marked as an answer in the forum thread
  • section - section in which the question was asked
  • user - user who asked this question

The initial database query can be modified as follows:

def get_queryset(self, **kwargs):
    return ForumTopic.objects.all().select_related('article', 'answer', 'section', 'user')

Then look at the result in Django Silk

Performance improvement using select_related

The number of queries has improved:

  • 256 queries to the database
  • spent 131 ms
  • 444 ms full page

The following figure shows the line with the new query, which has 4 join operations.

As you can see, this query took 19.225 ms.

Already a good result, but I know for sure that this is not the limit. The main page of the forum is fairly complex: it shows the number of posts in each topic, the last message, a link to the accepted answer, and it also needs the profiles of the users who replied. This is where the prefetch_related method comes in.

prefetch_related

prefetch_related differs in that it can load not only objects referenced by the model's ForeignKey fields, but also objects whose models have a ForeignKey pointing to the model in the main query. Each such relation is fetched with a separate query, so, for example, the messages of all topics can be loaded with one additional query. In this situation, I want to load the following fields:

  • comments - the messages in a topic, ForumPost model
  • comments__user - foreign key to the user who left the message
  • answer___parent - the foreign key from the post marked as the answer back to its ForumTopic (the field is named _parent, hence the three underscores). In theory this object could be fetched through select_related, but the query would become very complex, and select_related would no longer be efficient. Yes, this method should be used judiciously: it does improve performance, but sometimes it is better to fetch some data with a separate query.
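How prefetch_related works can also be sketched with plain SQL. The example below, again using sqlite3 and a hypothetical topic/post schema rather than the real models, runs one query for the topics and one query for all their posts selected by parent id, then attaches the posts to their topics in Python. This is an illustration of the idea, not the exact SQL Django emits.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT,
                       topic_id INTEGER REFERENCES topic(id));
    INSERT INTO topic VALUES (1, 'First'), (2, 'Second');
    INSERT INTO post VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 2);
""")

# Query 1: the main queryset
topics = conn.execute("SELECT id, title FROM topic ORDER BY id").fetchall()

# Query 2: all posts of those topics at once, selected by parent id
topic_ids = [topic_id for topic_id, _ in topics]
placeholders = ", ".join("?" for _ in topic_ids)
posts = conn.execute(
    f"SELECT topic_id, body FROM post WHERE topic_id IN ({placeholders}) ORDER BY id",
    topic_ids,
).fetchall()

# The "join" happens in Python: group the posts under their topic
comments_by_topic = defaultdict(list)
for topic_id, body in posts:
    comments_by_topic[topic_id].append(body)

result = {title: comments_by_topic[topic_id] for topic_id, title in topics}
print(result)
```

The number of queries stays at two no matter how many topics are on the page, which is why a prefetched relation costs one extra query rather than one per row.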

The database query then looks like this:

def get_queryset(self, **kwargs):
    return ForumTopic.objects.all().select_related('article', 'answer', 'section', 'user').prefetch_related(
        'comments', 'comments__user', 'answer___parent'
    )

And in Django Silk I get the following result

Using select_related and prefetch_related

As a result, we have the following:

  • 6 queries to the database
  • 26 ms spent on the queries
  • 148 ms for the full page

This is a great result, and the user can already feel that the page loads very quickly.

But that is not all. Note that the query with 4 join operations still takes around 17-20 ms. Can we do something about this? Of course we can, and for that we will need the only method.

only

The only method lets you fetch just the columns that are needed to display the page. In that case, however, you have to list every column the page uses; otherwise Django will fetch each omitted column with a separate query when it is accessed.

So I wrote the following database query
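The pitfall with omitted columns can be shown with a toy model. In the sketch below, LazyRow is a hypothetical stand-in for a model instance with deferred fields (it is not Django code): columns named in the main query are available immediately, while any other column is fetched with its own query the first time it is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
    INSERT INTO topic VALUES (1, 'First', 'long text 1'), (2, 'Second', 'long text 2');
""")

COLUMNS = {"title", "content"}
queries = []  # every SQL statement sent to the database


class LazyRow:
    """Toy stand-in for a model instance with deferred columns (hypothetical)."""

    def __init__(self, pk, **loaded):
        self.pk = pk
        self.__dict__.update(loaded)

    def __getattr__(self, column):
        # Called only for attributes that were not loaded up front
        if column not in COLUMNS:
            raise AttributeError(column)
        sql = f"SELECT {column} FROM topic WHERE id = ?"
        queries.append(sql)
        value = conn.execute(sql, (self.pk,)).fetchone()[0]
        setattr(self, column, value)  # cache so the next access is free
        return value


# Main query: similar in spirit to .only('title')
main_sql = "SELECT id, title FROM topic ORDER BY id"
queries.append(main_sql)
rows = [LazyRow(pk, title=title) for pk, title in conn.execute(main_sql)]

titles = [row.title for row in rows]      # already loaded, no extra queries
queries_after_titles = len(queries)
contents = [row.content for row in rows]  # omitted column: one query per row
queries_after_contents = len(queries)

print(queries_after_titles, queries_after_contents)
```

A single forgotten column used in a template loop is enough to bring the per-row queries back, so it is worth rechecking the query count in Django Silk after adding only.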

def get_queryset(self, **kwargs):
    return ForumTopic.objects.all().select_related('article', 'answer', 'section', 'user').prefetch_related(
        'comments', 'comments__user', 'answer___parent'
    ).only(
        'user__first_name', 'user__last_name', 'section__title', 'section__title_ru', 'article__title',
        'article__title_ru'
    )

And got the following result

  • 6 queries to the database
  • 20 ms spent on the queries
  • 136 ms for the full page

Of course, I am quoting the best results I got, since measurements always fluctuate somewhat, but this screenshot shows that the duration of the main query dropped from 17-19 ms to 11-13 ms. In addition, selecting only the required fields reduces memory consumption, for example when very large chunks of text are fetched from the database but are not used in rendering the page.

Now let's play around a bit with the select_related and prefetch_related queries

Additional optimization

By now, I think, you are convinced that select_related optimizes database queries very well. But there is one BUT. Problems can arise with the Paginator class, which is used on my page. Paginator has to execute a count query to calculate the correct number of pages, and if the main query is very complex, the count query can take quite long, comparable to the main query itself. It can therefore be important to keep the main query fast and simple and to load all other objects with prefetch_related. In other words, sometimes it is better to run a couple of additional queries than to overload the main query with join operations.

And I wrote the following ORM query for this page
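To see where the count query comes from, here is a simplified sketch of what a paginator has to do. It does not use Django's Paginator; the topic table and the get_page helper are hypothetical, and the point is only that every page view needs a count query in addition to the query for the rows themselves.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO topic (title) VALUES (?)",
    [(f"Topic {n}",) for n in range(1, 26)],  # 25 rows
)

PER_PAGE = 10


def get_page(number):
    """Return (rows, num_pages) for a 1-based page number."""
    # Query 1: the count, needed to know how many pages exist
    (total,) = conn.execute("SELECT COUNT(*) FROM topic").fetchone()
    num_pages = math.ceil(total / PER_PAGE)
    # Query 2: only the rows of the requested page
    rows = conn.execute(
        "SELECT id, title FROM topic ORDER BY id LIMIT ? OFFSET ?",
        (PER_PAGE, (number - 1) * PER_PAGE),
    ).fetchall()
    return rows, num_pages


rows, num_pages = get_page(3)
print(num_pages, [title for _, title in rows])
```

Whatever makes the underlying query expensive to count is paid on every page view, so it is worth checking the count query in Django Silk separately from the main one.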

from django.db.models import Prefetch

def get_queryset(self, **kwargs):
    return ForumTopic.objects.all().select_related('answer').prefetch_related(
        Prefetch('article', queryset=Article.objects.all().only('title', 'title_ru')),
        Prefetch('section', queryset=ForumSection.objects.all().only('slug', 'title', 'title_ru')),
        Prefetch('user', queryset=User.objects.all().only('username', 'first_name', 'last_name')),
        Prefetch('comments', queryset=ForumPost.objects.all().select_related('user').only(
            'user__username', 'user__first_name', 'user__last_name', '_parent_id'
        )),
        Prefetch('answer___parent', queryset=ForumTopic.objects.all().only('id'))
    ).only(
        'title', 'user_id', 'section_id', 'article_id', 'answer___parent_id', 'pub_date', 'lastmod', 'attachment'
    )

This gave the following performance result

  • 8 queries to the database
  • spent 14 ms
  • 141 ms full page

You could say that the gain here is not very big. The overall page time even got slightly worse (by 5 ms), and there are 2 more database queries, but the time spent in the database dropped from 20 ms to 14 ms, a 42 percent increase in query performance, which is already worth it. So if your site has very long queries that are used in pagination and contain many join operations, it may be worth replacing select_related with prefetch_related. This can really help make your Django site much faster.
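For reference, the gain can be recomputed from the rounded database times quoted in the two result lists (20 ms before, 14 ms after). This is only arithmetic on those rounded figures, so the result lands close to, not exactly on, the 42 percent mentioned above.

```python
before_ms = 20  # database time with select_related + only
after_ms = 14   # database time after moving relations to Prefetch

time_saved = (before_ms - after_ms) / before_ms  # share of database time removed
speedup = before_ms / after_ms - 1               # relative increase in query speed

print(f"{time_saved:.0%} less time, {speedup:.0%} faster")
```

The two numbers describe the same change from different angles: the queries take about a third less time, which means the database work finishes a little over 40 percent faster.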

Conclusion

  • Use select_related to fetch related objects from other tables in the same query as the main one
  • Use prefetch_related to load, with one additional query per relation, the objects of other models that have a ForeignKey to the model of your main queryset
  • Use only to limit the columns that are fetched; this also speeds up queries and reduces memory consumption
  • If you use Paginator, make sure the main query does not produce a very heavy count query; otherwise it may be better to load some select_related relations through prefetch_related instead

Thank you. Good article.

I found 2 typos and marked them in bold.

prefetch_related
prefetch_related differs in that it allows you to load not only objects that are used in ForeignKey fields of the model, but also those objects whose models...
It should apparently read "but also those".

Additional optimization
...That is, you may have a situation where it is better to run a couple of additional queries, through overloading the main query with join operations.
Here "than" was apparently meant.

Thanks, fixed

Thank you, Evgenii Legotckoi
