Evgenii Legotckoi, March 31, 2023, 4:32 p.m.

Django - Lesson 063. Full text site search for multiple models with multilingual support

I finally built a reasonably fast full-text search across several models, with query optimization in mind, that meets my own quality requirements for the project.

If you use the site search now, you will see that it responds quickly and returns several result groups: Articles, Comments, Forum Topics, Forum Replies, and Tests.
Each group shows its first three results along with a counter of the total number of matches, and offers a link to view the remaining results in a separate tab.

I split the results this way because building a single list with all the results is expensive in terms of resources and not efficient enough.

So the end result looks like this:

Going forward, this approach lets me modify the search system more flexibly and add any type of content to the search independently of other parts of the site.

Now I will explain exactly how I did it.

Full text search in PostgreSQL

PostgreSQL supports full-text search, and Django exposes it through the ORM.

The easiest way to run a full-text search is to use the search lookup on a model field, as described in the Django documentation.

For example:

Entry.objects.filter(text__search='Cheese')

Here the Entry model has a text field, and the built-in search lookup runs a full-text search on it.

A more advanced approach is to use SearchVector to search across multiple fields at once. For example:

from django.contrib.postgres.search import SearchVector

Entry.objects.annotate(search=SearchVector('text', 'tagline')).filter(search='Cheese')

Unfortunately, the annotate approach is inefficient: the search vector is computed for every row at query time, which can take a long time on large tables.
To improve performance, Django provides a special field, SearchVectorField, which stores the precomputed vector and can be indexed by PostgreSQL. This speeds up the site search significantly.

Adding SearchVectorField and index to the model

I'll show how to add a SearchVectorField and its index using the Article model as an example.

from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models

class Article(models.Model):
    title = models.CharField('Title', max_length=200)
    content = models.TextField(verbose_name='Content', blank=True)

    # SearchVectorField for full-text search
    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [GinIndex(fields=["search_vector",]),]

As you can see, the code adds a search_vector field, which is indexed by the GinIndex declared in the Meta class of the Article model.

After adding the SearchVectorField and its index, create a new migration:

python manage.py makemigrations

The generated migration will look roughly like this:

# Generated by Django 3.2 on 2023-03-27 21:03

import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        # depends on 
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='search_vector',
            field=django.contrib.postgres.search.SearchVectorField(null=True),
        ),
        migrations.AddIndex(
            model_name='article',
            index=django.contrib.postgres.indexes.GinIndex(fields=['search_vector'], name='knowledge_a_search__682520_gin'),
        ),
    ]

This is not enough, though, because the SearchVectorField still needs to be populated for existing rows, which can also be done in the migration. So we modify the migration as follows:

# Generated by Django 3.2 on 2023-03-27 21:03

import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations


def compute_search_vector(apps, schema_editor):
    Article = apps.get_model("knowledge", "Article")
    Article.objects.update(search_vector=django.contrib.postgres.search.SearchVector("title", "content"))


class Migration(migrations.Migration):

    dependencies = [
        # depends on 
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='search_vector',
            field=django.contrib.postgres.search.SearchVectorField(null=True),
        ),
        migrations.AddIndex(
            model_name='article',
            index=django.contrib.postgres.indexes.GinIndex(fields=['search_vector'], name='knowledge_a_search__682520_gin'),
        ),
        migrations.RunPython(
            compute_search_vector, reverse_code=migrations.RunPython.noop
        ),
    ]

This code adds one more operation, migrations.RunPython, which runs the compute_search_vector function to populate the SearchVectorField right away.
SearchVectorField has an important peculiarity: a SearchVector cannot be assigned to it directly, but the field can be populated with the update method of the model manager. That's why the code looks like this:

Article.objects.update(search_vector=django.contrib.postgres.search.SearchVector("title", "content"))

Then run the migration

python manage.py migrate

Now all articles have an indexed search field. But this is still not enough for a fully working search engine, because the search field must also be filled in when a new object is created and when an existing one is edited. For this, the Django documentation refers you to the PostgreSQL documentation on triggers. That is a good and correct approach, but what if you would rather not go into PostgreSQL and create all the necessary triggers by hand? In that case, Django's signals come to the rescue.

To do this, add the following method to the Article model:

    def update_search_vector(self):
        qs = Article.objects.filter(pk=self.pk)
        qs.update(search_vector=SearchVector("title", "content"))

And then connect it to the post_save signal:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Article)
def post_save_article(sender, instance, created, update_fields, **kwargs):
    instance.update_search_vector()

Thus, each time the article object is saved, the search field will be updated.

SearchView class for searching all kinds of content

Now let's write a SearchView that searches several types of content and returns the combined result. Let's say our site has articles (Article) and comments (Comment).

from django.shortcuts import render
from django.views import View


class SearchView(View):
    template_name = 'search/index.html'

    def get(self, request, *args, **kwargs):
        query = request.GET.get('search', None)
        # Lazy QuerySets: nothing is sent to the database yet
        article_results = Article.objects.filter(search_vector=query)
        comment_results = Comment.objects.filter(search_vector=query)

        return render(
            request=request,
            template_name=self.template_name,
            context={
                'search': query or '',
                'article_results': article_results[:3],
                'article_results_count': article_results.count(),
                'comment_results': comment_results[:3],
                'comment_results_count': comment_results.count(),
            }
        )

Note that the search phrase is passed in the URL as the GET parameter search. The view also has a couple of tweaks to optimize database queries.

Since Django QuerySets are lazy, a query is executed only when its results are actually needed. Thus the line

article_results = Article.objects.filter(search_vector=query)

only defines the query and does not execute it, because the initial search page needs just the total number of matches and the first three objects.
Therefore, only two values per content type are passed to the rendering context: a sliced QuerySet and a count.

    'article_results': article_results[:3],
    'article_results_count': article_results.count(),

article_results[:3] adds a LIMIT to the query, so only three objects are fetched.
article_results.count() counts all matching records in the database without loading them. Together, these two tweaks can significantly reduce query time, which greatly speeds up the site search.
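
The lazy behaviour described above can be illustrated without Django. The following is only a toy model: LazyQuery is a hypothetical stand-in, not Django's QuerySet, and the SQL strings it logs are purely illustrative. It shows that defining a filter and a slice costs nothing, while counting and iterating each trigger one query.

```python
class LazyQuery:
    """Toy stand-in for a QuerySet (hypothetical, not Django's class)."""

    def __init__(self, rows, log, predicate=None, limit=None):
        self.rows = rows            # pretend database table
        self.log = log              # shared list of "executed" queries
        self.predicate = predicate or (lambda row: True)
        self.limit = limit

    def filter(self, predicate):
        # Only describes the query; nothing is executed here.
        return LazyQuery(self.rows, self.log, predicate, self.limit)

    def __getitem__(self, item):
        # Slicing adds a LIMIT but still executes nothing.
        return LazyQuery(self.rows, self.log, self.predicate, item.stop)

    def count(self):
        # Executes immediately, but returns a number instead of rows.
        self.log.append("SELECT COUNT(*) ...")
        return sum(1 for row in self.rows if self.predicate(row))

    def __iter__(self):
        # Executes when something (e.g. a template loop) iterates the results.
        self.log.append(f"SELECT ... LIMIT {self.limit}")
        matched = [row for row in self.rows if self.predicate(row)]
        return iter(matched[:self.limit])


log = []
titles = ["django search", "qt signals", "django orm", "django forms", "django admin"]
results = LazyQuery(titles, log).filter(lambda title: "django" in title)
top_three = results[:3]
assert log == []            # nothing has been executed so far

total = results.count()     # first query
shown = list(top_three)     # second query
```

In this toy run only two entries end up in the log, one for the count and one for the limited fetch, no matter how many rows match.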

Render page

Next, the page is rendered. I will not give the full code of my page, only a general outline of how it might look.

The main search page extends the general template base.html (there is no point in looking at that template here).
The custom tags deserve a closer look, though. There are several of them:

  • append_query_to_url - tag for adding a query parameter to the url of the search page for a specific content.
  • search_field - tag for rendering the search field
  • found_objects - tag for rendering content results output
{% extends 'base.html' %}
{% block page %}
  {% load search %}
  {% url 'search:articles' as articles_search_url %}
  {% url 'search:comments' as comments_search_url %}

  {% append_query_to_url articles_search_url as articles_search_url_with_query %}
  {% append_query_to_url comments_search_url as comments_search_url_with_query %}

  {% search_field search %}

  {% block search_result %}
    {% found_objects article_results article_results_count 'Articles' 'Show all articles' articles_search_url_with_query  %}
    {% found_objects comment_results comment_results_count 'Comments' 'Show all comments' comments_search_url_with_query  %}
  {% endblock %}
{% endblock %}

Directory "templatetags"

In other articles, for example in How to write a tabbar block template tag like the blocktranslate tag, I have already described the structure of this directory and what it must contain to register template tags, so I won't dwell on it again.

So, let's look at the contents of the templatetags/search.py file:

# -*- coding: utf-8 -*-

from django import template
from django.template.defaultfilters import urlencode

register = template.Library()

@register.inclusion_tag('search/search_field.html', takes_context=True)
def search_field(context, value, **kwargs):
    context.update({'query_value': value})
    return context


@register.simple_tag(takes_context=True)
def append_query_to_url(context, url):
    return '{}?search={}'.format(url, urlencode(context.get('search', '')))

@register.inclusion_tag('search/found_objects.html', takes_context=True)
def found_objects(context, results, results_count, objects_title, all_search_message, search_url):
    context.update({
        'results': results,
        'results_count': results_count,
        'objects_title': objects_title,
        'all_search_message': all_search_message,
        'search_url': search_url
    })
    return context
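
To see what append_query_to_url produces for phrases with spaces or special characters, here is a plain-Python sketch of the same formatting. It assumes that Django's urlencode filter escapes values the way urllib.parse.quote does with its default safe characters; the function below is a hypothetical stand-in that takes the search phrase directly instead of reading it from the template context.

```python
from urllib.parse import quote


def append_query_to_url(url, search):
    """Plain-Python sketch of the template tag above (hypothetical helper)."""
    # Assumption: Django's urlencode filter escapes like quote() with safe="/".
    return "{}?search={}".format(url, quote(search or "", safe="/"))


plain = append_query_to_url("/search/articles/", "full text search")
special = append_query_to_url("/search/comments/", "c++ & qt")
empty = append_query_to_url("/search/articles/", None)
```

Escaping matters here because a raw & or + in the phrase would otherwise be interpreted as part of the query string syntax rather than as part of the search phrase.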

Directory "templates/search"

Next, let's look at the templates used by the inclusion tags.

File "search/found_objects.html"

In general terms, the template that renders a group of search results looks like the one below.
As you can see, it uses Bootstrap markup and does not include an example of rendering the object itself. I am sure that if you are reading this article you can write that part on your own, since this article describes my experience and is not a guide for mindless copying.

{% load i18n %}
<div class="card box-shadow m-2">
  <div class="card-header">{{ objects_title }} <span class="badge badge-primary">{{ results_count }}</span>
  </div>
    {% if results|length > 0 %}
      {% for object in results %}
        {# Your custom rendering of the object #}
      {% endfor %}
      <div class="card-footer border-0"><a href="{{ search_url }}" class="btn btn-sm btn-outline-secondary">{{ all_search_message }}</a></div>
    {% else %}
      <div class="card-body">
        {% trans 'Nothing found' %}
      </div>
    {% endif %}
</div>

File "search/search_field.html"

<form method="get" class="my-3 px-3">
  <div class="input-group bmd-form-group pt-0">
  {% load i18n %}
    <input class="form-control" name="search" placeholder="{% trans 'Search' %}" value="{{ query_value }}" title="" type="text">
    <div class="input-group-append">
      <button type="submit" class="btn btn-outline-secondary mdi mdi-magnify mdi-0"></button>
    </div>
  </div>
</form>

File "urls.py"

Next, add SearchView to the Django router.

# -*- coding: utf-8 -*-

from django.urls import path

from search import views

app_name = 'search'
urlpatterns = [
    path('', views.SearchView.as_view(), name='index'),
]

This is what the setup for the main Django full-text search page looks like. But I also added search pages for individual types of content, which is just as important. On the main search page, the user can see whether anything at all on the site matches the query, and on the detailed pages the user can see all the matching records.

Class "SearchViewByContent"

Now let's write a more general class for searching within a single type of content. For this you can use the generic ListView class. The search is performed on the search vector field. For pagination to work properly, we also need a pagination URL that preserves the query. All of this is in the code below.

from django.views.generic import ListView


class SearchViewByContent(ListView):
    template_name = 'search/search_objects.html'
    paginate_by = 10

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context.update({
            'search': self.request.GET.get('search', None) or '',
            'last_question': self.get_pagination_url()
        })
        return context

    def get_pagination_url(self):
        return self.request.get_full_path().replace(self.request.path, '')

    def get_queryset(self):
        qs = super().get_queryset()
        query = self.request.GET.get('search', None)
        return qs.filter(search_vector=query)
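
A minimal sketch of what get_pagination_url returns, using plain strings in place of request.get_full_path() and request.path. The second function is a hypothetical variant, not part of the code above: it removes the path only at the start of the string, which matters if the path text can also occur inside the query string (for example, when the search app is mounted at the site root and a raw slash ends up in the query).

```python
def pagination_url_replace(full_path, path):
    """Same idea as get_pagination_url above: remove the path text."""
    return full_path.replace(path, '')


def pagination_url_prefix(full_path, path):
    """Hypothetical variant: cut the path off only at the start."""
    return full_path[len(path):] if full_path.startswith(path) else full_path


full_path = '/search/articles/?search=django&page=2'
path = '/search/articles/'
typical_replace = pagination_url_replace(full_path, path)
typical_prefix = pagination_url_prefix(full_path, path)

# Edge case: the app is mounted at '/' and the query contains a raw '/'.
edge_replace = pagination_url_replace('/?search=a/b', '/')
edge_prefix = pagination_url_prefix('/?search=a/b', '/')
```

For the typical URL both versions give the same query string; they differ only in the edge case, where str.replace also strips the slash from the search phrase.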

Page rendering

Let's reuse the existing template and extend its behavior. As you can see, the bootstrap_pagination template tag from the django-bootstrap4 package is used here. In one of the previous articles, I already described how to connect this older version of the package. Nothing has changed since then, and that article is also relevant for the latest version, django-bootstrap5.

{% extends 'search/index.html' %}
{% block search_result %}
  <div id="object-list">
  {% load bootstrap_pagination from bootstrap4 %}
  {% for object in object_list %}
    {# Render your object here #}
  {% empty %}
    <div class="card card-body mb-3">{{ not_found_message|default:_("Nothing found") }}</div>
  {% endfor %}
  {% if object_list %}
    <div class="mt-3">{% bootstrap_pagination object_list pages_to_show="3" url=last_question justify_content='center' %}</div>
  {% endif %}
  </div>
{% endblock %}

Add routes to "urls.py" file

# -*- coding: utf-8 -*-

from django.urls import path

from search import views
from articles.models import Article, Comment

app_name = 'search'
urlpatterns = [
    path('', views.SearchView.as_view(), name='index'),
    path('articles/', views.SearchViewByContent.as_view(queryset=Article.objects.all()), name='articles'),
    path('comments/', views.SearchViewByContent.as_view(queryset=Comment.objects.all()), name='comments'),
]

Thus, search is implemented for individual types of content on the site.

Multilingual support using the modeltranslation package

Django modeltranslation is a great package that adds multilingual support to your models, but unfortunately it does not work with SearchVectorField. To work around this, you need to manually add a separate SearchVectorField for each language supported on the site. It's really not that much work, and it might look like this:

Model

class Article(models.Model):

    # Other fields

    # Search vectors
    search_vector = SearchVectorField(null=True)
    search_vector_ru = SearchVectorField(null=True)
    search_vector_en = SearchVectorField(null=True)

    objects = ArticleManager()

    class Meta:
        ordering = ['-pub_date']
        verbose_name = _('Article')
        verbose_name_plural = _('Articles')
        indexes = [
            GinIndex(fields=[
                "search_vector", "search_vector_ru", "search_vector_en",
            ]),
        ]

Accordingly, the new migration will need to be adjusted to populate each new SearchVectorField.

SearchView

And SearchView can be adjusted as follows:

from django.db.models import Q
from django.utils.translation import get_language


class SearchView(View):
    template_name = 'search/index.html'

    def get(self, request, *args, **kwargs):
        query = request.GET.get('search', None)
        current_language = get_language()
        article_results = Article.objects.filter(
            Q(**{'search_vector_{}'.format(current_language): query}) |
            Q(search_vector=query)
        )
        comment_results = Comment.objects.filter(search_vector=query)

        return render(
            request=request,
            template_name=self.template_name,
            context={
                'search': query or '',
                'article_results': article_results[:3],
                'article_results_count': article_results.count(),
                'comment_results': comment_results[:3],
                'comment_results_count': comment_results.count(),
            }
        )

The same applies to the separate view for articles. Try writing it yourself, modifying the query in the same way as in this view.

There is also an important point here. I search both the language-specific vector and the language-neutral one. This is because not every article has translations into every language, and in my opinion search results should always be returned, even if there is no translation.
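
One detail worth guarding against: get_language() can return a regional code such as 'en-us', a language for which no vector field exists, or None when translations are deactivated, and filtering on a field that does not exist raises a FieldError. A minimal sketch of choosing the field names safely follows. The helper and its supported tuple are hypothetical, and mapping 'en-us' to the 'en' field is an assumption about how the fields are named.

```python
def search_vector_fields(language, supported=("ru", "en")):
    """Return the vector field names to OR together for a language code.

    Hypothetical helper; `supported` mirrors the search_vector_ru and
    search_vector_en fields from the model above.
    """
    fields = []
    if language:
        # Assumption: regional codes such as 'en-us' map to the 'en' field.
        primary = language.split("-")[0].lower()
        if primary in supported:
            fields.append("search_vector_{}".format(primary))
    # The language-neutral vector is always searched as a fallback.
    fields.append("search_vector")
    return fields


russian = search_vector_fields("ru")
regional = search_vector_fields("en-us")
unsupported = search_vector_fields("de")
deactivated = search_vector_fields(None)
```

Each returned name can then be turned into a Q object in the same way as in the view above, and the Q objects combined with the | operator.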

Conclusion

In this way, you can build a fairly simple search across different types of content that is also easy to modify and extend with other types of content in the future.


NSProject
  • April 6, 2023, 10:25 p.m.

A wonderful article that describes search in considerable detail. I was once looking for something exactly like this. And there really are a great many ways to use it.
Thank you for the article, Evgenii.

Lissa
  • April 19, 2023, 9:02 p.m.

I noticed that stemming is tricky in the Russian variant of the search. I added a SearchQuery with config to the view, and it started catching word forms (кот - котЫ). I would be grateful for any information on how to achieve this another way.

Evgenii Legotckoi
  • April 20, 2023, 12:35 a.m.

I can't suggest anything yet, I haven't run into this task. For me Django/Python is a hobby, not a professional field.

Lissa
  • April 20, 2023, 1:47 a.m.

Thanks for pointing me in the right direction in any case. This logic will work for my site.

Lissa
  • April 24, 2023, 3:34 p.m.
  • (edited)

To my surprise, I found that Ukrainian is not included in PostgreSQL's full text search. There are some homemade options that involve bolting on files with stopwords and so on. I will probably do a "naive" search when Ukrainian is selected.
Have you run into this problem? Thank you.

Evgenii Legotckoi
  • April 24, 2023, 3:43 p.m.

How does it show up? My search works for all connected languages, including Ukrainian.

Lissa
  • April 24, 2023, 5:08 p.m.
  • (edited)

stackoverflow
I made triggers for updating the fields and the vector:

# Generated by Django 4.1 on 2023-04-23 20:45

import django.contrib.postgres.search
from django.contrib.postgres.search import SearchVector
from django.db import migrations


def compute_search_vector_uk(apps, schema_editor):
    Post = apps.get_model("posts", "Post")
    vector = SearchVector("title_uk", weight="A", config="ukrainian") + SearchVector(
        "content_uk",
        weight="B",
        config="ukrainian"

    )
    Post.objects.update(vector_uk=vector)


class Migration(migrations.Migration):

    dependencies = [
        ("posts", "0005_post_vector"),
    ]

    operations = [
        migrations.AddField(
            model_name="post",
            name="vector_uk",
            field=django.contrib.postgres.search.SearchVectorField(
                blank=True, null=True
            ),
        ),
        migrations.RunSQL(
            sql="""
            CREATE TRIGGER vector_uk_trigger
            BEFORE INSERT OR UPDATE OF title_uk, content_uk, vector_uk
            ON posts_post
            FOR EACH ROW EXECUTE PROCEDURE
            tsvector_update_trigger(
                vector_uk, 'pg_catalog.ukrainian', title_uk, content_uk
            );
            UPDATE posts_post SET vector_uk = NULL;
            """,
            reverse_sql="""
            DROP TRIGGER IF EXISTS vector_uk_trigger
            ON posts_post;
            """,
        ),
        migrations.RunPython(
            compute_search_vector_uk, reverse_code=migrations.RunPython.noop
        ),
    ]

The error is raised on pg_catalog.ukrainian; it complains that it does not exist.
P.S. I made a model manager that filters the field in Ukrainian (without a config, otherwise it complains too).

Lissa
  • April 29, 2023, 3:01 a.m.
  • (edited)

This may be useful for those who use django-ckeditor-5 for the admin together with the modeltranslation package. The editable field in models.py should be declared as a regular text field, and the widget should be specified in the admin class:

formfield_overrides = {
        models.TextField: {"widget": CKEditor5Widget},
    }
