I finally managed to build a fairly fast full-text search across several models that keeps the number of database queries under control and meets my quality requirements for the project.
If you use the site search now, you will find that it works quite quickly and returns several groups of results: Articles, Comments, Forum Topics, Forum Replies, Tests.
Three results are shown for each group, together with a counter of the total number of matches and a link to view the rest of the results on a separate tab.
I split the results this way because building a single list with all the results is rather expensive in terms of resources and not efficient enough.
So the end result looks like this:
This approach will also let me modify the search system more flexibly and easily later on: any new type of content can be added to the search independently of the other parts of the site.
And now I will tell you exactly how I did it.
Full text search in PostgreSQL
PostgreSQL supports full-text search, and Django exposes it through the ORM.
The easiest way to run a full-text search is to use the search lookup on a model field, as described in the Django documentation.
For example:
Entry.objects.filter(text__search='Cheese')
Here the Entry model has a text field, and the built-in search lookup runs a full-text search on it.
A more advanced way is to use SearchVector to search across several fields at once:
from django.contrib.postgres.search import SearchVector

Entry.objects.annotate(search=SearchVector('text', 'tagline')).filter(search='Cheese')
Unfortunately, using the annotate method this way is inefficient: the search vector is computed for every row at query time, which can take a lot of time on large tables.
To improve performance, Django offers a special field, SearchVectorField, which stores the precomputed vector and can be indexed by PostgreSQL. This significantly speeds up the search on the site.
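To get a feel for why a stored, indexed vector helps, here is a toy sketch in plain Python. It is not how PostgreSQL implements GIN, only an illustration of the inverted-index idea behind it; the tokenize helper and the sample documents are made up for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for real lexeme parsing."""
    return [word.strip('.,!?').lower() for word in text.split()]


def build_inverted_index(documents):
    """Map every token to the set of document ids that contain it (done once)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_by_scan(documents, word):
    """Unindexed case: every document is re-tokenized on every query."""
    return {doc_id for doc_id, text in documents.items() if word in tokenize(text)}


def search_by_index(index, word):
    """Indexed case: a single dictionary lookup in the precomputed index."""
    return set(index.get(word, set()))


documents = {
    1: 'Cheese on toast',
    2: 'Full-text search in PostgreSQL',
    3: 'Search for cheese recipes',
}
index = build_inverted_index(documents)

print(sorted(search_by_scan(documents, 'cheese')))
print(sorted(search_by_index(index, 'cheese')))
```

Both functions return the same documents, but the scan repeats the parsing work for every row on every query, which is roughly what annotating with SearchVector at query time amounts to.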
Adding SearchVectorField and index to the model
I'll show adding SearchVectorField and indexing using the Article model as an example.
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Article(models.Model):
    title = models.CharField('Title', max_length=200)
    content = models.TextField(verbose_name='Content', blank=True)

    # SearchVectorField for full-text search
    search_vector = SearchVectorField(null=True)

    class Meta:
        # GIN index on the precomputed search vector
        indexes = [GinIndex(fields=["search_vector",]),]
As you can see, in the presented code, there is a search_vector field, which is indexed using the GinIndex specification in the Meta class of the Article model.
After you have added the SearchVectorField and the indexing of this field, create new migrations
python manage.py makemigrations
In general, the new migration will look like this
# Generated by Django 3.2 on 2023-03-27 21:03

import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        # depends on
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='search_vector',
            field=django.contrib.postgres.search.SearchVectorField(null=True),
        ),
        migrations.AddIndex(
            model_name='article',
            index=django.contrib.postgres.indexes.GinIndex(fields=['search_vector'], name='knowledge_a_search__682520_gin'),
        ),
    ]
But this will not be enough, because you still need to fill in the SearchVectorField field, which can also be done during migration. Therefore, we modify the migration as follows
# Generated by Django 3.2 on 2023-03-27 21:03

import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations


def compute_search_vector(apps, schema_editor):
    # Fill the new field for all existing articles with a single UPDATE
    Article = apps.get_model("knowledge", "Article")
    Article.objects.update(search_vector=django.contrib.postgres.search.SearchVector("title", "content"))


class Migration(migrations.Migration):

    dependencies = [
        # depends on
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='search_vector',
            field=django.contrib.postgres.search.SearchVectorField(null=True),
        ),
        migrations.AddIndex(
            model_name='article',
            index=django.contrib.postgres.indexes.GinIndex(fields=['search_vector'], name='knowledge_a_search__682520_gin'),
        ),
        migrations.RunPython(
            compute_search_vector, reverse_code=migrations.RunPython.noop
        ),
    ]
This code adds an extra step, the final migrations.RunPython operation, which runs the compute_search_vector function and populates SearchVectorField right away.
SearchVectorField has an important feature: a SearchVector cannot be assigned to it directly when an object is created, but the field can be populated using the update method of the model manager. That's why the code looks like this
Article.objects.update(search_vector=django.contrib.postgres.search.SearchVector("title", "content"))
Then run the migration
python manage.py migrate
Now all existing articles have an indexed search field. But this is still not enough for the search to work properly, because the search field must also be filled in when a new object is created and when an existing one is edited. For this, the Django documentation suggests consulting the PostgreSQL documentation and creating triggers. That is good and right, but what if for some reason you don't want to go into PostgreSQL and create all the necessary triggers by hand? In this case, Django's signals will save us.
To do this, add the following method to the Article model
from django.contrib.postgres.search import SearchVector

# Method of the Article model
def update_search_vector(self):
    qs = Article.objects.filter(pk=self.pk)
    # update() issues a single SQL UPDATE and does not send post_save,
    # so the signal handler is not triggered again
    qs.update(search_vector=SearchVector("title", "content"))
And then connect to the post_save signal
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Article)
def post_save_article(sender, instance, created, update_fields, **kwargs):
    instance.update_search_vector()
Thus, each time an article object is saved, the search field will be updated.
SearchView class for searching all kinds of content
Now let's write a SearchView that searches several types of content and returns the desired result. In our case, let's say that the site has articles ("Article") and comments ("Comment").
from django.shortcuts import render
from django.views import View

from articles.models import Article, Comment


class SearchView(View):
    template_name = 'search/index.html'

    def get(self, request, *args, **kwargs):
        query = request.GET.get('search', None)
        # Lazy QuerySets: nothing is sent to the database yet
        article_results = Article.objects.filter(search_vector=query)
        comment_results = Comment.objects.filter(search_vector=query)
        return render(
            request=request,
            template_name=self.template_name,
            context={
                'search': query or '',
                'article_results': article_results[:3],
                'article_results_count': article_results.count(),
                'comment_results': comment_results[:3],
                'comment_results_count': comment_results.count(),
            }
        )
Note that the search phrase is passed in the URL query string as the GET parameter search. The view also contains a couple of tweaks to optimize database queries.
Django QuerySets are lazy: a query is executed only when its results are actually needed. Thus the line
article_results = Article.objects.filter(search_vector=query)
only describes the query to the database and does not execute it, because for the initial search page we need just the total number of matches and the first three objects.
Therefore, only a sliced QuerySet and a number are passed to the rendering context.
'article_results': article_results[:3], 'article_results_count': article_results.count(),
article_results[:3] adds a limit to the query, so only three objects are fetched, and article_results.count() asks the database to count all matching records. Neither of them loads the full result set, which can significantly reduce the time it takes to execute the queries and speeds up the search on the site.
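The two queries behind the slice and the count can be sketched with the standard sqlite3 module. This is only an illustration under assumptions: it uses SQLite and a LIKE filter instead of PostgreSQL full-text search, and the table and its contents are invented for the example.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)')
connection.executemany(
    'INSERT INTO article (title) VALUES (?)',
    [('Cheese article {}'.format(n),) for n in range(1, 101)],
)

pattern = '%Cheese%'

# Roughly what article_results[:3] does: fetch only the first three rows.
first_three = connection.execute(
    'SELECT id, title FROM article WHERE title LIKE ? ORDER BY id LIMIT 3',
    (pattern,),
).fetchall()

# Roughly what article_results.count() does: the database counts the rows
# and only a single number comes back.
total = connection.execute(
    'SELECT COUNT(*) FROM article WHERE title LIKE ?',
    (pattern,),
).fetchone()[0]

print(len(first_three), total)
```

However many rows match, the application only ever receives three rows and one number per content group.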
Render page
Next, the page is rendered. I will not give the full code of my page, only a general outline of how it might look.
The main search page extends the general template base.html (there is no point in examining that template here).
The custom tags, however, deserve a closer look. There are several of them:
- append_query_to_url - tag for adding the query parameter to the URL of the search page for a specific type of content
- search_field - tag for rendering the search field
- found_objects - tag for rendering the results for one type of content
{% extends 'base.html' %}
{% block page %}
  {% load search %}
  {% url 'search:articles' as articles_search_url %}
  {% url 'search:comments' as comments_search_url %}
  {% append_query_to_url articles_search_url as articles_search_url_with_query %}
  {% append_query_to_url comments_search_url as comments_search_url_with_query %}
  {% search_field search %}
  {% block search_result %}
    {% found_objects article_results article_results_count 'Articles' 'Show all articles' articles_search_url_with_query %}
    {% found_objects comment_results comment_results_count 'Comments' 'Show all comments' comments_search_url_with_query %}
  {% endblock %}
{% endblock %}
Directory "templatetags"
In other articles, for example in "How to write a tabbar block template tag like the blocktranslate tag", I have already described the structure of this directory and what it must contain for template tags to be registered, so I won't dwell on it again.
So, let's look at the contents of the templatetags/search.py file
# -*- coding: utf-8 -*-

from django import template
from django.template.defaultfilters import urlencode

register = template.Library()


@register.inclusion_tag('search/search_field.html', takes_context=True)
def search_field(context, value, **kwargs):
    # Renders the search form with the current query in the input
    context.update({'query_value': value})
    return context


@register.simple_tag(takes_context=True)
def append_query_to_url(context, url):
    # Appends the current search phrase to the given URL
    return '{}?search={}'.format(url, urlencode(context.get('search', '')))


@register.inclusion_tag('search/found_objects.html', takes_context=True)
def found_objects(context, results, results_count, objects_title, all_search_message, search_url):
    # Renders one group of results with its counter and "show all" link
    context.update({
        'results': results,
        'results_count': results_count,
        'objects_title': objects_title,
        'all_search_message': all_search_message,
        'search_url': search_url
    })
    return context
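What append_query_to_url produces can be tried outside Django with a small standalone sketch. It assumes that percent-encoding with urllib.parse.quote is close enough to Django's urlencode filter for this purpose; the function below is a plain function, not the registered template tag.

```python
from urllib.parse import quote


def append_query_to_url(url, search):
    """Return url with the search phrase appended as a percent-encoded parameter."""
    return '{}?search={}'.format(url, quote(search or ''))


print(append_query_to_url('/search/articles/', 'full text'))
print(append_query_to_url('/search/comments/', None))
```

Encoding the phrase matters because the user's query may contain spaces or non-ASCII characters that must not end up raw in a link.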
Directory "templates/search"
Next, let's look at the templates used by the inclusion tags.
File "search/found_objects.html"
In general terms, the template that renders one group of found objects looks like the one below.
As you can see, Bootstrap markup is used here, and there is no example of rendering the object itself. I am sure that if you are reading this article, you can write what you need on your own, since this article describes my experience and is not a guide for mindless copying.
{% load i18n %}
<div class="card box-shadow m-2">
  <div class="card-header">{{ objects_title }} <span class="badge badge-primary">{{ results_count }}</span></div>
  {% if results|length > 0 %}
    {% for object in results %}
      {# Your custom rendering of the object #}
    {% endfor %}
    <div class="card-footer border-0"><a href="{{ search_url }}" class="btn btn-sm btn-outline-secondary">{{ all_search_message }}</a></div>
  {% else %}
    <div class="card-body">
      {% trans 'Nothing found' %}
    </div>
  {% endif %}
</div>
File "search/search_field.html"
<form method="get" class="my-3 px-3">
  <div class="input-group bmd-form-group pt-0">
    {% load i18n %}
    <input class="form-control" name="search" placeholder="{% trans 'Search' %}" value="{{ query_value }}" title="" type="text">
    <div class="input-group-append">
      <button type="submit" class="btn btn-outline-secondary mdi mdi-magnify mdi-0"></button>
    </div>
  </div>
</form>
File "urls.py"
Next, add SearchView to the Django router.
# -*- coding: utf-8 -*-

from django.urls import path

from search import views

app_name = 'search'
urlpatterns = [
    path('', views.SearchView.as_view(), name='index'),
]
This is what the setup for the main full-text search page looks like. I also added search pages for the individual types of content, which is just as important: the main search page shows the user whether anything at all can be found on the site for the query, and the detailed pages list all the records found.
Class "SearchViewByContent"
Now let's write a more general class for searching within a single type of content. The generic ListView class is a good fit for this. The search is performed on the search vector field. For pagination to work properly, we also need a pagination URL that preserves the search query. All of this is in the following code.
from django.views.generic import ListView


class SearchViewByContent(ListView):
    template_name = 'search/search_objects.html'
    paginate_by = 10

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        context.update({
            'search': self.request.GET.get('search', None) or '',
            'last_question': self.get_pagination_url()
        })
        return context

    def get_pagination_url(self):
        # Keep only the query string part, for example '?search=cheese'
        return self.request.get_full_path().replace(self.request.path, '')

    def get_queryset(self):
        qs = super().get_queryset()
        query = self.request.GET.get('search', None)
        return qs.filter(search_vector=query)
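To see what get_pagination_url returns, here is a standalone sketch with plain strings in place of the request object. The second function, pagination_url_by_split, is not from my project; it is a hypothetical alternative based on urllib.parse.urlsplit, shown because str.replace removes every occurrence of the path, including one that happens to appear inside the query.

```python
from urllib.parse import urlsplit


def pagination_url_by_replace(full_path, path):
    """Same idea as in the view: strip the path from the full path."""
    return full_path.replace(path, '')


def pagination_url_by_split(full_path):
    """Hypothetical alternative: take the query string as parsed by urlsplit."""
    query = urlsplit(full_path).query
    return '?{}'.format(query) if query else ''


print(pagination_url_by_replace('/search/articles/?search=cheese', '/search/articles/'))
print(pagination_url_by_split('/search/articles/?search=cheese'))

# The two differ when the search phrase itself contains the path
print(pagination_url_by_replace('/search/?search=/search/', '/search/'))
print(pagination_url_by_split('/search/?search=/search/'))
```

For ordinary search phrases both variants give the same result, so the simple replace is usually enough.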
Page rendering
Let's take the existing template and extend its behavior. As you can see, the bootstrap_pagination template tag from the django-bootstrap4 package is used here. In one of the previous articles I already described how to connect this older version of the package. Nothing has changed since then, and that article is also relevant for the newer django-bootstrap5.
{% extends 'search/index.html' %}
{% block search_result %}
  <div id="object-list">
    {% load bootstrap_pagination from bootstrap4 %}
    {% for object in object_list %}
      {# Render your object here #}
    {% empty %}
      <div class="card card-body mb-3">{{ not_found_message|default:_("Nothing found") }}</div>
    {% endfor %}
    {% if object_list %}
      <div class="mt-3">{% bootstrap_pagination object_list pages_to_show="3" url=last_question justify_content='center' %}</div>
    {% endif %}
  </div>
{% endblock %}
Add routes to "urls.py" file
# -*- coding: utf-8 -*-

from django.urls import path

from search import views
from articles.models import Article, Comment

app_name = 'search'
urlpatterns = [
    # Main search page with three results per content type
    path('', views.SearchView.as_view(), name='index'),
    # Paginated search pages for a single content type
    path('articles/', views.SearchViewByContent.as_view(queryset=Article.objects.all()), name='articles'),
    path('comments/', views.SearchViewByContent.as_view(queryset=Comment.objects.all()), name='comments'),
]
Thus, the search will be implemented for certain types of content on the site.
Multilingual support using the modeltranslation package
Django modeltranslation is a great package that adds multilingual support to your models, but unfortunately it does not handle SearchVectorField. Instead, you need to manually add a separate SearchVectorField for each language supported on the site. It's really not that much work, and it might look like this
Model
class Article(models.Model):
    # Other fields

    # Search vectors
    search_vector = SearchVectorField(null=True)
    search_vector_ru = SearchVectorField(null=True)
    search_vector_en = SearchVectorField(null=True)

    objects = ArticleManager()

    class Meta:
        ordering = ['-pub_date']
        verbose_name = _('Article')
        verbose_name_plural = _('Articles')
        indexes = [
            GinIndex(fields=[
                "search_vector",
                "search_vector_ru",
                "search_vector_en",
            ]),
        ]
Accordingly, the migration will need to be adjusted to populate each new SearchVectorField.
SearchView
The main search view can then be changed as follows (in my project it is called IndexView; it is the same main search view as the SearchView shown earlier)
from django.db.models import Q
from django.shortcuts import render
from django.utils.translation import get_language
from django.views import View


class IndexView(View):
    template_name = 'evileg_search/index.html'

    def get(self, request, *args, **kwargs):
        query = request.GET.get('search', None)
        current_language = get_language()
        # Search the vector for the current language and the generic vector
        article_results = Article.objects.filter(
            Q(**{'search_vector_{}'.format(current_language): query}) | Q(search_vector=query)
        )
        comment_results = Comment.objects.filter(search_vector=query)
        return render(
            request=request,
            template_name=self.template_name,
            context={
                'search': query or '',
                'article_results': article_results[:3],
                'article_results_count': article_results.count(),
                'comment_results': comment_results[:3],
                'comment_results_count': comment_results.count(),
            }
        )
The same applies to the separate view for articles. Try to write it yourself and modify the query in the same way as in this view.
There is also an important point here: I search both the language-specific vector and the generic one. I may not have translations into every language for a particular article, and in my opinion the search should still return results even if there is no translation.
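One detail to watch when the field name is built from the language code: get_language() can return a regional code such as 'en-us', or None when translations are deactivated, and neither gives a valid field name. Below is a minimal sketch of a helper that falls back to the generic vector in such cases. The helper name and the tuple of available languages are assumptions for this example, not part of my project.

```python
def search_vector_field(language_code, available=('ru', 'en')):
    """Return the search vector field name for a language, or the generic one."""
    # 'en-us' -> 'en'; None or '' -> ''
    base_language = (language_code or '').split('-')[0].lower()
    if base_language in available:
        return 'search_vector_{}'.format(base_language)
    return 'search_vector'


print(search_vector_field('ru'))
print(search_vector_field('en-us'))
print(search_vector_field('uk'))
print(search_vector_field(None))
```

The result can then be used as the key of the Q object in place of the formatted string, so that a language without its own vector simply searches the generic field.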
Conclusion
In this way you can build a fairly simple search across different types of content that is also easy to modify and to extend later with other types of content on the site.
A wonderful article that describes search in considerable detail. I was once looking for exactly something like this. And there really are a great many use cases.
Thank you for the article, Evgeny.
I noticed that stemming is tricky in the Russian version of the search. I added a SearchQuery with config to the view, and it started catching word forms (кот - котЫ, that is, cat - cats). I would be grateful for any information on how to achieve this another way.
I can't suggest anything yet, I haven't had to solve that problem. For me Django/Python is a hobby, not a professional field.
Thanks for the pointers in any case. This logic will work for my site.
To my surprise, I discovered that Ukrainian is not included in PostgreSQL's full-text search. There are some makeshift options that involve bolting on files with stopwords and so on. I will probably do a "naive" search when Ukrainian is selected.
Have you run into this problem? Thank you.
How does it show up? My search works for all the connected languages, including Ukrainian.
stackoverflow
I made triggers to update the fields and the vector:
The error is raised on pg_catalog.ukrainian; it complains that it does not exist.
P.S. I made a model manager that filters on the Ukrainian field (without a config, otherwise it complains as well).
This may be useful to those who use django-ckeditor-5 for the admin together with the modeltranslation package. The editable field in models.py should be declared as a regular text field. But in the admin class, specify