Open fb2-files using Qt

html, fb2, xml, QXmlStreamReader

Content

Currently fb2 is a popular format for storing books. An fb2 file is a special case of xml. As in html, the main element of its structure is the tag (a control word). In this article, I'll show you how to create a simple fb2 file viewer. The project with the source code can be downloaded from the link.

General information

Tags are divided into block tags and inline (self-closing) tags. Block tags come in pairs: an opening tag and a closing tag, with the content located between them. For example, a paragraph of text is written as

<p>Paragraph text</p>

Inside such a block pair, you can put other tags. Inline (self-closing) tags are used for objects in which nothing can be embedded. For example, a reference to a picture

<image l:href="#_0.jpg"/>

contains two pieces of information: 1) a picture must be inserted at this point of the document, 2) a link to that picture. The algorithm for inserting a picture into the text is explained below. The 3 types of tags are easy to distinguish by the slash: a self-closing tag has the slash before the closing bracket, a closing block tag has it right after the opening bracket, and an opening block tag has no slash at all.
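The slash rule can be sketched as a small function. This is only an illustration in plain standard C++ (no Qt needed); the names TagKind and classifyTag are hypothetical, and the sketch does not handle the <?xml …> prolog or comments.

```cpp
#include <string>

// Hypothetical names: TagKind and classifyTag are not part of Qt or fb2.
enum class TagKind { Opening, Closing, SelfClosing, NotATag };

// Classify one tag, given as raw text including the angle brackets,
// purely by the position of the slash.
TagKind classifyTag(const std::string& tag)
{
    if (tag.size() < 3 || tag.front() != '<' || tag.back() != '>')
        return TagKind::NotATag;
    if (tag[1] == '/')
        return TagKind::Closing;      // e.g. </p>
    if (tag[tag.size() - 2] == '/')
        return TagKind::SelfClosing;  // e.g. <image l:href="#_0.jpg"/>
    return TagKind::Opening;          // e.g. <p>
}
```

In the reader itself this check is not needed, because QXmlStreamReader does the classification for us; the sketch only makes the rule explicit.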

If you want to understand this fully, study html. There are some differences between html and fb2, although in many respects they are identical. I will point out such elements as we go. Also note that an fb2 file, unlike an html page, carries no CSS: there is no indication in the fb2 file of how the text is formatted (font size and color, paragraph layout, etc.). All of this we must implement ourselves, if desired.

Structure of fb2-file

The first tag, <?xml>, contains technical information about the format, its version, and the encoding used. The second tag, <FictionBook>, covers the whole book. As a rule, any book has 2 parts: a description <description> and the main part <body> (as in html). The description contains the author's name, the title of the book, the annotation, etc. The main part contains the titles <title> (of the whole book or of individual chapters), chapters / parts / sections <section> and text <p> (as in html).

<?xml …>
<FictionBook …>
<description> … </description>
<body> … </body>
…
</FictionBook>

In addition, you can find the epigraph tag <epigraph>, the link <a> (as in html), the image <image/> and the empty line <empty-line/> (in html <br/>). Links can be external or internal. External links contain the URL of the source as a parameter; internal links contain references to elements in the text of the file (see the image tag above). Pictures use similar internal references.

After the <body> section, additional elements can be located. For example, pictures converted to text form are placed in separate <binary> tags.

Creating a Reader Program

We will build our program in the following way: we read the data from the file and convert it to html, then send the generated string to the text field using the setHtml(QString) function. One little life hack for those who want to learn html: a QTextEdit / QTextBrowser object can display the formatted document as source text. To do this, open the form editor, double-click the object and switch to the "Source" tab.

To process fb2 files, we will use the QXmlStreamReader class. The project connects the xml and xmlpatterns modules, although in current Qt versions QXmlStreamReader itself is part of the core module. Its constructor takes a pointer to a QFile object, and the file must be opened before reading starts.

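QFile f(name);
f.open(QIODevice::ReadOnly); // the file must be open before the reader can use it; check the result in real code
QXmlStreamReader sr(&f);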

Reading the file is a loop that processes the tokens one after another. We also need 3 variables

QString book;
QString imgId;
QString imgType;

book stores the generated document; imgId and imgType are needed for inserting pictures into the text. The QXmlStreamReader class does several important things for us. First, it detects and installs the required decoder. Second, it separates the tags from the content. Third, it extracts the attributes of tags. All that is left for us is to process the separated data. The readNext() function is used to read the data. Each fragment it reads has a token type, and we are interested in 5 of them: StartDocument, EndDocument, StartElement, EndElement and Characters. The first 2 mark the beginning and end of the file, the next 2 are produced when tags are read, and the last one delivers the text content.

Having received StartDocument, we need to add the header line of the html document and 2 opening tags

book = "<!DOCTYPE HTML><html><body style=\"font-size:14px\">";

When EndDocument is reached, we close the tags opened at the beginning of the file

book.append("</body></html>");

The appearance of StartElement means that an opening or self-closing tag has been read. Accordingly, EndElement signals that a closing tag has been read (for a self-closing tag the reader reports StartElement immediately followed by EndElement). The name of the tag is obtained by calling sr.name().toString(). To keep track of the structure of the document, we store a list of all currently open tags in thisToken, an object of class QStringList. So, in the case of StartElement we append the name of the current tag to thisToken, and in the case of EndElement we remove it. In addition, opening (or self-closing) tags can contain attributes. The attributes are read and stored in sr as an array. You can access them using the sr.attributes() method. We need them to add pictures to the text. So, if an <image> tag is found, we add a placeholder for the picture to the text.

book.append("<p align=\"center\">"+sr.attributes().at(0).value().toString()+"</p>");

Then, if we find the <binary> tag, we need to save its id and format.

imgId = sr.attributes().at(0).value().toString();
imgType = sr.attributes().at(1).value().toString();

Note that imgId is identical to the attribute of the <image> tag, except for the absence of the hash sign (#).
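The relation between the two values can be written down as a tiny helper. This is a sketch in plain standard C++, and the name internalTargetId is hypothetical; it only shows how an internal reference maps to the id of a <binary> element.

```cpp
#include <string>

// Hypothetical helper: turn an internal reference such as "#_0.jpg" into the
// id it points to ("_0.jpg"). Returns an empty string for anything that does
// not start with '#', for example an external URL.
std::string internalTargetId(const std::string& href)
{
    if (href.size() < 2 || href[0] != '#')
        return std::string();
    return href.substr(1);
}
```

The reader in this article goes the other way round: it keeps the reference with the hash sign in the text and later searches for "#" + imgId.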

Now it only remains to put the content into the string book. Here you can use different sets of rules. For example, ignore the description of the book

if(thisToken.contains("description"))
{
    break; // do not output
}

or highlight the headings by color, font size and typeface. Let us dwell only on the pictures. To insert them, you need to form a string of the form

QString image = "<img src=\"data:"
        + imgType +";base64,"
        + sr.text().toString()
        + "\"/>";

where sr.text().toString() contains the contents of the <binary> tag. Then, in our document string, replace the placeholder corresponding to this picture with this string

book.replace("#"+imgId, image);
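The two steps above (building the <img> element and substituting it for the placeholder) can be tried without Qt. The following is a minimal sketch in plain standard C++; makeImageTag and replacePlaceholder are hypothetical names, and the base64 text is assumed to be passed in exactly as it appears inside <binary>.

```cpp
#include <cstddef>
#include <string>

// Build an <img> element with an inline data URI.
// contentType is e.g. "image/jpeg"; base64 is the text content of <binary>.
std::string makeImageTag(const std::string& contentType, const std::string& base64)
{
    return "<img src=\"data:" + contentType + ";base64," + base64 + "\"/>";
}

// Replace every occurrence of the placeholder "#<id>" in html with imageTag.
// Returns the number of replacements made.
std::size_t replacePlaceholder(std::string& html, const std::string& id,
                               const std::string& imageTag)
{
    const std::string placeholder = "#" + id;
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = html.find(placeholder, pos)) != std::string::npos) {
        html.replace(pos, placeholder.size(), imageTag);
        pos += imageTag.size();  // continue after the inserted text
        ++count;
    }
    return count;
}
```

A return value of 0 means that no <image> tag referred to this picture, which can be useful to log while debugging a book.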

The algorithm for reading the fb2-file

    while( !sr.atEnd() )
    {
        switch( sr.readNext() )
        {
        case QXmlStreamReader::NoToken:
            qDebug() << "QXmlStreamReader::NoToken";
            break;
        case QXmlStreamReader::StartDocument:
            book = "<!DOCTYPE HTML><html><body style=\"font-size:14px\">";
            break;
        case QXmlStreamReader::EndDocument:
            book.append("</body></html>");
            break;
        case QXmlStreamReader::StartElement:
            thisToken.append( sr.name().toString() );
            if( sr.name().toString() == "image" ) // position of a picture
            {
                if(sr.attributes().count() > 0)
                    book.append("<p align=\"center\">"+sr.attributes().at(0).value().toString()+"</p>");
            }
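            if( sr.name().toString() == "binary" && sr.attributes().count() > 1 ) // picture storage
            {
                // assumes the attributes come in the order: id, content-type
                imgId = sr.attributes().at(0).value().toString();
                imgType = sr.attributes().at(1).value().toString();
            }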
            break;
        case QXmlStreamReader::EndElement:
            if( thisToken.last() == sr.name().toString() )
                thisToken.removeLast();
            else
                qDebug() << "error token";
            break;
        case QXmlStreamReader::Characters:
            if( sr.text().toString().contains( QRegExp("[A-Z]|[a-z]|[А-Я]|[а-я]") )) // if the block contains text
            {
                if(thisToken.contains("description")) // BOOK DESCRIPTION
                {
                    break; // do not output
                }
                }
                if(thisToken.contains("div"))
                    break;
                if(!thisToken.contains( "binary" ))
                    book.append("<p>" + sr.text().toString() + "</p>");
            }
            if(thisToken.contains( "binary" ) ) // for pictures
            {
                QString image = "<img src=\"data:"
                        + imgType +";base64,"
                        + sr.text().toString()
                        + "\"/>";
                book.replace("#"+imgId, image);
            }
            break;
        }
    }
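The bookkeeping of open tags in the loop above behaves like a stack. As an illustration only, here is the same idea isolated in plain standard C++; the class name OpenTags is hypothetical. Unlike the loop above, it also guards against a closing tag that arrives when nothing is open.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: keeps the names of the currently open tags.
class OpenTags
{
public:
    // Called for an opening tag.
    void open(const std::string& name) { names_.push_back(name); }

    // Called for a closing tag; returns false if it does not match
    // the most recently opened tag (the "error token" case).
    bool close(const std::string& name)
    {
        if (names_.empty() || names_.back() != name)
            return false;
        names_.pop_back();
        return true;
    }

    // True if the current position is nested anywhere inside the named tag.
    bool inside(const std::string& name) const
    {
        return std::find(names_.begin(), names_.end(), name) != names_.end();
    }

private:
    std::vector<std::string> names_;
};
```

The inside() check corresponds to thisToken.contains("description") in the reader: it is true for text at any depth below the named tag, not only for its direct children.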

Our document is ready. It only remains to set the generated string in the text browser

ui->textBrowser->setHtml(book);

For a fully functional fb2 reader, you need to add processing of links, tables and some other objects. But the above material is sufficient to extract the main contents of the book.
