Use SQLite to store torrents and fastresumes #10099

sledgehammer999 · 2019-01-02T19:53:51Z

I recently played with SQLite and I was fascinated by it. Then I realized how easy things could become for us if we used it to save stuff.

Pros:

Far easier code for sorting torrents by queue during startup
Possibly much faster startup for users with many torrents because now we don't have to read multitudes of small files from disk
SQLite should have very robust anti-corruption mechanisms. Far better than anything we can engineer ourselves

I plugged in the new functionality by keeping the current design as much as possible. I didn't attempt to engineer a better code architecture around the new system.
Most of the code you see is old code moved around.

PS: Is anyone more knowledgeable with SQL? I wonder if we should split the table into 2 for performance reasons. One saving hash+metadata and one saving hash+fastresume+queue.

sledgehammer999 · 2019-01-02T19:55:26Z

Special shoutout to DB Browser for SQLite. It is a great lightweight tool that lets you inspect/modify/administer an SQLite db. Very useful for debugging.

glassez · 2019-01-03T04:45:22Z

@sledgehammer999, that's fine!
I'm glad my old idea has finally sprung to life. This will undoubtedly be a step forward. There is still work to be done, though, starting with what we should store in the tables and how, and ending with the details of the implementation. I'll start commenting on specific things later.

glassez · 2019-01-03T16:58:42Z

src/base/utils/db.h

+
+#pragma once
+
+namespace Utils


There are no database related utilities in this file. It's just database configuration for some particular application component (BitTorrent). So it should be under "base/bittorrent".

glassez · 2019-01-03T17:34:35Z

I plugged in the new functionality by keeping the current design as much as possible.

Too much.
Since we stored some additional data along with the "fastresume" data, we had to request the creation of "fastresume" data each time anything changed (e.g. save path, torrent name, etc.), then we injected these additional fields into the "fastresume" data, encoded it and wrote it to a file.
We don't have to do it that way anymore. We can store them in separate columns and update them independently.
Even more. If we look at the libtorrent "fastresume" data, we can see that most of its fields behave as described above. E.g. when a torrent is paused or resumed we receive an appropriate alert, so we can just update the "paused" and "auto_managed" columns, and so on.
By and large, we need to request "fastresume" data generation just to store the current progress.
This may seem somewhat inconvenient due to a libtorrent drawback (it requires "fastresume" data to be bencoded when we pass it to add_torrent_params). But this is slightly improved in libtorrent-1.2 (now we just have to fill in the appropriate add_torrent_params fields).
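A minimal sketch of the per-column idea, not code from this PR: when a single property changes, only its column is rewritten instead of regenerating and re-encoding the whole "fastresume" blob. The table name `torrents`, the column names, and the helpers `columnName()` and `buildColumnUpdate()` are assumptions made for illustration; the returned text is meant to be handed to a prepared statement with the new value and the torrent hash bound to the two placeholders.

```cpp
#include <cassert>
#include <string>

// Hypothetical set of independently updatable columns; the real schema
// was still under discussion in this thread.
enum class Column { Name, SavePath, Paused, AutoManaged };

// Maps a column identifier to its (assumed) SQL name. Using an enum means
// no caller-supplied text ever ends up in the statement.
const char *columnName(const Column column)
{
    switch (column) {
    case Column::Name:        return "name";
    case Column::SavePath:    return "save_path";
    case Column::Paused:      return "paused";
    case Column::AutoManaged: return "auto_managed";
    }
    assert(false && "unhandled Column value");
    return "";
}

// Builds a parameterized UPDATE that touches exactly one column of one torrent.
// The new value and the torrent hash stay as '?' placeholders to be bound later.
std::string buildColumnUpdate(const Column column)
{
    return std::string("UPDATE torrents SET ") + columnName(column) + " = ? WHERE hash = ?";
}
```

With something like this, a "torrent paused" alert handler would only need `buildColumnUpdate(Column::Paused)` plus two bound values; a full "fastresume" request would remain necessary only for progress data.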

sledgehammer999 · 2019-01-03T22:58:21Z

I am all for decoupling our data from libtorrent's fastresume.
But this will need investigation on when to save our data and what triggers it. Aka reacting more closely to the triggering event.
This is work for the weekend though.

glassez · 2019-01-04T07:46:01Z

But this will need investigation on when to save our data and what triggers it.

At least all current "fastresume" saving events that are not fired by timer.

glassez · 2019-01-04T10:00:20Z

src/base/bittorrent/session.cpp

@@ -535,7 +538,8 @@ Session::Session(QObject *parent)
    connect(&m_networkManager, &QNetworkConfigurationManager::configurationChanged, this, &Session::networkConfigurationChange);

    m_ioThread = new QThread(this);
-    m_resumeDataSavingManager = new ResumeDataSavingManager {m_resumeFolderPath};
+    const QDir resumeDataDir(m_resumeFolderPath);
+    m_resumeDataSavingManager = new ResumeDataSavingManager{resumeDataDir.absoluteFilePath(QLatin1String {Utils::DB::FILENAME})};


Do you really want to have the database in the BT_backup folder? Why? Not only does it have an inappropriate name (we kept it for compatibility purposes only), but a separate folder is not needed at all now.

A reason I did it is because we have the lock file there, which kinda guarantees that we have permission to read/write in that folder. Which in turn will make exporting the torrents back to the old system more manageable.
I am open to suggestions for a new location. Also for a better name for the DB/table.

As I mentioned earlier (in other words), you should not let any extra logic interfere with the basic logic anywhere. The main path is to work with the new "saving system". That is, if we simply discard all the additional import/export logic, the main logic should not be affected. The same applies to all user-visible behavior of the application.
Your current code assumes that the user had a previous qBittorrent version installed. That's incorrect! You should check for it first and then decide what to do.

I am open to suggestions for a new location.

The same folder where the legacy BT_backup was placed.

Also for a better name for the DB/table.

Just something more specific than "data." E.g. "torrents".

A reason I did it is because we have the lock file there which kinda guarantees that we have permission to read/write in that folder.

"Lock file" logic should be dropped altogether. It's meaningless with the new "saving system". In addition, as it turned out, it is inefficient on *nix systems.

glassez · 2019-01-06T12:29:00Z

src/app/upgrade.cpp

+
+        // Table doesn't exist. Probably 1st run.
+        // Input PRAGMAs and create table
+        query.exec(QLatin1String("PRAGMA auto_vacuum = FULL"));


Are you sure you really want it?

Yes. It shrinks the file when removing torrents. Why shouldn't we use it?

It shrinks the file when removing torrents.

Is that really necessary? The empty pages can be reused further. Additionally the doc says:

However, using auto_vacuum can lead to extra database file fragmentation. And auto_vacuum does not compact partially filled pages of the database

Maybe it's better to run the VACUUM command from time to time, or to let the user run it?

glassez · 2019-01-06T12:52:41Z

src/app/upgrade.cpp

+        libt::bencode(std::back_inserter(preparedResumeData), resumeData);
+    }
+
+    // This should be moved back into Session::initResumeFolder()


Since this upgrade applies entirely to BitTorrent component, I would not extract this code deliberately into a separate file. This breaks the logic of the code structure.

We need to do this check before trying to do any kind of upgrade. The upgrade code assumes that the db exists and is accessible and that the table exists in it.
The upgrade code is called before the Session is constructed.

I'm surprised at this answer. I even thought for a moment that I was debating with another person...

The upgrade code assumes that the db exists and is accessible and that the table exists in it.

This does not contradict the fact that this code should be in a different place.

The upgrade code is called before the Session is constructed.

It's incorrect. At least it's done incorrectly. You ripped main Session code as well.

Main Session logic should be the following:

1. If database doesn't exist create it
2. Load torrents from database and go ahead.

When you add upgrade logic to it:

1. If database doesn't exist create it
2. If older version was used before this run try to import torrents from old "saving system"
3. Load torrents from database and go ahead.

All this logic should be done in Session code!

One thing that should be done before the Session is created is asking the user to accept the upgrade. To do this, we do not need to know anything about the existence of the database. All you need to know is whether an old qBittorrent version was used just before the current application run.

My main idea is "not to break the basic logic of the application for the sake of some additional logic".
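The startup order described in this comment can be sketched as follows. This is only an illustration under stated assumptions: `StartupState`, `Step`, and `planStartup()` are hypothetical names, and how "an older version was used before this run" gets detected is deliberately left out. The point it shows is that the import step is an optional extra, so deleting it leaves the basic create/load sequence untouched.

```cpp
#include <cassert>
#include <vector>

// Hypothetical facts the Session would know at startup.
struct StartupState
{
    bool databaseExists;         // is the SQLite file already present?
    bool olderVersionUsedBefore; // did an older qBittorrent run just before this one?
};

enum class Step { CreateDatabase, ImportLegacyTorrents, LoadTorrents };

// Returns the ordered startup steps for the given state.
std::vector<Step> planStartup(const StartupState &state)
{
    std::vector<Step> steps;
    if (!state.databaseExists)
        steps.push_back(Step::CreateDatabase);
    // Optional extra logic: it depends only on the previously used version,
    // not on whether the database file exists.
    if (state.olderVersionUsedBefore)
        steps.push_back(Step::ImportLegacyTorrents);
    steps.push_back(Step::LoadTorrents);

    // Invariant: loading from the database is always the final step.
    assert(steps.back() == Step::LoadTorrents);
    return steps;
}
```

A user who has never run qBittorrent gets "create, load" and is never asked about an upgrade; a user coming from an older version gets the import step inserted between the two.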

glassez · 2019-01-06T12:59:11Z

src/app/upgrade.cpp

+
+        // Check if our table exists
+        QSqlQuery query(db);
+        if (query.exec(QString("SELECT name FROM sqlite_master WHERE type='table' AND name='%1'").arg(TABLE_NAME))) {


It can be done easier:

if (db.tables().contains(TABLE_NAME)) return;

glassez · 2019-01-06T13:14:14Z

src/app/upgrade.cpp

+    const QString backupFolderPath = Utils::Fs::expandPathAbs(specialFolderLocation(SpecialFolder::Data) + "BT_backup");
+    const QDir backupFolderDir(backupFolderPath);
+    const QString dbPath = backupFolderDir.absoluteFilePath(QLatin1String{FILENAME});
+    // If db exists don't try to upgrade


Invalid logic!
A user who has never used older qBittorrent versions should not care about the old saving system.

We need to implement more advanced version control. For example, store the current version somewhere in the application data. Then newer versions will be able to know more precisely what needs to be updated, and older versions will be able to warn the user about possible conflicts as a result of downgrade. In addition, it will avoid conflicts as a result of multiple upgrades/downgrades, when it is difficult to estimate based on some artifacts and we will know exactly which version was installed before (e.g. when both fastresumes and database exist).

I think you misunderstood it here.
At the moment this tries to upgrade from the non-db way to the db way. So if a db exists we don't try to migrate older files (.torrents+.fastresumes). It is assumed that the upgrade was done already and the user dropped new files at a later date.
Unless you suggest that we should always input into the db any kind of .torrent/.fastresume we find during startup.

Also in case you missed it: Each row has a version column. During startup if the version number is different from the supported one then that row isn't loaded. New versions will upgrade smaller version, while refusing to load bigger version.

Also in case you missed it: Each row has a version column. During startup if the version number is different from the supported one then that row isn't loaded. New versions will upgrade smaller version, while refusing to load bigger version.

Now, I realize that this approach has a drawback. If newer versions require another table name, or a different number of columns, older versions might not be able to extract the version number.

Should we instead create a 2nd table named "version" with a single col+row that will hold the version of the whole db?

I think you misunderstood it here.

Is it? Here's your code:

    // If db exists don't try to upgrade
    if (QFile::exists(dbPath)) {
        initiliazeDB(dbPath);
        return true;
    }
    else {
        if (ask && !userAcceptsUpgrade())
            return false;
    }

You ask the user for upgrade if the database doesn't exist. But it is incorrect if the database does not exist due to the fact that the user has never used qBittorrent (on this system).

I've suggested correct logic in another comment.

Now, I realize that this approach has a drawback.

Yes, it is.

Should we instead create a 2nd table named "version" with a single col+row that will hold the version of the whole db?

I think the "overall application versioning" I suggested in another comment should suit us. There is no need to have separate versions for each thing.
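A minimal sketch of the version comparison both proposals rely on (upgrade data written by an older version, refuse or warn on data written by a newer one). It assumes the stored version is a positive integer, which the thread does not specify; `VersionAction` and `decideVersionAction()` are hypothetical names, and where the stored version lives (a per-row column, a separate table, or the application data) is left open, as it is in the discussion.

```cpp
#include <cassert>

// What to do with stored data, given the version that wrote it.
enum class VersionAction { LoadAsIs, Upgrade, WarnAboutDowngrade };

// storedVersion:  version recorded by the application that last wrote the data.
// runningVersion: version supported by the application that is starting now.
// Both are assumed to be positive integers for the purpose of this sketch.
VersionAction decideVersionAction(const int storedVersion, const int runningVersion)
{
    assert(storedVersion > 0 && runningVersion > 0);

    if (storedVersion == runningVersion)
        return VersionAction::LoadAsIs;
    if (storedVersion < runningVersion)
        return VersionAction::Upgrade;          // data written by an older version
    return VersionAction::WarnAboutDowngrade;   // data written by a newer version
}
```

Keeping this decision in one place avoids guessing the previous version from artifacts, such as fastresume files and a database both being present.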

glassez · 2019-01-06T13:22:55Z

src/app/upgrade.cpp

+    initiliazeDB(dbPath);
+
+    QSqlDatabase db = QSqlDatabase::database();
+    db.transaction();


You have to handle errors. E.g:

    if (!db.transaction())
        throw RuntimeError(db.lastError().text()); // or return error status

    try {
        // ...
        bool ok = query.prepare("...");
        if (!ok)
            throw RuntimeError(query.lastError().text());
        // and so on...
        if (!db.commit())
            throw RuntimeError(db.lastError().text());
    }
    catch (const RuntimeError &) {
        // Undo the partial work, then rethrow the original exception
        db.rollback();
        throw; // or return error status
    }

Sure. Truth is I was in a hurry to make everything work and forgot to come back to this later.

sledgehammer999 · 2019-01-06T20:39:12Z

Pushed new version of commits with these changes:

db.h moved under base/bittorrent
Namespace renamed from Utils::DB to DB
Fixed issue where deleted torrents weren't removed from the DB due to missing variable escape

As for the other issues I'll comment inline.

sledgehammer999 · 2019-01-07T03:21:56Z

@glassez
From a look at the code here is one way I thought to decouple things. (well, it isn't groundbreaking).
Our custom qBt- fields are all inserted in TorrentHandle::handleSaveResumeDataAlert()
Most of these values should be written in the database as soon as the new torrent is created+added to the DB. This currently happens in 1st commit here. In that INSERT INTO statement I intend to use the proper values from the params variable (of type CreateTorrentParams) which is used to initialize the TorrentHandle and ultimately the qBt- fields.
Each qBt- field becomes a DB column.
Then inside the TorrentHandle class, each time a field is updated we also update the DB record.
For example, inside TorrentHandle::setSeedingTimeLimit() we update the DB record for m_seedingTimeLimit (qBt-seedingTimeLimit)

Finally, loadTorrentResumeData() will be changed to load stuff from the DB instead.

Do you think another way is more appropriate?

sledgehammer999 · 2019-01-08T01:36:57Z

@glassez I pushed a few new commits. I didn't merge them back into the others in hope it is easier to review the changes.

I need your opinion. The last commit is unfinished and still WIP. I didn't have more time to tidy it up.
Can you take a look at it and consider the following:
I assume you agree to save each old fastresume field as a table column, right? As you can see in the last commit, I save each setting as soon as it is changed. Here's where I want your input. Should the saving be done by the TorrentHandle class (like it's done in my commit), or done by the Session class?
For example, when the torrent changes name, should the saving be done inside TorrentHandle::setName() or inside Session::handleTorrentNameChanged()?

I assume your answer will be that the Session should do that. It's more logical.
However, there are some settings which aren't publicly exposed by the TorrentHandle class. Like the m_name variable (TorrentHandle::name() doesn't return it verbatim). Is it ok if I extend the various Session::handleTorrent* methods to receive that argument directly from the caller?

In any case, as this last commit is very WIP, I have nothing solid. I merely was exploring the possibilities. So if you want to propose a better approach let me hear it (with a pseudo-example if possible).

WolfganP · 2019-01-08T11:18:43Z

As I commented in #10115, I'm optimizing my RasPi setup to avoid using the SDcard or other slow storage units.

I tried to follow this PR's changes to check where the new database file will be stored, its need for persistence across boots, and its interaction with the portable settings, but I was unable to.

@sledgehammer999 would you mind commenting on it?

glassez · 2019-01-08T12:06:34Z

I assume you agree to save each old fastresume field as a table column, right?

Right.
But I propose to store ALL fastresume data in separate columns.

Should the saving be done by the TorrentHandle class (like it's done in my commit), or done by the Session class?

Although some logic is currently delegated to TorrentHandle class, I would prefer to have it centrally in the Session class and leave TorrentHandle only as a convenient interface.

Is it ok if I extend the various Session::handleTorrent* methods to receive that argument directly from the caller?

I don't mind doing it that way to begin with. There may be a better way later.

So if you want to propose a better approach let me hear it (with a pseudo-example if possible).

Ok.

Ryrynz · 2020-01-14T06:33:04Z

Would recommend pushing this to a 4.5 or even a 5.0 Milestone.

Pentaphon · 2020-06-23T21:48:20Z

Would recommend pushing this to a 4.5 or even a 5.0 Milestone.

I agree, I'm not sure why this is in the 4.2.6 milestone.

xavier2k6 · 2020-08-24T12:52:37Z

For reference, previous discussion/thoughts that I came across -> #7565 (comment) onwards.

txtsd · 2021-01-06T11:33:20Z

Why is this continuously being pushed forward? What's the holdup?

glassez · 2021-01-06T13:05:49Z

In fact, now there are even more prerequisites to use SQLite (for example, we now often save resume data when changing a particular field, but we still have to generate all the data and overwrite the file).
I would like to implement it in the near future if @sledgehammer999 does not intend to do it himself.

Ryrynz · 2021-03-20T01:08:56Z

Any chance the current open issues in 4.3.4 can be moved to their respective 4.4 or 5.0 milestones?
They really shouldn't be pushed out to every minor version milestone release after release.

glassez · 2021-03-20T04:16:15Z

I am currently working on optimizing the resume data storage subsystem. The implementation of SQLite database storage support is part of my plans. So I am closing this PR in favor of that future implementation.

sledgehammer999 added the Core label Jan 2, 2019

sledgehammer999 added this to the 4.2.0 milestone Jan 2, 2019

sledgehammer999 requested review from glassez and Chocobo1 January 2, 2019 19:53

sledgehammer999 force-pushed the sqlite branch 2 times, most recently from c4e89b8 to 0d565dd Compare January 2, 2019 23:06