Use SQLite to store torrents and fastresumes #10099

Closed
wants to merge 3 commits into from

sledgehammer999
Member

I recently played with SQLite and I was fascinated by it. Then I realized how easy things could become for us if we used it to save stuff.

Pros:

  1. Far easier code for sorting torrents by queue during startup
  2. Possibly much faster startup for users with many torrents because now we don't have to read multitudes of small files from disk
  3. SQLite should have very robust anti-corruption mechanisms, far better than anything we can engineer ourselves

I plugged in the new functionality by keeping the current design as much as possible. I didn't attempt to engineer a better code architecture around the new system.
Most of the code you see is old code moved around.

PS: Is anyone more knowledgeable with SQL? I wonder if we should split the table into 2 for performance reasons. One saving hash+metadata and one saving hash+fastresume+queue.

@sledgehammer999 sledgehammer999 added this to the 4.2.0 milestone Jan 2, 2019
@sledgehammer999
Member Author

Special shoutout to DB Browser for SQLite. It is a great lightweight tool that lets you inspect/modify/administer any SQLite db. Very useful for debugging.

@sledgehammer999 sledgehammer999 force-pushed the sqlite branch 2 times, most recently from c4e89b8 to 0d565dd January 2, 2019 23:06
@glassez
Member

glassez commented Jan 3, 2019

@sledgehammer999, that's fine!
I'm glad my old idea has finally come to life. This will undoubtedly be a step forward, although there is still work to be done, starting from what we should store in the tables and how, and ending with the details of the implementation. I'll start commenting on specific things later.


#pragma once

namespace Utils
Member

There are no database related utilities in this file. It's just database configuration for some particular application component (BitTorrent). So it should be under "base/bittorrent".

@glassez
Member

glassez commented Jan 3, 2019

I plugged in the new functionality by keeping the current design as much as possible.

Too much.
Since we stored some additional data along with the "fastresume" data, we had to request the creation of "fastresume" data each time anything changed (e.g. save path, torrent name, etc.), then we injected these additional fields into the "fastresume" data, encoded it and wrote it to a file.
We don't have to do it that way anymore. We can store them in separate columns and update them independently.
Even more: if we look at the libtorrent "fastresume" data, we can see that most of its fields have the behavior described above. E.g. when a torrent is paused or resumed we receive an appropriate alert, so we can just update the "paused" and "auto_managed" columns, and so on.
By and large, we need to request "fastresume" data generation just to store the current progress.
This may seem somewhat inconvenient due to a libtorrent drawback (it requires "fastresume" data to be bencoded when we pass it to add_torrent_params). But this is slightly improved in libtorrent-1.2 (now we just have to fill in the appropriate add_torrent_params fields).
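
To make the idea of independently updated columns a bit more concrete, here is a minimal sketch in plain C++ (no Qt or SQLite dependency). It only builds the text of a parameterized UPDATE statement for a single column. The table name `torrents`, the key column `hash`, and every column other than "paused" and "auto_managed" are assumptions for illustration, not the schema used in this PR.

```cpp
#include <string>
#include <vector>

// Hypothetical whitelist of columns that may be updated on their own.
// Only "paused" and "auto_managed" come from the discussion; the rest are
// placeholders.
const std::vector<std::string> &updatableColumns()
{
    static const std::vector<std::string> columns {
        "paused", "auto_managed", "save_path", "name"
    };
    return columns;
}

// Builds the text of an UPDATE statement that touches exactly one column.
// The column name is checked against the whitelist because identifiers
// cannot be bound as parameters; the values stay as named placeholders.
// Returns an empty string if the column is not whitelisted.
std::string buildColumnUpdate(const std::string &column)
{
    for (const std::string &known : updatableColumns()) {
        if (known == column)
            return "UPDATE torrents SET " + column + " = :value WHERE hash = :hash";
    }
    return std::string();
}
```

The resulting string could then be handed to QSqlQuery::prepare(), with the two placeholders filled in through bindValue(), whenever the corresponding alert arrives.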

@sledgehammer999
Member Author

I am all for decoupling our data from libtorrent's fastresume.
But this will need investigation on when to save our data and what triggers it. Aka reacting more closely to the triggering event.
This is work for the weekend though.

@glassez
Member

glassez commented Jan 4, 2019

But this will need investigation on when to save our data and what triggers it.

At least all current "fastresume" saving events that are not fired by timer.

@@ -535,7 +538,8 @@ Session::Session(QObject *parent)
     connect(&m_networkManager, &QNetworkConfigurationManager::configurationChanged, this, &Session::networkConfigurationChange);

     m_ioThread = new QThread(this);
-    m_resumeDataSavingManager = new ResumeDataSavingManager {m_resumeFolderPath};
+    const QDir resumeDataDir(m_resumeFolderPath);
+    m_resumeDataSavingManager = new ResumeDataSavingManager{resumeDataDir.absoluteFilePath(QLatin1String {Utils::DB::FILENAME})};
Member

Do you really want to have the database in the BT_backup folder? Why? Not only does it have an inappropriate name (we kept it for compatibility purposes only), but a separate folder is not needed at all now.

Member Author

A reason I did it is because we have the lock file there which kinda guarantees that we have permission to read/write in that folder. Which in turn will make exporting the torrents back to the old system more manageable.
I'm open to suggestions for a new location. Also for a better name for the DB/table.

Member

As I mentioned earlier (in other words), you should not let any extra logic interfere with the basic one anywhere. The main path is to work with the new "saving system". That is, if we simply discard all additional import/export logic, the main one should not be affected. The same applies to all user-visible manifestations of the application.
Your current code assumes that the user had a previous qBittorrent version installed. That's incorrect! You should check for it first and then decide what to do.

I'm open to suggestions for a new location.

The same folder where the legacy BT_backup was placed.

Also for a better name for the DB/table.

Just something more specific than "data." E.g. "torrents".

A reason I did it is because we have the lock file there which kinda guarantees that we have permission to read/write in that folder.

"Lock file" logic should be dropped altogether. It's meaningless with the new "saving system". In addition, as it turned out, it is inefficient on *nix systems.


// Table doesn't exist. Probably 1st run.
// Input PRAGMAs and create table
query.exec(QLatin1String("PRAGMA auto_vacuum = FULL"));
Member

Are you sure you really want it?

Member Author

Yes. It shrinks the file when removing torrents. Why shouldn't we use it?

Member

glassez commented Jan 7, 2019

It shrinks the file when removing torrents.

Is that really necessary? The empty pages can be reused further. Additionally the doc says:

However, using auto_vacuum can lead to extra database file fragmentation. And auto_vacuum does not compact partially filled pages of the database

Maybe it's better to use VACUUM command from time to time or allow user to perform it?

libt::bencode(std::back_inserter(preparedResumeData), resumeData);
}

// This should be moved back into Session::initResumeFolder()
Member

Since this upgrade applies entirely to BitTorrent component, I would not extract this code deliberately into a separate file. This breaks the logic of the code structure.

Member Author

We need to do this check before trying to do any kind of upgrade. The upgrade code assumes that the db exists and is accessible and that the table exists in it.
The upgrade code is called before the Session is constructed.

Member

I'm surprised at this answer. I even thought for a moment that I was debating with another person...

The upgrade code assumes that the db exists and is accessible and that the table exists in it.

This does not contradict the fact that this code should be in a different place.

The upgrade code is called before the Session is constructed.

It's incorrect. At least it's done incorrectly. You ripped apart the main Session code as well.

Main Session logic should be the following:

  1. If database doesn't exist create it
  2. Load torrents from database and go ahead.

When you add upgrade logic to it:

  1. If database doesn't exist create it
  2. If older version was used before this run try to import torrents from old "saving system"
  3. Load torrents from database and go ahead.

All this logic should be done in Session code!

One thing that should be done before the Session is created is asking the user to accept the upgrade. To do this, we do not need to know anything about the existence of the database. All you need to know is whether an old qBittorrent version was used just before the current application run.

My main idea is "not to break the basic logic of the application for the sake of some additional".
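
The three-step startup order listed above can be sketched as a small planning function. This is only an illustration in plain C++; the enum, the function name `planSessionStartup`, and its two boolean inputs are hypothetical and do not exist in the qBittorrent code base.

```cpp
#include <vector>

// Hypothetical names for the three startup steps from the list above.
enum class StartupStep
{
    CreateDatabase,
    ImportLegacyData,
    LoadTorrents
};

// databaseExists: whether the database is already present.
// legacyVersionUsedBefore: whether an older qBittorrent version (with the
// old "saving system") was used just before this run.
// Loading always comes last and never depends on the import step, so
// dropping the import logic leaves the basic flow untouched.
std::vector<StartupStep> planSessionStartup(bool databaseExists, bool legacyVersionUsedBefore)
{
    std::vector<StartupStep> steps;
    if (!databaseExists)
        steps.push_back(StartupStep::CreateDatabase);
    if (legacyVersionUsedBefore)
        steps.push_back(StartupStep::ImportLegacyData);
    steps.push_back(StartupStep::LoadTorrents);
    return steps;
}
```

For a user who has never run qBittorrent before, `planSessionStartup(false, false)` yields only the create and load steps, so no upgrade question would ever need to be asked.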


// Check if our table exists
QSqlQuery query(db);
if (query.exec(QString("SELECT name FROM sqlite_master WHERE type='table' AND name='%1'").arg(TABLE_NAME))) {
Member

It can be done more easily:

if (db.tables().contains(TABLE_NAME))
    return;

const QString backupFolderPath = Utils::Fs::expandPathAbs(specialFolderLocation(SpecialFolder::Data) + "BT_backup");
const QDir backupFolderDir(backupFolderPath);
const QString dbPath = backupFolderDir.absoluteFilePath(QLatin1String{FILENAME});
// If db exists don't try to upgrade
Member

Invalid logic!
A user who has never used older qBittorrent versions should not care about the old saving system.

We need to implement more advanced version control. For example, store the current version somewhere in the application data. Then newer versions will be able to know more precisely what needs to be updated, and older versions will be able to warn the user about possible conflicts as a result of downgrade. In addition, it will avoid conflicts as a result of multiple upgrades/downgrades, when it is difficult to estimate based on some artifacts and we will know exactly which version was installed before (e.g. when both fastresumes and database exist).
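
A rough sketch of such a version check in plain C++ follows. It assumes, purely for illustration, that versions are positive integers and that a value of 0 means no version was recorded; the enum and the function name `checkStoredVersion` are hypothetical.

```cpp
// Hypothetical outcomes of comparing the recorded version with the running one.
enum class VersionAction
{
    FreshInstall,       // no version recorded: nothing to import or upgrade
    NothingToDo,        // same version as the previous run
    RunUpgrade,         // previous run used an older version
    WarnAboutDowngrade  // previous run used a newer version
};

// storedVersion: value recorded in the application data by the previous run,
// or 0 if nothing was recorded (assumption: real versions are positive).
// runningVersion: version of the application that is starting now.
VersionAction checkStoredVersion(int storedVersion, int runningVersion)
{
    if (storedVersion <= 0)
        return VersionAction::FreshInstall;
    if (storedVersion < runningVersion)
        return VersionAction::RunUpgrade;
    if (storedVersion > runningVersion)
        return VersionAction::WarnAboutDowngrade;
    return VersionAction::NothingToDo;
}
```

One limitation of this sketch: an installation created by a release that predates the version record would look like a fresh install, so telling the two apart would need some other signal that is not covered here.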

Member Author

I think you misunderstood it here.
At the current moment this tries to upgrade from the non-db way to the db way. So if a db exists we don't try to migrate older files (.torrents+.fastresumes). It is assumed that the upgrade was done already and the user dropped new files at a later date.
Unless you suggest that we should always input into the db any kind of .torrent/.fastresume we find during startup.

Member Author

Also in case you missed it: Each row has a version column. During startup if the version number is different from the supported one then that row isn't loaded. New versions will upgrade smaller version, while refusing to load bigger version.

Member Author

Also in case you missed it: Each row has a version column. During startup if the version number is different from the supported one then that row isn't loaded. New versions will upgrade smaller version, while refusing to load bigger version.

Now, I realize that this approach has a drawback. If newer versions require another table name, or a different number of columns, older versions might not be able to extract the version number.

Should we instead create a 2nd table named "version" with a single col+row that will hold the version of the whole db?

Member

I think you misunderstood it here.

Is it? Here's your code:

// If db exists don't try to upgrade
if (QFile::exists(dbPath)) {
    initiliazeDB(dbPath);
    return true;
}
else {
    if (ask && !userAcceptsUpgrade()) return false;
}

You ask the user for upgrade if the database doesn't exist. But it is incorrect if the database does not exist due to the fact that the user has never used qBittorrent (on this system).

I've suggested correct logic in another comment.

Now, I realize that this approach has a drawback.

Yes, it is.

Should we instead create a 2nd table named "version" with a single col+row that will hold the version of the whole db?

I think the "overall application versioning" I suggested in another comment should fit us. There is no need to have separate versions for each thing.

initiliazeDB(dbPath);

QSqlDatabase db = QSqlDatabase::database();
db.transaction();
Member

You have to handle errors. E.g:

if (!db.transaction())
    throw RuntimeError(db.lastError().text());  // or return error status
try {
    // ...
    bool ok = query.prepare("...");
    if (!ok)
        throw RuntimeError(query.lastError().text());
    // and so on...
    if (!db.commit())
        throw RuntimeError(db.lastError().text()); 
}
catch (const RuntimeError &err) {
    db.rollback();
    throw; // rethrow the caught exception, or return error status
}

Member Author

Sure. Truth is I was in a hurry to make everything work and forgot to come back to this later.

@sledgehammer999
Member Author

Pushed new version of commits with these changes:

  1. db.h moved under base/bittorrent
  2. Namespace from Utils::DB to DB
  3. Fixed issue where deleted torrents weren't removed from the DB due to missing variable escape

As for the other issues I'll comment inline.

@sledgehammer999
Member Author

sledgehammer999 commented Jan 7, 2019

@glassez
From a look at the code here is one way I thought to decouple things. (well, it isn't groundbreaking).
Our custom qBt- fields are all inserted in TorrentHandle::handleSaveResumeDataAlert()
Most of these values should be written in the database as soon as the new torrent is created+added to the DB. This currently happens in 1st commit here. In that INSERT INTO statement I intend to use the proper values from the params variable (of type CreateTorrentParams) which is used to initialize the TorrentHandle and ultimately the qBt- fields.
Each qBt- field becomes a DB column.
Then inside the TorrentHandle class, each time a field is updated we also update the DB record.
For example, inside TorrentHandle::setSeedingTimeLimit() we update the DB record for m_seedingTimeLimit (qBt-seedingTimeLimit)

Finally, loadTorrentResumeData() will be changed to load stuff from the DB instead.

Do you think another way is more appropriate?

@sledgehammer999
Member Author

@glassez I pushed a few new commits. I didn't merge them back into the others in hope it is easier to review the changes.

I need your opinion. The last commit is unfinished and still WIP. I didn't have more time to tidy it up.
Can you take a look at it and consider the following:
I assume you agree to save each old fastresume field as a table column, right? As you can see in the last commit, I save each setting as soon as it is changed. Here's where I want your input. Should the saving be done by the TorrentHandle class (like it's done in my commit), or done by the Session class?
For example, when the torrent changes name, should the saving be done inside TorrentHandle::setName() or inside Session::handleTorrentNameChanged()?

I assume your answer will be that the Session should do that. It's more logical.
However, there are some settings which aren't publicly exposed by the TorrentHandle class, like the m_name variable (TorrentHandle::name() doesn't return it verbatim). Is it ok if I extend the various Session::handleTorrent* methods to receive that argument directly from the caller?

In any case, as this last commit is very WIP, I have nothing solid. I merely was exploring the possibilities. So if you want to propose a better approach let me hear it (with a pseudo-example if possible).

@WolfganP

WolfganP commented Jan 8, 2019

As I commented in #10115, I'm optimizing my RasPi setup to avoid using the SDcard or other slow storage units.

I tried to follow this PR's changes to check where the new database file will be stored, its need for persistence across boots, and its interaction with the portable settings, but I was unable to.

@sledgehammer999 would you mind commenting on it?

@glassez
Member

glassez commented Jan 8, 2019

I assume you agree to save each old fastresume field as a table column, right?

Right.
But I propose to store ALL fastresume data in separate columns.

Should the saving be done by the TorrentHandle class (like it's done in my commit), or done by the Session class?

Although some logic is currently delegated to TorrentHandle class, I would prefer to have it centrally in the Session class and leave TorrentHandle only as a convenient interface.

Is it ok if I extend the various Session::handleTorrent* methods to receive that argument directly from the caller?

I don't mind doing it that way to begin with. There may be a better way later.

So if you want to propose a better approach let me hear it (with a pseudo-example if possible).

Ok.
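
As a rough illustration of the centralized arrangement discussed above (it is not taken from the PR), the sketch below uses two stand-in classes in plain C++. The handle forwards the raw stored name to the session, and the session is the only place that records what has to be persisted. The `Write` struct, `pendingWrites()`, and the extended signature of `handleTorrentNameChanged()` are assumptions; the real classes look different.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for Session: the single place that decides what gets persisted.
class Session
{
public:
    // One pending change to a single column of a single torrent row.
    struct Write
    {
        std::string hash;
        std::string column;
        std::string value;
    };

    // Hypothetical variant of Session::handleTorrentNameChanged() that
    // receives the raw stored name directly from the caller.
    void handleTorrentNameChanged(const std::string &hash, const std::string &rawName)
    {
        // A real implementation would update the database here;
        // this sketch only records the change.
        m_pendingWrites.push_back({hash, "name", rawName});
    }

    const std::vector<Write> &pendingWrites() const { return m_pendingWrites; }

private:
    std::vector<Write> m_pendingWrites;
};

// Stand-in for TorrentHandle: a convenient interface that never touches storage.
class TorrentHandle
{
public:
    TorrentHandle(Session &session, std::string hash)
        : m_session(session)
        , m_hash(std::move(hash))
    {
    }

    void setName(const std::string &name)
    {
        if (name == m_name)
            return; // nothing changed, so nothing to save
        m_name = name;
        m_session.handleTorrentNameChanged(m_hash, m_name);
    }

private:
    Session &m_session;
    std::string m_hash;
    std::string m_name;
};
```

Calling `setName()` twice with the same value leaves a single pending write, which is the kind of per-field update that replaces regenerating the whole fastresume data.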

@sledgehammer999 sledgehammer999 added this to the 4.2.2 milestone Dec 19, 2019
@Ryrynz

Ryrynz commented Jan 14, 2020

Would recommend pushing this to a 4.5 or even a 5.0 Milestone.

@Pentaphon

Would recommend pushing this to a 4.5 or even a 5.0 Milestone.

I agree, I'm not sure why this is in the 4.2.6 milestone.

@xavier2k6
Member

For reference, previous discussion/thoughts that I came across: #7565 (comment) onwards.

@txtsd

txtsd commented Jan 6, 2021

Why is this continuously being pushed forward? What's the holdup?

@glassez
Member

glassez commented Jan 6, 2021

In fact, now there are even more prerequisites to use SQLite (for example, we now often save resume data when changing a particular field, but we still have to generate all the data and overwrite the file).
I would like to implement it in the near future if @sledgehammer999 does not intend to do it himself.

@sledgehammer999 sledgehammer999 modified the milestones: 4.3.3, 4.3.4 Jan 19, 2021
@Ryrynz

Ryrynz commented Mar 20, 2021

Any chance the current open issues in 4.3.4 can be moved to their respective 4.4 or 5.0 milestones?
They really shouldn't be pushed out to every minor version milestone release after release.

@glassez
Member

glassez commented Mar 20, 2021

I am currently working on optimizing the resume data storage subsystem. The implementation of SQLite database storage support is part of my plans. So I'm closing this PR for the sake of a future implementation.

@glassez glassez closed this Mar 20, 2021