Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version 2.0.0 significantly slower #152

Closed
psinger opened this issue May 10, 2022 · 13 comments · Fixed by #164
Closed

Version 2.0.0 significantly slower #152

psinger opened this issue May 10, 2022 · 13 comments · Fixed by #164
Labels
Milestone

Comments

@psinger
Copy link

psinger commented May 10, 2022

Took me some time to find the culprit, but after upgrading to sqlitedict 2.0.0 the writing is significantly slower.

I am writing with:

with SqliteDict("tmp.db") as tmp:
    tmp["tmp"] = 1

Is this expected due to some new functionality?

@piskvorky
Copy link
Owner

Definitely not expected. Thanks for reporting.

IIRC we merged some contributions around nested exception handling, which could be the culprit. I can't think of anything else.

@psinger are you able to pinpoint the issue please? Or at least provide a minimal reproducible example.

@psinger
Copy link
Author

psinger commented May 11, 2022

@piskvorky Yes, sure, so you can run for example:

from sqlitedict import SqliteDict
def insert():
    for j in range(100):
        with SqliteDict("tmp.db") as d:
            d["tmp"] = j
            d.commit()
            
%time insert()

Runtime for version 1.7.0:

CPU times: user 113 ms, sys: 32.3 ms, total: 145 ms
Wall time: 115 ms

Runtime for version 2.0.0:

CPU times: user 147 ms, sys: 54.4 ms, total: 202 ms
Wall time: 10.2 s

@piskvorky
Copy link
Owner

piskvorky commented May 11, 2022

Oh wow, that's a huge regression. The CPU times look similar but the wall time exploded. Thanks again for reporting.

Are you able to look into it? The surface of changes between 1.7 and 2.0 is pretty small, so hopefully that's not too hard. We might have to revert some commits and make a bugfix release.

@piskvorky piskvorky added the bug label May 11, 2022
@piskvorky piskvorky added this to the Next release milestone May 11, 2022
@psinger
Copy link
Author

psinger commented May 11, 2022

Sorry, I will not have time to look into it, I migrated back to 1.7.0 for the time being.

@mpenkov
Copy link
Collaborator

mpenkov commented May 11, 2022

OK, reproduced on MacOS Python 3.9:

sergeyich:sqlitedict misha$ cat repro.py
import tempfile
from sqlitedict import SqliteDict
def insert():
    with tempfile.NamedTemporaryFile() as tmp:
        for j in range(100):
            with SqliteDict(tmp.name) as d:
                d["tmp"] = j
                d.commit()
insert()
sergeyich:sqlitedict misha$ time python3.9 repro.py

real    0m10.033s
user    0m0.237s
sys     0m0.246s
$ git co 1.7.0
Note: switching to '1.7.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at ca6758f update author_email to work with PyPI
sergeyich:sqlitedict misha$ time python3.9 repro.py

real    0m0.377s
user    0m0.189s
sys     0m0.179s

I'll run a bisection and see what commit caused this.

@mpenkov
Copy link
Collaborator

mpenkov commented May 11, 2022

I think I've found the commit that caused the problem, now trying to work out why something so trivial is causing such a performance regression:

$ git bisect log
git bisect start
# bad: [fdaf06167eebbd22ecd6932ac54487d4f3db1713] update changelog
git bisect bad fdaf06167eebbd22ecd6932ac54487d4f3db1713
# good: [ca6758fadd55e4ee7f36b26e3a39558954e91bc6] update author_email to work with PyPI
git bisect good ca6758fadd55e4ee7f36b26e3a39558954e91bc6
# good: [d5db9e67ecfd9908c06efa9d7a4a9e3e9145550c] More cleanup for flake8
git bisect good d5db9e67ecfd9908c06efa9d7a4a9e3e9145550c
# good: [e6a81f0558b6aa0e4bf4984466c6373d3aee2b52] README.rst: update performance notes.
git bisect good e6a81f0558b6aa0e4bf4984466c6373d3aee2b52
# bad: [12f023652674cd9e061c806fd1d50e3c1958f68d] minimum supported Py version is 3.7
git bisect bad 12f023652674cd9e061c806fd1d50e3c1958f68d
# bad: [3f2df9bb557e3351bc16843612e9090d7f5ba784] update python-package.yml
git bisect bad 3f2df9bb557e3351bc16843612e9090d7f5ba784
# bad: [bb8087ceb4408c0e10e3591912a7596b9448358d] Update sqlitedict.py
git bisect bad bb8087ceb4408c0e10e3591912a7596b9448358d
# bad: [90c16aeaaa69c831c7f71370d560e880c5d630ae] Rename initialized variable to not collide with superclass'es internal one
git bisect bad 90c16aeaaa69c831c7f71370d560e880c5d630ae
# first bad commit: [90c16aeaaa69c831c7f71370d560e880c5d630ae] Rename initialized variable to not collide with superclass'es internal one

90c16ae

@psinger
Copy link
Author

psinger commented May 11, 2022

That's just variable renaming....sounds unlikely it could cause any issues.

@mpenkov
Copy link
Collaborator

mpenkov commented May 11, 2022

Nah, look at how that variable is being used. There's a sleep loop that depends on it.

@psinger
Copy link
Author

psinger commented May 11, 2022

Right, but before it was called self._initialized with the same functionality.

@mpenkov
Copy link
Collaborator

mpenkov commented May 11, 2022

It's not quite the same functionality, because the superclass defines that variable and changes its value as it sees fit.

$ grep -B 10 _initialized /usr/local/Cellar/[email protected]/3.9.7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py --max-count 1

class Thread:
    """A class that represents a thread of control.

    This class can be safely subclassed in a limited fashion. There are two ways
    to specify the activity: by passing a callable object to the constructor, or
    by overriding the run() method in a subclass.

    """

    _initialized = False

@mpenkov
Copy link
Collaborator

mpenkov commented May 11, 2022

Basically, the problem itself isn't the commit in question. The problem is the way that sqlitedict is attempting to handle a race condition.

The commit merely unmasks the problem. I suspect that because that race condition handling has been effectively disabled prior to the commit, the race condition may not be such a big deal after all.

I'll have to think about it some more, but I think I see what's going on here.

@psinger
Copy link
Author

psinger commented May 11, 2022

Is the sleep necessary? Just removing it could fix the issue. But right, I guess the sleep tries to avoid the race condition.

@mpenkov
Copy link
Collaborator

mpenkov commented Nov 20, 2022

Interestingly, cannot reproduce on Py3.9 Linux:

(testenv) misha@cabron:~/git/sqlitedict$ python --version
Python 3.9.14
(testenv) misha@cabron:~/git/sqlitedict$ time python repro.py 

real    0m0.107s
user    0m0.090s
sys     0m0.016s

mpenkov added a commit that referenced this issue Nov 20, 2022
Use synchronization primitives instead of sleeping.
This eliminates the need for the timeout parameter.
The affected class is effectively internal, so this "should" not affect
regular users.

Also fixed the tests: they didn't clean up after themselves.

Added benchmarks.

Fixes #152
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants