ParityDb migration tests are flaky #7609
Comments
Yeah, we may need to comment those out if they impede the development workflow.
Trying to reproduce, but unsuccessful so far. What command did you run? I'm doing `while cargo ltest test_paritydb_migrate_2_to_3; do :; done`
OK, I get it if I just do
running `while true; do cargo test || break; done` in
I'm flummoxed; I don't see how these tests can be related to #7337. The tests and the error seem quite specific to paritydb.
If I understand correctly, this only happens when there are multiple tests running? The error message here means that the database is already open. This could happen if multiple tests use the same temp dir. In any case, these upgrade tests are probably obsolete. I'd check with the polkadot team whether it even makes sense to still maintain the upgrade code.
I've confirmed locally that the issue was introduced by #7337. Weird.
Indeed, the test can be removed now.
It is weird. Since the branch is still available, it would be possible to go through the entire history and check which commit caused the regression, but if the tests can be removed, I'll create a PR.
Yeah, I'll go bisect the branch, partly out of morbid curiosity and partly because I'm afraid there is a deeper issue that could reappear later.
The test failures were introduced in dc5d6a0, where we added some other tests in the same crate. The new tests could not run at the same time as each other, so I had added some code like this:

```rust
// Ensure tests run one at a time so they don't interfere with each other.
static OVERRIDES_LOCK: Mutex<()> = Mutex::new(());

let _lock = OVERRIDES_LOCK.lock()?;
```

In the next commit this was replaced with the

My guess is that this affected the order the tests run in, in such a way that the paritydb tests step on each other's toes.

Question: is this expected behavior in paritydb, or a bug that could manifest in production? I guess in production there is only ever one instance of paritydb? Maybe we should at least catch this error and throw a more descriptive one for the benefit of future devs? @ordian
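For context, a minimal self-contained sketch of that test-serialization pattern; the test name and body here are illustrative, not the actual polkadot test code, and recovering the guard on poison is just one way to keep a panicking test from blocking the rest:

```rust
use std::sync::Mutex;

// Shared lock used to force the tests in this module to run one at a time.
static OVERRIDES_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn test_touching_shared_state() {
    // Hold the lock for the duration of the test; if another test panicked
    // while holding it, recover the guard instead of propagating the poison.
    let _lock = OVERRIDES_LOCK.lock().unwrap_or_else(|e| e.into_inner());

    // ... test body that touches process-global state ...
}
```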
Multiple database instances should not interfere unless they use the same path. Each test must use a unique temp dir. No
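To illustrate the unique-temp-dir point, a minimal sketch assuming the `tempfile` crate; the test name and body are hypothetical, not the actual migration tests:

```rust
#[test]
fn paritydb_migration_smoke_test() {
    // Each test gets its own directory, so concurrently running tests can
    // never point two database instances at the same path.
    let dir = tempfile::tempdir().expect("create unique temp dir");
    let db_path = dir.path().join("db");
    assert!(!db_path.exists());

    // ... open the database at `db_path` and run the migration under test ...

    // When `dir` goes out of scope, the directory (including any lock file
    // inside it) is deleted.
}
```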
@arkpar If I understand correctly, the paritydb tests which have now been removed were simply not written correctly, so there's no actual bug there? BTW the hacks were for tests related to #7337 (totally unrelated to paritydb), and on further inspection they do actually need to be there; using a unique temp dir is not enough.
I don't see anything wrong with the tests. They should have worked. I've looked at the

I've tried reproducing the issue with
Someone sent me this log output from a parachain DB migration:

There is a question in the General Element chat from Peter W. about it. Could this be related?
@ggwpez No idea if there is a connection, but IMO it should be investigated. Reopening.
I don't think so. This is a RocksDB error. If this can be reproduced, please file a separate issue.
Added an explicit DB unlock in paritytech/parity-db#218, which resolves this. Normally, file locks should be removed automatically once the file handle is closed. For some reason, when running tests, the lock may take some time to disappear on its own.
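For readers unfamiliar with how the "database already open" symptom arises, a rough sketch of advisory file locking using the `fs2` crate; parity-db's actual locking code may differ in its details:

```rust
use fs2::FileExt; // assumption: the `fs2` crate for cross-platform advisory file locks
use std::fs::File;

fn main() -> std::io::Result<()> {
    let lock_file = File::create("example.lock")?;

    // Fails with an error if another process currently holds the lock,
    // which is the "database already open" situation described above.
    lock_file.try_lock_exclusive()?;

    // ... work with the database ...

    // Explicitly release the lock instead of relying on the handle being
    // closed, analogous to the fix referenced above.
    lock_file.unlock()?;
    Ok(())
}
```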
PR #7601 has now failed twice on the `chains_db::upgrade::tests::test_paritydb_migrate_2_to_3` test. The issue is also reproducible locally. If the tests are run in a loop, both `test_paritydb_migrate_2_to_3` and `test_paritydb_migrate_1_to_2` fail, but not consistently.

The error that is printed:

These test failures are also observable on master. Bisecting revealed #7337 to be the PR that introduced the regression. cc @mrcnski @s0me0ne-unkn0wn