Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update_with_whole_file_closes_file test fails #1363

Open
SchahinRohani opened this issue Sep 25, 2024 · 2 comments
Open

update_with_whole_file_closes_file test fails #1363

SchahinRohani opened this issue Sep 25, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@SchahinRohani
Copy link
Contributor

Observed in https://github.com/TraceMachina/nativelink/actions/runs/11036359062/job/30654593386?pr=1362

test update_with_whole_file_closes_file ... FAILED
test update_file_future_drops_before_rename ... ok
test update_with_whole_file_slow_path_when_low_file_descriptors ... ok
test update_with_whole_file_uses_same_inode ... ok
test valid_results_after_shutdown_test ... ok

failures:

---- update_with_whole_file_closes_file stdout ----
thread 'update_with_whole_file_closes_file' panicked at nativelink-store/tests/filesystem_store_test.rs:1325:9:
assertion failed: `(left == right)`: Expected 1 permit to be available

Diff < left / right > :
<0
>1


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
error: test failed, to rerun pass `-p nativelink-store --test filesystem_store_test`
    update_with_whole_file_closes_file

test result: FAILED. 18 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.17s
@SchahinRohani SchahinRohani added the bug Something isn't working label Sep 25, 2024
@aaronmondal
Copy link
Member

Observed again in https://github.com/TraceMachina/nativelink/actions/runs/11603648351/job/32311020291

running 18 tests
test eviction_drops_file_test ... ok
test atime_updates_on_get_part_test ... ok
  2024-10-31T00:25:23.647982Z ERROR nativelink_store::filesystem_store: Failed to touch file, err: Error { code: NotFound, messages: ["The system cannot find the file specified. (os error 2)", "Failed to touch file in filesystem store \"C:\\\\Users\\\\RUNNER~1\\\\AppData\\\\Local\\\\Temp\\\\/5268803302529067229/content_path/0123456789abcdef000000000000000000010000000000000123456789abcdef-10\""] }
    at nativelink-store\src\filesystem_store.rs:344

  2024-10-31T00:25:23.648176Z  WARN nativelink_store::filesystem_store: Failed to rename file, digest: DigestInfo("0123456789abcdef000000000000000000010000000000000123456789abcdef-10"), from_path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\/5268803302529067229/content_path/0123456789abcdef000000000000000000010000000000000123456789abcdef-10", to_path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\/181162993943[487](https://github.com/TraceMachina/nativelink/actions/runs/11603648333/job/32311019252#step:5:488)12419/temp_path/0123456789abcdef000000000000000000010000000000000300000000000000-10", err: Error { code: NotFound, messages: ["The system cannot find the file specified. (os error 2)"] }
    at nativelink-store\src\filesystem_store.rs:371

test deleted_file_removed_from_store ... ok
test digest_contents_replaced_continues_using_old_data ... ok
test eviction_on_insert_calls_unref_once ... ok
test file_continues_to_stream_on_content_replace_test ... ok
test file_gets_cleans_up_on_cache_eviction ... ok
test get_file_size_uses_block_size ... ok
test get_part_is_zero_digest ... ok
test get_part_timeout_test ... ok
test has_with_results_on_zero_digests ... ok
test oldest_entry_evicted_with_access_times_loaded_from_disk ... ok
  2024-10-31T00:25:23.822376Z ERROR nativelink_store::filesystem_store: Failed to rename file, err: Error { code: NotFound, messages: ["The system cannot find the path specified. (os error 3)", "Failed to rename temp file to final path \"C:\\\\Users\\\\RUNNER~1\\\\AppData\\\\Local\\\\Temp\\\\/5037703301562012528/content_path/0123456789abcdef000000000000000000010000000000000123456789abcdef-10\""] }, from_path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\/3562896487859924719/temp_path/0123456789abcdef000000000000000000010000000000001600000000000000-10", final_path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\/5037703301562012528/content_path/0123456789abcdef000000000000000000010000000000000123456789abcdef-10"
    at nativelink-store\src\filesystem_store.rs:701
    in nativelink_store::filesystem_store::filesystem_store_emplace_file

test rename_on_insert_fails_due_to_filesystem_error_proper_cleanup_happens ... ok
test temp_files_get_deleted_on_replace_test ... ok
  2024-10-31T00:25:23.930110Z ERROR nativelink_store::filesystem_store: Failed to delete file, file_path: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\/7920331084642632296/temp_path/0123456789abcdef000000000000000000010000000000001c00000000000000-10", err: Error { code: Internal, messages: ["background task failed: JoinError::Cancelled(Id(1342))", "Failed to remove file \"C:\\\\Users\\\\RUNNER~1\\\\AppData\\\\Local\\\\Temp\\\\/7920331084642632296/temp_path/0123456789abcdef000000000000000000010000000000001c00000000000000-10\""] }
    at nativelink-store\src\filesystem_store.rs:124
    in nativelink_store::filesystem_store::filesystem_delete_file

test update_file_future_drops_before_rename ... ok
test update_with_whole_file_closes_file ... FAILED
test update_with_whole_file_slow_path_when_low_file_descriptors ... ok
test valid_results_after_shutdown_test ... ok

failures:

---- update_with_whole_file_closes_file stdout ----
thread 'update_with_whole_file_closes_file' panicked at nativelink-store\tests\filesystem_store_test.rs:1322:9:
assertion failed: `(left == right)`: Expected 1 permit to be available

Diff < left / right > :
<0
>1


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    update_with_whole_file_closes_file

@leodziki
Copy link

leodziki commented Nov 17, 2024

Hello, @aaronmondal , @SchahinRohani
https://github.com/TraceMachina/nativelink/actions/runs/11873234076/job/33087849658?pr=1479
Intermittent failures in the tests are caused by a race condition in semaphore management, where a file grant (fs::OPEN_FILE_SEMAPHORE) is not released in time, likely due to asynchronous cleanup delays or resource contention. The tests expect a certain number of grants, but occasionally there is a mismatch, because a deletion or cleanup operation (e.g., releasing a file handle) may not complete before the assertion. I also think that shared resources such as file paths or semaphores between tests may contribute to this issue, especially if the tests are run concurrently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants