BUG:dask.multiprocessing.PermissionError when running vak prep #785

Open
milaXT opened this issue Jan 13, 2025 · 10 comments
Labels
BUG Something isn't working

Comments

@milaXT

milaXT commented Jan 13, 2025

When I run vak prep "D:\Program\2024-auto annotation\gy6or6_train.toml", I get a strange error:

(vak-env) PS C:\Users\PS> vak prep "D:\Program\2024-auto annotation\gy6or6_train.toml"
2025-01-13 18:55:58,412 - vak.prep.frame_classification.frame_classification - INFO - vak version: 1.0.3
2025-01-13 18:55:58,413 - vak.prep.frame_classification.frame_classification - INFO - Will prepare dataset as directory: D:\Program\2024-auto annotation\prep\train\032312-vak-frame-classification-dataset-generated-250113_185558
2025-01-13 18:55:58,853 - vak.prep.spectrogram_dataset.prep - INFO - making array files containing spectrograms from audio files in: D:\Program\2024-auto annotation\gy6or6\032312
2025-01-13 18:55:58,860 - vak.prep.spectrogram_dataset.audio_helper - INFO - creating array files with spectrograms
2025-01-13 18:55:58,860 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_0918.584.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,861 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'x'}, in gy6or6_baseline_230312_1028.1136.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,864 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_1038.1203.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,864 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_1054.1329.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,864 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_1315.2194.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,865 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'y'}, in gy6or6_baseline_230312_1441.2765.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,865 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_1444.2779.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,866 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'0'}, in gy6or6_baseline_230312_1446.2791.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,866 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'y'}, in gy6or6_baseline_230312_1819.468.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,870 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'z'}, in gy6or6_baseline_230312_1820.472.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,870 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'x'}, in gy6or6_baseline_230312_1843.510.wav, that are not in labels_mapping. Skipping file.
2025-01-13 18:55:58,872 - vak.prep.spectrogram_dataset.audio_helper - INFO - Found labels, {'x'}, in gy6or6_baseline_230312_1948.666.wav, that are not in labels_mapping. Skipping file.
[########################################] | 100% Completed | 9.80 ss
2025-01-13 18:56:09,869 - vak.prep.spectrogram_dataset.prep - INFO - creating dataset from spectrogram files in: D:\Program\2024-auto annotation\prep\train\032312-vak-frame-classification-dataset-generated-250113_185558
2025-01-13 18:56:09,877 - vak.common.files.spect - INFO - validating set of spectrogram files
[########################################] | 100% Completed | 8.37 ss
2025-01-13 18:56:19,569 - vak.prep.spectrogram_dataset.spect_helper - INFO - creating pandas.DataFrame representing dataset from spectrogram files
[########################################] | 100% Completed | 8.27 ss
2025-01-13 18:56:29,018 - vak.prep.frame_classification.assign_samples_to_splits - INFO - Will split dataset.
2025-01-13 18:56:29,283 - vak.prep.split.split - INFO - Total target duration of splits: 95.0 seconds. Will be drawn from dataset with total duration: 1610.648.
2025-01-13 18:56:29,565 - vak.prep.frame_classification.frame_classification - INFO - Number of classes in labelmap: 12
2025-01-13 18:56:29,566 - vak.prep.frame_classification.make_splits - INFO - Making split for dataset: test
[                                        ] | 0% Completed | 5.84 s ms
Traceback (most recent call last):
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\PS\miniconda3\envs\vak-env\Scripts\vak.exe\__main__.py", line 7, in <module>
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\__main__.py", line 49, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\cli\cli.py", line 54, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\cli\cli.py", line 28, in prep
    prep(toml_path=toml_path)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\cli\prep.py", line 134, in prep
    _, dataset_path = prep_module.prep(
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\prep\prep_.py", line 194, in prep
    dataset_df, dataset_path = prep_frame_classification_dataset(
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\prep\frame_classification\frame_classification.py", line 326, in prep_frame_classification_dataset
    dataset_df: pd.DataFrame = make_splits(
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\prep\frame_classification\make_splits.py", line 413, in make_splits
    samples = list(
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\bag\core.py", line 1493, in __iter__
    return iter(self.compute())
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\multiprocessing.py", line 112, in reraise
    raise exc
dask.multiprocessing.PermissionError: [Errno 13] Another program is using this file, and the process cannot access it: 'D:\\Program\\2024-auto annotation\\prep\\train\\032312-vak-frame-classification-dataset-generated-250113_185558\\gy6or6_baseline_230312_1308.2152.wav.spect.npz'

Traceback
---------
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\local.py", line 229, in execute_task
    result = task(data)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\_task_spec.py", line 745, in __call__
    return self.func(*new_argspec)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\_task_spec.py", line 171, in _execute_subgraph
    res = execute_graph(final, keys=[outkey])
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\_task_spec.py", line 984, in execute_graph
    cache[key] = node(cache)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\_task_spec.py", line 745, in __call__
    return self.func(*new_argspec)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\bag\core.py", line 1880, in reify
    seq = list(seq)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\dask\bag\core.py", line 2068, in __next__
    return self.f(*vals)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\site-packages\vak\prep\frame_classification\make_splits.py", line 342, in _save_dataset_arrays_and_return_index_arrays
    frames_path = shutil.move(source_path, split_subdir)
  File "C:\Users\PS\miniconda3\envs\vak-env\lib\shutil.py", line 826, in move
    os.unlink(src)

When I ran the command again, the file named in the error changed at random, but the error type stayed the same. I have checked repeatedly and am sure that I did not have any of these files open, and the file permissions all allow writing and modification. I therefore suspect the problem lies in the parallel processing, but I am not sure how to modify it.

Thanks for your help.

Operating system: Windows
vak version: 1.0.3
Python version: 3.10
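
The traceback above ends in shutil.move falling back to os.unlink on the source file, which is the call that fails with "Another program is using this file". One generic way to tolerate a file handle that is held only briefly is to retry the move after a short pause. The sketch below illustrates that idea only; move_with_retry and its retries and delay parameters are hypothetical names, not part of vak or dask, and this is not a confirmed fix for the bug reported here.

```python
import shutil
import time
from pathlib import Path


def move_with_retry(src, dst_dir, retries=5, delay=0.2):
    """Move src into dst_dir, retrying when the OS reports the file as in use.

    Hypothetical helper for illustration; not vak's implementation.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    src, dst_dir = Path(src), Path(dst_dir)
    dst = dst_dir / src.name
    for attempt in range(retries):
        try:
            # shutil.move returns the final destination path
            return Path(shutil.move(str(src), str(dst_dir)))
        except PermissionError:
            if attempt == retries - 1:
                raise
            # shutil.move may already have copied the file before it failed
            # to delete the source; remove that copy so the retry starts clean.
            if src.exists() and dst.exists():
                dst.unlink()
            # Back off a little longer on each attempt.
            time.sleep(delay * (attempt + 1))
```

A retry only helps if the other handle is released quickly. If some code keeps the file open for the whole run, the underlying handle has to be closed instead.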

@milaXT milaXT added the BUG Something isn't working label Jan 13, 2025
@NickleDave
Collaborator

Thank you @milaXT for the detailed error report, let me dig into this to see if I can figure out what's going on

Can you please reply with the entire contents of your environment by doing conda env export > environment.yml and either attaching that file (you'll have to put it in a zip, GitHub won't let you attach .yml) or by copying and pasting the text into triple backticks as you did above? Appreciate it

You were working through the tutorial, looks like?
My hunch is that some change in Windows + dask breaks how we prep datasets on that platform; I haven't found any previous related dask issues.
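
For a quick summary alongside the full conda env export, the versions most relevant to this report can also be printed from inside the activated environment. This is a minimal sketch using only the standard library; report_versions is a hypothetical helper name, and the default package list is an assumption about what matters for this bug.

```python
import platform
from importlib import metadata


def report_versions(packages=("vak", "dask", "numpy", "torch")):
    """Collect Python, platform, and package versions for a bug report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep going so one missing package does not hide the others.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```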

@milaXT
Author

milaXT commented Jan 14, 2025

Thank you, Dave. Attached is the information for my conda environment.
I tried following both the installation tutorial:

conda create --name vak-env python=3.10 vak -c pytorch -c conda-forge
conda install vak -c pytorch -c conda-forge

and the "Troubleshooting conda installations on Windows" instructions, because I got a permission error:

conda create -n vak-env python==3.10
conda activate vak-env
pip install torch===2.5.0 torchvision===0.20.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install vak
pip install tweetynet

but both gave me the same permission error.
My operating system is Windows 10 x64.

environment.yml.txt

@NickleDave
Collaborator

Thank you @milaXT I will see if I can reproduce today or tomorrow

@NickleDave
Collaborator

@milaXT can you try one other thing for me?

Are you able to move the data for the tutorial to your C drive and test it there?

If D is a network drive and there's some lag between calls, that might explain what's going on

@milaXT
Author

milaXT commented Jan 15, 2025

Thank you, Dave. I moved my data to the C drive, but I get the same error.
Command: vak prep "C:\Users\PS\ProgramInC\2024-auto annotation\gy6or6_train.toml"
Error:
dask.multiprocessing.PermissionError: [Errno 13] Another program is using this file, and the process cannot access it: 'C:\\Users\\PS\\ProgramInC\\2024-auto annotation\\prep\\train\\032312-vak-frame-classification-dataset-generated-250115_185905\\gy6or6_baseline_230312_1559.161.wav.spect.npz'

I used the "net use" command in the Windows CMD shell and confirmed that my D drive is not a network drive.
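
The same check can be scripted. The sketch below asks the Windows API for the drive type of a path; GetDriveTypeW reports the value 4 (DRIVE_REMOTE) for network drives. is_network_drive is a hypothetical helper name, and on platforms other than Windows it returns None because the check does not apply there.

```python
import ctypes
import os
import sys

DRIVE_REMOTE = 4  # GetDriveTypeW return value for a network drive


def is_network_drive(path):
    """Return True or False on Windows, or None where the check does not apply."""
    if sys.platform != "win32":
        return None
    drive, _ = os.path.splitdrive(os.path.abspath(path))
    if not drive:
        return None
    # GetDriveTypeW expects a root path with a trailing backslash, e.g. "D:\\"
    drive_type = ctypes.windll.kernel32.GetDriveTypeW(drive + "\\")
    return drive_type == DRIVE_REMOTE


if __name__ == "__main__":
    print(is_network_drive("."))
```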

@NickleDave
Collaborator

Thanks so much @milaXT for testing. At least we can rule that out.

I will try to reproduce today but a couple things came up, will get to it by end of day Friday at the latest

@NickleDave
Collaborator

Hi again @milaXT just confirming that, yes, I was able to reproduce this locally using your environment.yml

Thanks so much for catching this bug and providing a detailed error report.

Not clear to me what the source of the error is yet -- possibly a Windows-specific thing, or due to a change in Dask?
Will update as soon as I know more

@NickleDave
Collaborator

@all-contributors please add @milaXT for bug

Contributor

@NickleDave

I've put up a pull request to add @milaXT! 🎉

@NickleDave
Collaborator

@milaXT if I use WSL2 on the same Windows 10 machine where I reproduced your error, I am able to run vak prep without an error

Are you able to use WSL2 on your machine?

We have had issues with dask on Windows in the past (see e.g. #293) that were fixed by switching to WSL.
And IME torch tends to run better on Unix (including WSL), although the experience on Windows might have improved in recent versions; it's been a minute since I used torch extensively on a Windows machine.

I can try to get to the root of this sooner if you really need to work on Windows (but not WSL) -- but using WSL might be easier in the short term if you can.

Just let me know what else I can do to help. Happy to jump on a quick Zoom meeting to help troubleshoot if you need it
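
If you do switch to WSL2, any Windows paths in the config file need their WSL equivalents. Assuming the default WSL automount root of /mnt (an assumption; it can be changed in the WSL configuration), a drive such as D: appears as /mnt/d. The sketch below converts a path on that basis; windows_to_wsl_path is a hypothetical helper name, and the wslpath command inside WSL can be used for the same purpose.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(win_path, mount_root="/mnt"):
    """Translate a drive-letter Windows path to its default WSL mount path."""
    path = PureWindowsPath(win_path)
    drive = path.drive
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got: {win_path!r}")
    # parts[0] is the drive anchor (e.g. "D:\\"); the rest are path components
    return str(PurePosixPath(mount_root, drive[0].lower(), *path.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl_path(r"D:\Program\2024-auto annotation\gy6or6_train.toml"))
```

Paths that contain spaces, like the one in this report, still need quoting on the command line inside WSL.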
