[BUG] Iterative Mode Hangs #198

Open
koh-joshua opened this issue Oct 30, 2023 · 3 comments
Labels
bug Something isn't working


koh-joshua commented Oct 30, 2023

The Issue: Iterative Mode Hangs
RuntimeWarning: Every gene contains at least one zero, cannot compute log geometric means. Switching to iterative mode.
But iterative mode then gets stuck and never completes:
Fitting dispersions...
done in 1.39 seconds.
Fitting MAP dispersions...
done in 1.40 seconds.
- gets stuck here -
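
For anyone wanting to confirm that their data matches the condition in the warning, here is a minimal sketch. It assumes the counts are a pandas DataFrame with samples as rows and genes as columns; the toy values and the variable names (`zeros_per_gene`, `every_gene_has_zero`, `log_geo_means`) are made up for illustration and are not taken from PyDESeq2's code. It counts zeros per gene and shows why a single zero pushes a gene's log geometric mean to -inf.

```python
import numpy as np
import pandas as pd

# Toy counts (made-up values): rows are samples, columns are genes.
counts = pd.DataFrame(
    {"gene_a": [0, 5, 3], "gene_b": [2, 0, 7], "gene_c": [4, 1, 0]},
    index=["s1", "s2", "s3"],
)

# Number of zero counts per gene.
zeros_per_gene = (counts == 0).sum(axis=0)

# True when no gene is zero-free, which is the situation the warning describes.
every_gene_has_zero = bool((zeros_per_gene > 0).all())

# Per-gene mean of log counts (the log geometric mean). log(0) is -inf,
# so one zero is enough to make the whole gene's value -inf.
with np.errstate(divide="ignore"):
    log_geo_means = np.log(counts.to_numpy(dtype=float)).mean(axis=0)

print(zeros_per_gene.to_dict())
print(every_gene_has_zero)
print(np.isneginf(log_geo_means).all())
```

If `every_gene_has_zero` is False on the real data, at least one gene is zero-free and the warning would not be expected from this condition alone.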

To Reproduce
Python 3.10, clean install with pip install pydeseq2. Using PyCharm IDE.

Create DDS

from pydeseq2.dds import DeseqDataSet

dds = DeseqDataSet(
    counts=x_train_count,
    metadata=metadata,
    design_factors='status',
    refit_cooks=True,
)

Run deseq2

dds.deseq2()

The RuntimeWarning about switching to iterative mode shows up, but the run gets stuck after fitting MAP dispersions.

Expected behavior
Iterative mode completes without getting stuck.

Desktop (please complete the following information):

  • OS: Windows 11 Pro.

Additional context
Tried executing in Jupyter notebook but same issue comes up. I suspect there's a broken piece of code somewhere.
I replaced all NaN with 1 (x_train_count.fillna(1)) and was then able to run dds.deseq2(). Most genes returned NaN after DeseqStats(dds), but the results were still usable. However, I am not sure how replacing zeros/NaN with 1s affects the analysis. Preferably, either iterative mode or standard mode should be able to run with NaN or zeros.
Update: I tried running it in VS Code, both in the terminal and in the interactive window, but hit the same issue: it gets stuck.
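
One detail that may be worth checking with this workaround: pandas `DataFrame.fillna` returns a new DataFrame by default and only touches NaN entries, so zeros are left alone and the result has to be assigned to take effect. A minimal sketch on a toy stand-in for `x_train_count` (the values and the names `n_nan`, `n_zero`, and `filled` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for x_train_count (made-up values, samples x genes).
x_train_count = pd.DataFrame(
    {"gene_a": [0.0, 5.0, np.nan], "gene_b": [2.0, 0.0, 7.0]}
)

# NaN and zero are different things: count them separately.
n_nan = int(x_train_count.isna().sum().sum())
n_zero = int((x_train_count == 0).sum().sum())
print(f"NaN entries: {n_nan}, zero entries: {n_zero}")

# fillna returns a new DataFrame; without assignment the original is unchanged.
x_train_count.fillna(1)
assert x_train_count.isna().sum().sum() == n_nan

# Assign the result to keep it. Zeros are left exactly as they were.
filled = x_train_count.fillna(1)
assert filled.isna().sum().sum() == 0
assert (filled == 0).sum().sum() == n_zero
```

Counting NaN and zeros separately before running the pipeline makes it clearer which of the two is actually present in the counts.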

@koh-joshua koh-joshua added the bug Something isn't working label Oct 30, 2023

tboen1 commented Mar 19, 2024

Hi, have there been any updates on this? I'm also running into the same error, and would like to know whether replacing 0s with 1s is an appropriate solution, or whether there are any new recommendations.


wdg118 commented Jun 12, 2024

@tboen1 Have you managed to fix this? I have the same issue:

Fitting size factors...
/home/jupyter/.local/lib/python3.11/site-packages/pydeseq2/dds.py:441: UserWarning: Every gene contains at least one zero, cannot compute log geometric means. Switching to iterative mode.
  self.fit_size_factors()
Fitting dispersions...
/home/jupyter/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py:752: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(
... done in 186.84 seconds.

Fitting MAP dispersions...
... done in 193.46 seconds.


StevenSong commented Jun 28, 2024

I'm also running into this, though it isn't really hanging per se: CPU and RAM usage on the process keep fluctuating, so it is clearly still working. It just takes an intractable amount of time to complete even one iteration. I'm not sure how the algorithm scales with input size, but my hunch is that my input is too large: my counts df is 27,403 x 16,372. I don't have a code fix, as I'm not familiar enough with the various parts of the DESeq2 algorithm at the moment. Maybe the maintainers will have more insight into the time-consuming steps of the iterative method (@BorisMuzellec)?

As for the suggestions to get rid of the 0s, I've thought of a few methods to test:

  • replacing 0s with 1s
  • adding 1 to all counts
  • finding the gene with the fewest 0s and replacing only that gene's 0s with 1s
  • excluding the samples with 0 count for the gene with the fewest 0s

Of course, all of these will change the result values, but they differ in the magnitude of the change when tested on a small toy dataset. Still, which one is least incorrect may depend on the experiment or use case. In my context, I think excluding samples is probably the most valid, as I'm lucky not to need to exclude too many. In my data, the gene with the fewest 0s has 15 of them. With ~27,000 samples and groups that are each >> 15 samples, excluding these 15 doesn't seem too bad to me for the sake of computational tractability.
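
The four options listed above could be sketched in pandas roughly as follows. This is only an illustration on a made-up toy DataFrame (samples as rows, genes as columns); the function names are hypothetical helpers, not part of PyDESeq2, and ties for "fewest zeros" are resolved by taking the first such gene.

```python
import pandas as pd

def replace_zeros_with_ones(counts: pd.DataFrame) -> pd.DataFrame:
    """Option 1: replace every 0 with 1."""
    return counts.mask(counts == 0, 1)

def add_one_to_all(counts: pd.DataFrame) -> pd.DataFrame:
    """Option 2: add a pseudocount of 1 to every entry."""
    return counts + 1

def fix_gene_with_fewest_zeros(counts: pd.DataFrame) -> pd.DataFrame:
    """Option 3: replace 0s with 1s only in the gene that has the fewest 0s."""
    gene = (counts == 0).sum(axis=0).idxmin()
    out = counts.copy()
    out[gene] = out[gene].mask(out[gene] == 0, 1)
    return out

def drop_samples_zero_in_best_gene(counts: pd.DataFrame) -> pd.DataFrame:
    """Option 4: drop the samples where the gene with the fewest 0s is 0."""
    gene = (counts == 0).sum(axis=0).idxmin()
    return counts.loc[counts[gene] != 0]

# Toy counts (made-up values): every gene has at least one zero.
counts = pd.DataFrame(
    {"gene_a": [0, 5, 3, 0], "gene_b": [2, 0, 7, 1], "gene_c": [4, 1, 0, 0]},
    index=["s1", "s2", "s3", "s4"],
)

for fn in (
    replace_zeros_with_ones,
    add_one_to_all,
    fix_gene_with_fewest_zeros,
    drop_samples_zero_in_best_gene,
):
    result = fn(counts)
    zero_free_genes = int(((result == 0).sum(axis=0) == 0).sum())
    print(fn.__name__, result.shape, "zero-free genes:", zero_free_genes)
```

With the last option, the metadata would also need to be restricted to the remaining samples (for example by indexing it with the filtered DataFrame's index) so that counts and metadata stay aligned.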
