suggestions with big data #16

alexyfyf · 2023-12-29T04:04:24Z

Hi Alex,

I found that your tool generates a lot of intermediate files (as do isONclust and isONcorrect), which quickly uses up my inodes.
Do you have any suggestions for alleviating this with big datasets?
Would increasing (or decreasing) --max_seqs or --max_seqs_to_spoa help?

Thank you so much.
Cheers,


alexyfyf · 2023-12-30T22:52:40Z

Also, I noticed that in your pipeline you set isONclust --k 8 --w 9 rather than the default --k 13 --w 20 for ONT data, which slows down the clustering step a lot. Is there a reason for choosing these values?

aljpetri · 2024-01-09T09:14:41Z

Hi, thank you very much again for reporting your findings.

> Also, I noticed that in your pipeline you set isONclust --k 8 --w 9 rather than the default --k 13 --w 20 for ONT data, which slows down the clustering step a lot. Is there a reason for choosing these values?

I have fixed this in commit 2f40387 and also renamed the run_mode from analysis to ont to make it clearer what the mode is used for. These k and w values were used in our analyses to alleviate any possible impact of isONclust on the final results, but they are not recommended for ONT data sets.

> Do you have any suggestions for alleviating this with big datasets?

If you are referring to the number of clusters (isONclust and isONcorrect), one thing you could try is setting a higher value for iso_abundance when running the pipeline. This requires more reads for a cluster to be formed (in isONclust and isONcorrect) and more reads supporting an isoform before it is called, which should reduce the number of clusters. However, it may mean that some isoforms with very low read support are not called. If this is not what you meant, could you explain a bit more?
Best,
Alex
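
As a rough way to preview the effect of a higher read threshold before rerunning, the sketch below counts how many clusters contain at least a given number of reads. It assumes a tab-separated cluster file with one line per read and the cluster ID in the first column; that layout, and the function names `cluster_sizes` and `clusters_kept`, are assumptions made for illustration. It does not reproduce how the pipeline applies iso_abundance internally.

```python
from collections import Counter


def cluster_sizes(tsv_lines):
    """Count reads per cluster from lines of the form 'cluster_id<TAB>read_id'.

    The two-column layout is an assumption; adjust the split if your
    cluster file is organised differently.
    """
    sizes = Counter()
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster_id = line.split("\t", 1)[0]
        sizes[cluster_id] += 1
    return sizes


def clusters_kept(sizes, min_reads):
    """Return how many clusters have at least min_reads reads."""
    return sum(1 for n in sizes.values() if n >= min_reads)
```

Passing an open file handle for the cluster file to `cluster_sizes` and then calling `clusters_kept` with a few candidate thresholds gives a quick estimate of how many clusters would remain at each value.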

alexyfyf · 2024-02-01T00:40:21Z

Hi, sorry for the late reply. Thanks for your suggestions.
What if I already have a lot of clusters? When I run isONform_parallel.py, are there any parameters that can improve the speed and I/O?
My issue is that when I run isONform_parallel.py, too many temporary files are generated, and they quickly use up my inodes. I would like some suggestions to (1) reduce the number of temporary files generated and (2) increase the speed of isONform_parallel.py.

Cheers,
Alex
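
To narrow down which step produces the most temporary files, a small diagnostic like the following may help. It is a generic sketch using only the Python standard library and does not depend on any isONform option; the function names are made up for illustration, and `os.statvfs` is available on POSIX systems only.

```python
import os
from collections import Counter


def count_entries_per_dir(root):
    """Map each directory under root to the number of entries directly in it.

    Every file and subdirectory occupies one inode, so directories with
    the highest counts are the main inode consumers.
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        counts[dirpath] = len(dirnames) + len(filenames)
    return counts


def inode_usage(path):
    """Return (used, total) inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files
```

Calling `count_entries_per_dir(outdir).most_common(10)` on the pipeline output directory lists the ten directories holding the most entries, and `inode_usage(outdir)` shows how close the filesystem is to its inode limit.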

ksahlin mentioned this issue Jan 2, 2024

Memory Problems with ONT Data ksahlin/isONclust#23

Open
