Hello, I am attempting to run Megalodon for 5hmC calling on hundreds of cancer nanopore samples, and so far I have gotten a couple of runs to work. However, each whole-genome run has taken about a week to complete. I am wondering which parameters would be ideal for optimizing Megalodon's efficiency. Here are the current run parameters I am using:

#SBATCH --mem-per-cpu=64gb
#SBATCH --gres=gpu:2

megalodon <path_to_fast5s_folder> \
    --guppy-server-path <path_to_guppy_6.38_server> \
    --guppy-config dna_r9.4.1_450bps_sup_prom.cfg \
    --reference <path_to_reference> \
    --remora-modified-bases dna_r9.4.1_e8 sup 0.0.0 5hmc_5mc CG 0 \
    --device 0 1 \
    --outputs per_read_mods basecalls \
    --chunk-size 500 \
    --max-concurrent-chunks 100

Currently, I have tried using 1-10 fast5 files per run, separating each set of fast5 files into its own folder to cover the whole genome. I am doing these runs on a local cluster of GPUs with fairly limited resources. Please let me know if you think there are any flaws with this approach and what I could alter to improve efficiency. I know there are the fast Remora models, but ideally I don't want to compromise any accuracy. Let me know what your thoughts are, thanks!
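Since each fast5 folder is an independent input, one common pattern for this kind of setup (a hedged sketch, not something from the Megalodon docs) is to submit the per-folder runs as a SLURM job array so they execute in parallel as GPUs become available. The folder naming scheme, array size, and output directory names below are assumptions you would need to adapt to your cluster:

#!/bin/bash
# Hypothetical SLURM array: one Megalodon run per fast5 folder,
# one GPU per task. Assumes folders named fast5_batch_000 ... fast5_batch_099.
#SBATCH --job-name=megalodon-array
#SBATCH --array=0-99
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=64gb

# Pick this task's input folder from the array index.
BATCH_DIR=$(printf "fast5_batch_%03d" "${SLURM_ARRAY_TASK_ID}")

# --device 0 refers to the single GPU SLURM allocates to this task
# (SLURM restricts visible devices per task via CUDA_VISIBLE_DEVICES).
megalodon "${BATCH_DIR}" \
    --guppy-server-path <path_to_guppy_6.38_server> \
    --guppy-config dna_r9.4.1_450bps_sup_prom.cfg \
    --reference <path_to_reference> \
    --remora-modified-bases dna_r9.4.1_e8 sup 0.0.0 5hmc_5mc CG 0 \
    --device 0 \
    --outputs per_read_mods basecalls \
    --output-directory "megalodon_out_${SLURM_ARRAY_TASK_ID}" \
    --chunk-size 500 \
    --max-concurrent-chunks 100

With many small independent jobs, the scheduler can backfill them onto idle GPUs, which usually beats one long serial run on a fixed pair of devices.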
I would recommend using Guppy or Dorado for modified base calling going forward. Megalodon is no longer being supported, and you are likely to get much better performance from the production basecallers, where the Remora models have been integrated and optimized. If something is missing from the production basecallers' outputs in terms of modified base support, please raise those issues there.
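For reference, a minimal sketch of the equivalent Dorado workflow (my assumptions, not an official recipe: a recent Dorado with pod5 input, and the r9.4.1 sup model with a 5mCG_5hmCG modified-base model available; check dorado download --list for the exact model names for your chemistry):

# Convert existing fast5s to pod5 (requires the pod5 package: pip install pod5)
pod5 convert fast5 ./fast5s/ --output reads.pod5

# Fetch the basecalling model into the current directory.
dorado download --model dna_r9.4.1_e8_sup@v3.3

# Sup-accuracy basecalling with 5mC/5hmC calls in CpG context, aligned to the
# reference; output is a modBAM with MM/ML tags on stdout.
dorado basecaller dna_r9.4.1_e8_sup@v3.3 reads.pod5 \
    --modified-bases 5mCG_5hmCG \
    --reference <path_to_reference> > calls.bam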