Compatibility with VACmap #706

Open
jamesc99 opened this issue Aug 14, 2024 · 0 comments


Operating system

  • Operating System: CentOS Stream 8
  • CPE OS Name: cpe:/o:centos:centos:8
  • Kernel: Linux 4.18.0-553.6.1.el8.x86_64
  • Architecture: x86-64

Package name
pbsv version: 2.9.0 (commit v2.9.0-2-gce1559a). I installed and ran pbsv in a conda environment.

Which package / tool is causing the problem? Which version are you using, use tool --version. Have you updated to the latest version conda update package? Have you updated the complete env by running conda update --all? Have you ensured that your channel priorities are set up according to the bioconda recommendations at https://bioconda.github.io/#set-up-channels?

  • Yes, I have tried all of the above.

Conda environment

conda list
# packages in environment at /stornext/snfs4/next-gen/scratch/ryan/tools/miniconda3/envs/pbsv:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
certifi                   2020.6.20          pyhd3eb1b0_3  
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgomp                   14.1.0               h77fa898_0    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libstdcxx-ng              14.1.0               hc0a3c3a_0    conda-forge
libzlib                   1.2.13               h4ab18f5_6    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
openssl                   3.3.1                h4bc722e_2    conda-forge
pbsv                      2.9.0                h9ee0642_0    bioconda
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
python                    2.7.18               h42bf7aa_3  
python_abi                2.7                    1_cp27mu    conda-forge
readline                  8.2                  h5eee18b_0  
setuptools                44.0.0                   py27_0  
sqlite                    3.46.0               h6d4b2fc_0    conda-forge
tk                        8.6.14               h39e8969_0  
wheel                     0.37.1             pyhd3eb1b0_0  
zlib                      1.2.13               h4ab18f5_6    conda-forge

Describe the bug
I was trying to run pbsv on BAM files aligned with VACmap (https://github.com/micahvista/VACmap). When running VACmap, I added the option to generate the RG tag (ID and SM) required by pbsv. However, after mapping I found that VACmap had not written the RG tag, so I added it manually with samtools addreplacerg -@8 -r "ID:${rg_id}" -r "SM:${rg_id}" -o rg_added_${bam_basename} ${bamfile}. The output of samtools view -H is attached.

screenshot_pbsv

I ran pbsv on BAM files aligned with pbmm2, minimap2, and VACmap. Only the VACmap BAMs failed, at the pbsv discover step.
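
The read-group check described above can be scripted rather than read off a screenshot. The following is a minimal sketch, not part of pbsv or samtools: `has_rg_id_and_sm` is a hypothetical helper name, and it only inspects header text passed on stdin, so it makes no claim about what else pbsv may require from the header.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pbsv or samtools): reads SAM header text
# on stdin and succeeds only if at least one @RG line carries both a
# non-empty ID field and a non-empty SM field.
has_rg_id_and_sm() {
    awk -F'\t' '
        $1 == "@RG" {
            has_id = 0; has_sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:./) has_id = 1
                if ($i ~ /^SM:./) has_sm = 1
            }
            if (has_id && has_sm) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (assumes samtools is on PATH; the BAM name is a placeholder):
#   samtools view -H rg_added_sample.bam | has_rg_id_and_sm && echo "RG OK"
```

Running the same check on the pbmm2 and minimap2 BAMs gives a quick way to confirm that all three inputs look alike at the read-group level before comparing anything else.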

Error message

  1. pbsv discover fails:
cat archieved_after_adding_rg/pbsv_527127_4294967294.err 
/var/spool/slurm/d/job527127/slurm_script: line 34: 846115 Killed                  pbsv discover --max-skip-split 100 ${input_bam} ${output_name}.svsig.gz
>|> 20240810 05:40:33.937 -|- FATAL -|- Run -|- 0x7fcda849fbc0|| -|- pbsv call ERROR: Input file does not exist: 'hg002_pacbio_vacmap.svsig.gz'
  2. I also encountered an "out of memory" error. Does pbsv require a lot of memory? (As far as I remember, it does not.)
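
For reading the log above: bash prints "Killed" when a child process is terminated by SIGKILL, and on a Slurm node that is often, though not always, the result of exceeding the job's memory limit. The later pbsv call error would then just reflect that discover never wrote its output. A minimal sketch for making this explicit in the job log follows; `describe_exit` is a hypothetical helper, not something pbsv or Slurm provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a shell exit status into a short description.
# By shell convention, a status above 128 means the process was terminated
# by a signal, and the signal number is (status - 128).
describe_exit() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}

# Example: place directly after the discover step in the job script.
#   pbsv discover --max-skip-split 100 "${input_bam}" "${output_name}.svsig.gz"
#   echo "pbsv discover: $(describe_exit $?)" >&2
```

A status of 137 maps to signal 9 (SIGKILL). If job accounting is enabled on the cluster, `sacct -j 527127 --format=JobID,State,MaxRSS,ReqMem` may show whether the job was recorded as out of memory and how its peak usage compares with the 36 GB requested.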

To Reproduce
Steps to reproduce the behavior. Providing a minimal test dataset on which we can reproduce the behavior will generally lead to quicker turnaround time!

  1. Test data as attached
    test_data_hg002_pacbio_vacmap.zip

  2. Script:

#!/bin/bash
#SBATCH --job-name=pbsv
#SBATCH --output=%x_%A_%a.out
#SBATCH --error=%x_%A_%a.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=36gb
#SBATCH --time=72:00:00
#SBATCH --partition=medium
#SBATCH -A proj-fs0002

source /users/u250191/.bashrc
conda activate pbsv

# Set reference and work directory
ref38="/users/u250191/ryan_scratch_ln/reference/human-grch38.fasta"
WORK_DIR=$(pwd)

# Set input parameters
input_bam=$1
DATA_TYPE=$2
output_name=$3

# Ensure all parameters are provided
if [ -z "$input_bam" ] || [ -z "$DATA_TYPE" ] || [ -z "$output_name" ]; then
    echo "Usage: sbatch script.sh <input_bam> <DATA_TYPE> <output_name>"
    exit 1
fi

# Step 1: pbsv discover
pbsv discover --max-skip-split 100 ${input_bam} ${output_name}.svsig.gz

# Step 2: pbsv call based on data type
if [ "$DATA_TYPE" == "pacbio" ]; then
    pbsv call --num-threads 8 --ccs ${ref38} ${output_name}.svsig.gz ${output_name}.vcf
elif [ "$DATA_TYPE" == "ont" ]; then
    pbsv call --num-threads 8 ${ref38} ${output_name}.svsig.gz ${output_name}.vcf
else
    echo "Invalid data type specified. Please use 'pacbio' or 'ont'."
    exit 1
fi
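
As written, the script proceeds to pbsv call even when pbsv discover did not finish, which is why the log ends with a missing-input error instead of stopping at the first failure. A guard between the two steps could look like the sketch below; `require_nonempty_file` is a hypothetical helper name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given path exists and is not
# empty; otherwise print a message to stderr and return non-zero.
require_nonempty_file() {
    local path=$1
    if [ ! -s "$path" ]; then
        echo "Missing or empty file: $path" >&2
        return 1
    fi
}

# Example: place between Step 1 and Step 2 of the script above.
#   require_nonempty_file "${output_name}.svsig.gz" || exit 1
```

With this in place, a killed discover step ends the job with a single clear message, and the pbsv call error no longer obscures the original failure.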

Expected behavior
I would expect pbsv to work with BAM files aligned by VACmap, or for the cause of this failure to be identified and resolved.

Thank you!
Ryan
