
Unable to rename biosamples with lima #716

Open
linley-sherin opened this issue Sep 17, 2024 · 0 comments

Operating system
Linux, running on a cluster

Package name
lima : 2.9.0 (commit v2.9.0)

Conda environment

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
isoseq                    4.0.0                h9ee0642_0    bioconda
libgcc                    14.1.0               h77fa898_1    conda-forge
libgcc-ng                 14.1.0               h69a702a_1    conda-forge
libgomp                   14.1.0               h77fa898_1    conda-forge
libstdcxx                 14.1.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.1.0               h4852527_1    conda-forge
lima                      2.9.0                h9ee0642_1    bioconda
pbmm2                     1.14.99              h9ee0642_0    bioconda
pbpigeon                  1.2.0                h4ac6f70_0    bioconda
pbskera                   1.2.0                hdfd78af_0    bioconda
pbtk                      3.1.1                h9ee0642_0    bioconda

Describe the bug
I want to assign a biosample name to each barcode pair so that the names match my experimental sample IDs. When I use --biosample-csv input.csv as described in the official lima documentation, I get a fatal error. I want to rename at this step because we ran multiple SMRT cells with 24 samples in total, so each Kinnex/MAS-Seq barcode was used twice. If I keep the default 'Biosample' labels, more than one sample ends up with the same ID.

Error message
| FATAL | Could not find requested barcode from XML from BioSample 3FGP with name 3FGP_5p in barcode file!

To Reproduce
I am using the following command, where the input file (segmented_file) is in BAM format, barcode_file is in FASTA format, and output_file is in BAM format.

lima "$segmented_file" \
     "$barcode_file" \
     "$output_file" \
     --isoseq \
     --peek-guess \
     --biosample-csv biosample_ids.csv

The biosample_ids.csv looks like this:

Barcodes,Bio Sample
3FGP_5p--IsoSeqX_3p,3FGP
4MTP_5p--IsoSeqX_3p,4MTP
5MGP_5p--IsoSeqX_3p,5MGP
1FTP_5p--IsoSeqX_3p,1FTP
6FGP_5p--IsoSeqX_3p,6FGP
4MGP_5p--IsoSeqX_3p,4MGP
5FTP_5p--IsoSeqX_3p,5FTP
4FGP_5p--IsoSeqX_3p,4FGP
1MTP_5p--IsoSeqX_3p,1MTP

I am not sure why the ERROR message is referencing an XML file.
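
Since the error says that 3FGP_5p could not be found in the barcode file, one check that might narrow this down is whether every name in the Barcodes column of the CSV actually appears as a record name in the barcode FASTA. Below is a minimal Python sketch of that check. It is not part of lima. The function names are made up for illustration, and it assumes that each Barcodes entry is a pair joined by "--" and that the FASTA record name is the first token after ">".

```python
import csv
import io


def fasta_names(fasta_text):
    """Return the set of record names (first token after '>') in FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def missing_barcodes(csv_text, fasta_text):
    """List (barcode_name, biosample) pairs whose barcode name is absent from the FASTA.

    Assumes the CSV header is exactly 'Barcodes,Bio Sample', as in the file above.
    """
    known = fasta_names(fasta_text)
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each Barcodes entry is assumed to be '<5p name>--<3p name>'.
        for name in row["Barcodes"].split("--"):
            if name not in known:
                missing.append((name, row["Bio Sample"]))
    return missing
```

To use it, read biosample_ids.csv and the barcode FASTA into strings and pass them to missing_barcodes. Any pair it returns names a barcode that the CSV references but the FASTA does not define under that exact name.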
