Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Clinker with Pizzly output #6

Open
skanwal opened this issue Sep 25, 2018 · 69 comments
Open

Using Clinker with Pizzly output #6

skanwal opened this issue Sep 25, 2018 · 69 comments

Comments

@skanwal
Copy link

skanwal commented Sep 25, 2018

Hello,

Thanks for the great tool. We are very much interested in using Clinker to visualise fusions in one of the project I am currently working on @umccr.

I have used pizzly as a fusion caller and realized that clinker expects input in a specific format:
chrom1,base1,chrom2,base2

However the header of pizzly output looks like this:

geneA.name      geneA.id        geneB.name      geneB.id        paircount       splitcount      transcripts.list
VIM     ENSG00000026025 ALB     ENSG00000163631 4       4       ENST00000544301_0:1642_ENST00000509063_0:1911;ENST00000224237_0:1374_ENST00000509063_0:1911;ENST00000487938_0:1374_ENST00000509063_0:1911

Wondering if you ever tried using such format or how would you suggest converting it to something compatible with Clinker?
Any help appreciated, thanks!

@breons
Copy link
Contributor

breons commented Sep 25, 2018

Howdy,

Thanks for your kind words and for giving Clinker a go! Hopefully, I can help you get up and running.

By having the chromosome names, breakpoint coordinate, and the genome assembly (hg19/38 currently) as input, Clinker gets around the various gene naming conventions that exist.

So! How do we get the Pizzly output into what we need? Looks like someone has had a similar problem:

If that helps you out, great, if not, let me know and I can whip something up (would be nice to have as an extension to Clinker).

Cheers,
Breon.

@skanwal
Copy link
Author

skanwal commented Sep 26, 2018

Thanks a lot for the prompt response.

We had come across the grolar script earlier and tried this today to get results (and are almost there).
Now, testing the clinker setup on our cluster (spartan).
I'll update you if that works or I have any further issues.

Highly appreciate your time to help make this run.

@breons
Copy link
Contributor

breons commented Sep 26, 2018

No worries, happy to help!

@skanwal
Copy link
Author

skanwal commented Sep 29, 2018

Hi Breon,

Almost there with the clinker pipeline. Getting a small issue at Stage prepare_plot. Seems to me that access request to BioMart service is for some reason failing.

Attaching a complete log file for you reference.

Thanks for your time and help sort this.

clinker-error.txt

@skanwal
Copy link
Author

skanwal commented Oct 1, 2018

Hi Breon,

Solved this issue. I had to run my job on login node. Sorry for bugging you about this.

The pipeline finished successfully for three input fusions and created nice pdfs :)

For one of the fusions in my input, I get

C4A:ALB
------------------------------------------
filtering BAM file for fusion of interest
[main_samview] region "C4A:ALB" could not be parsed. Continue anyway.
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files

================================== Stage plot_fusion (CCR170012) ===================================
[1] "Plotting: C4A:ALB"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in read.table(locations$junctions) : no lines available in input
Error in read.table(locations$junctions) : no lines available in input
[1] "There are no fusion junctions found in this gene pair."
[1] "Terminating plotting for this fusion."
NULL
ERROR: Expected output file /data/cephfs/punim0010/projects/Kanwal_Clinker/Clinker/MH17T001P013/plots/CCR170012_MH17T001P013_S39/C4A_ALB.pdf could not be found

Any idea what could be the likely cause? As I have this fusion in the results and it is highly expressed as well?

@breons
Copy link
Contributor

breons commented Oct 1, 2018

Hey! Glad to see you solved the other issue. Sorry, have not been around this lovely long weekend.

Could you please post the junctions.txt file that was generated for this fusion pair?

@skanwal
Copy link
Author

skanwal commented Oct 1, 2018

I hope you had a nice weekend :)

Looking in the alignment folder, the junction file for this fusion is empty?

@breons
Copy link
Contributor

breons commented Oct 1, 2018

Hmmm, OK. How many reads are in splice_junction_reads.bam and fusion_breakpoint_reads.bam?

A quick way to check this would be to load it into IGV and enable the sashimi plot. If that shows reads spanning/splitting across the gene boundaries, then we know that something went wrong with the pipeline. If nothing shows, then there might be a problem with the alignment.

Instructions here:
https://github.com/Oshlack/Clinker/wiki/4.-Running-the-pipeline

@breons
Copy link
Contributor

breons commented Oct 1, 2018

Weekend was great! Hope yours was also :).

@skanwal
Copy link
Author

skanwal commented Oct 11, 2018

Hi Breon,

So, I updated grolar.R script to work with fusion calls in tsv format and find genomic coordinates required for visualisation of fusion genes in clinker https://github.com/skanwal/Play/blob/master/fusions/grolar-tsv.R

I wanted to ask does clinker performs any filtering on the transcripts used for fusion genes or uses all ensemble transcripts?
Can we tune it to use only transcripts with supporting evidence such as TSL1 only?

Thanks a lot for your time.

Regards,
Sehrish

@breons
Copy link
Contributor

breons commented Oct 11, 2018

Hi Sehrish,

Good to hear you got a suitable input with Grolar.

No filtering at present, you're getting everything! I'm sure I can write in a parameter for that, let me have a play over the weekend.

Cheers,
Breon.

@skanwal
Copy link
Author

skanwal commented Oct 11, 2018

Thanks, Breon.

Let me know if you need any input from my side.
Happy to help always.

Regards,
Sehrish

@breons
Copy link
Contributor

breons commented Oct 18, 2018

Hi Sehrish,

I haven't forgotten you, just having one of those weeks...

Thanks!
Breon.

@breons
Copy link
Contributor

breons commented Oct 24, 2018

Hi Sehrish,

I've created a new parameter for filtering by TSL, but I want to do some more testing before I push it up. It shouldn't take much longer.

I'm conscious that you have have been waiting for a bit, so to get you on your way if you're pressed for time, I've added a new file in the resources folder called hg38_genCode24_st-sorted-exons_tsl.gtf.

This is essentially the same file as hg38_genCode24_st-sorted-exons.gtf but with the added TSL field married up from GencodeV24. If you wanted to do some manual filtering, all you would need to do is to run an AWK/Python script to remove the lines with levels you don't want, rename the file to hg38_genCode24_st-sorted-exons.gtf and run Clinker as per usual. It will give you the desired effect.

If you need a script to do that, I have something available, but up to you!

Alternatively if you're not pressed for time, I will let you know when I've uploaded the changes :).

Cheers,
Breon.

@skanwal
Copy link
Author

skanwal commented Oct 25, 2018

Hi Breon,

I highly appreciate your efforts to get this working and thanks for keeping me in the loop as well.

I can definitely wait for the feature update from you. I am happy to give it a test drive then and update you on the results :)

Regards,
Sehrish

@messersc
Copy link

messersc commented Nov 8, 2018

Hi @skanwal,

I am just researching this exact question, would you be able to make the code you use for this available somewhere? I would really appreciate it!

Thanks,
Clemens

@breons
Copy link
Contributor

breons commented Nov 8, 2018

@skanwal thanks for being so patient!

https://github.com/Oshlack/Clinker/tree/clinker-1.4

This is an update to Clinker that I've added a tsl parameter to AND have removed the Biomart dependency for. I've done some testing and so far so good... but I thought I would get some real user testing as you suggested. Take it for a drive and let me know how it handles :).

All you need to do is clone that and run it with a parameter called tsl where you can specify the levels and above to filter for (like the below). For example, if you want transcripts with a tsl of only 1, just set tsl=1. If you would like all TSL under 5, set tsl=4.

bpipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=true -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p fusions=BCR:ABL1 -p tsl=2 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

@breons
Copy link
Contributor

breons commented Nov 8, 2018

@messersc, I think @skanwal posted this earlier:
https://github.com/skanwal/Play/blob/master/fusions/grolar-tsv.R

Thanks for trying Clinker! Let me know if I can help in anyway.

@skanwal
Copy link
Author

skanwal commented Nov 9, 2018

Hi Breon,

Thanks a lot. I really appreciate the work you have done. I wanted to suggest if you could remove biomart dependency because this was stopping us from running it completely on the cluster and I had to run the plot stage manually.

I have updated the code and started the analysis using one of our samples.
I'll update you once it finishes/completes till the end.

Regards,
Sehrish

@messersc
Copy link

messersc commented Nov 9, 2018

Hi Sehrish and Breon,

I have a question regarding the grolar output and how to feed it into Clinker - in the examples, if one feeds fusion calls into Clinker, it's with coordinates of the breakpoints. Grolar on the other hand only gives the coordinates of the genes involved, but not the breakpoints. [It's possible to get the breakpoints from the json, but not easily as it seems it only gives coordinates on the transcript, not the genome.]

Now, it looks like the breakpoints are not strictly needed as it's possible to run Clinker without, but I was wondering about this. Does the breakpoint information have any influence on how the superTranscriptome is generated?

Hope I am making sense,
Clemens

@breons
Copy link
Contributor

breons commented Nov 9, 2018

Hi Clemens, you are making sense!

Clinker only needs to know which two genes are involved in the fusion so it can then join the two relevant superTranscripts together.

To do this some coordinate is required that can be mapped to genes within either hg38_genCode24.txt or hg19_ucscGenes.txt. Typically breakpoints have been available in fusion caller output, so I've defaulted to using those, but I don't see any reason why it couldn't be the genes coordinates!

So... hopefully I am making sense! In short, nope, the breakpoints specifically don't have any influence.

@breons
Copy link
Contributor

breons commented Nov 9, 2018

Also, just noticed that both hg38_genCode24.txt and hg19_ucscGenes.txt genes were 1bp off on the starting coordinate. Have fixed and uploaded.

You might want to pull these down if you're going to be using the gene coordinates, otherwise you might get unexpected superTranscripts fusing together :).

@skanwal
Copy link
Author

skanwal commented Nov 12, 2018

Hi Breon,

Thanks for the fix, I'll run again with the update.

@messersc, I use the gene coordinates output from grolar and feed in these values to clinker.
@breons I am hoping using gene coordinates as input instead of breakpoints should not impact the program/plots?

@breons
Copy link
Contributor

breons commented Nov 12, 2018

@skanwal you shouldn't run into any problems. You would likely run into an error during the plotting stage that would say something like "no split reads found spanning the fusion pair" or the like.

Did you get something like this when you ran it until completion? Just delete the output of Clinker and start again with that annotation.

Sorry about this... I didn't catch it given I was always using breakpoints (where a starting coordinate of +1 would have not been picked up).

Let me know if you have any issues :).

@skanwal
Copy link
Author

skanwal commented Nov 12, 2018

I re-ran the pipeline with updated files and received the following error:

====================================================================================================
|                              Starting Pipeline at 2018-11-12 13:44                               |
====================================================================================================

======================================== Stage generate_fst ========================================
Traceback (most recent call last):
Traceback (most recent call last):
==============================================================
    fusions = fusiontools.createFusionList(fusion_results, pos, gene_list_location, st_genes, header, delimiter)
    fusions = fusiontools.createFusionList(fusion_results, pos, gene_list_location, st_genes, header, delimiter)
        Fusion Super Transcript Generator
        A fusion visualiser.
==============================================================
    chromosomes = geneList(gene_locations)
  File "/data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/fusion/fusiontools.py", line 124, in geneList
    gene = Gene(gene_line[1], int(gene_line[2]), int(gene_line[3]))
IndexError: list index out of range
ERROR: Command failed with exit status = 1 :

python /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/fusion/main.py -in /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013/pizzly.csv -out /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013 -pos 1,2,3,4 -del c -st /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/resources/ -genome 38 -competitive true -header true


========================================= Pipeline Failed ==========================================

Command failed with exit status = 1 :

The script I am using to run the pipeline is:

#!/bin/bash
cd /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker
export CLINKERDIR=$PWD
module load Java
bpipe -p out=/data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013 -p caller=$CLINKERDIR/MH17T001P013/pizzly.csv -p col=1,2,3,4 -p genome=38 -p print=true -p competitive=true -p header=true -p align_mem=40000000000 -p genome_mem=40000000000 -p fusions=ALB:APOA1,ALB:FGG,HP:ALB,C4A:ALB -p tsl=1 $CLINKERDIR/workflow/clinker.pipe /data/cephfs/punim0010/projects/Kanwal_Clinker/Clinker/test/fastq/P013/*.fastq.gz

@breons
Copy link
Contributor

breons commented Nov 12, 2018

Hmmm, ok, I'll have a look tonight.

Any chance I can get the first few lines of the Pizzly.csv. You can email it to me if you like?

@skanwal
Copy link
Author

skanwal commented Nov 12, 2018

Sure.

"chrom1","base1","chrom2","base2"
"chr4",73421412,"chr11",116837950
"chr4",73421412,"chr4",154604134
"chr16",72061055,"chr4",73421412
"chr6",32002681,"chr4",73421412
"chr17",50201632,"chr4",73421412
"chr4",71804041,"chr19",6730562

Let me know if you need more information.
Thanks.

@breons
Copy link
Contributor

breons commented Nov 12, 2018

Thanks! I'll get to the bottom of it, thanks for your patience.

@skanwal
Copy link
Author

skanwal commented Nov 15, 2018

Thanks Breon.

Previously I ran git clone https://github.com/Oshlack/Clinker/tree/clinker-1.4, which should theoretically pull exactly the same git branch?

@breons
Copy link
Contributor

breons commented Nov 15, 2018

It certainly should have!

But I noticed that main.py in the current clinker-v1.4 branch has the following line:

fusiontools.createAnnotationFiles(fusions, st_genes, genomic_coordinates, annotation_folder)

Which would solve your error above. I imagine that for some reason or another, there's some wiggyness with that clone. Give it another go in a new folder and see if you come up against anything.

@skanwal
Copy link
Author

skanwal commented Nov 15, 2018

Seems to have done the trick. It's running the alignment currently - implying we have passed the problematic stage.

I'll keep you updated of the run status.

Thanks.

@breons
Copy link
Contributor

breons commented Nov 15, 2018

Huzzah! No idea what happened, but glad you're on track.

Keen to see the output :).

@skanwal
Copy link
Author

skanwal commented Nov 15, 2018

Hi Breon,

The pipeline is chocking on the biomart dependency as I am running on the cluster.
You had mentioned it has been replaced with ensembl? Is my understanding correct?

I am getting

/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Request to BioMart web service failed.
The BioMart web service you're accessing may be down.
Check the following URL and see if this website is available:
http://www.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt
Error in if (!grepl(x = registry, pattern = "^\n*<MartRegistry>")) { :
  argument is of length zero
Calls: prepare -> create -> prepAnnotation -> useMart -> listMarts
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

@breons
Copy link
Contributor

breons commented Nov 16, 2018

So close!

Ok, so this file: https://github.com/Oshlack/Clinker/blob/clinker-1.4/plotit/fst_plot.r
has no requirement for the biomart library and shouldn't have any issues.

Could you compare your fst_plot.r to this one? I.e. if your copy has library(biomart), we've run into the wrong branch thing again.

Another thing it might be, have you been updating the CLINKERDIR variable with the new branch location?

@skanwal
Copy link
Author

skanwal commented Nov 16, 2018

Yep, the biomart library is listed in the fst_plot.r script.

How are we still on the wrong branch though.. That's confusing.
The steps I have taken are:

  • Create a new directory named update2
  • Ran git clone https://github.com/Oshlack/Clinker.git
  • Create a folder in the cloned directory for my input sample.
  • Create a script in the above folder that has
cd /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker
export CLINKERDIR=$PWD

to make sure I am pointing to the new directory.

Wondering if something is off in these steps?

@breons
Copy link
Contributor

breons commented Nov 16, 2018

I know right, my head is spinning too! We will get there, I'm sure of it.

So this step:

git clone https://github.com/Oshlack/Clinker.git

Maybe change to this to remove all doubt

git clone -b clinker-v1.4 https://github.com/Oshlack/Clinker.git

Other than that, the other steps are fine!

Otherwise, if you want to just figure out whether it's this branches code or the cloning process, you can always just wget the below, unzip it and then run it as above. Then we can determine whether its the software or something wacky with the cloning process :).
https://github.com/Oshlack/Clinker/archive/clinker-1.4.zip

@skanwal
Copy link
Author

skanwal commented Nov 16, 2018

Thanks for your time, Breon.

Strangely, if I specify the branch name, I get

$ git clone -b clinker-v1.4 https://github.com/Oshlack/Clinker.git
Cloning into 'Clinker'...
fatal: Remote branch clinker-v1.4 not found in upstream origin
Unexpected end of command stream

I checked the which branch I am on (in the previous update2 folder that threw the biomart error) and no surprises, I was on master

$ git branch -r
  origin/HEAD -> origin/master
  origin/clinker-1.4
  origin/master

I have tried checking out to the updated branch and had some progress - a new error :)

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

@breons
Copy link
Contributor

breons commented Nov 16, 2018

@skanwal

I reproduced your error while trying to clone the branch as well! I ended up manually typing it in and the error went away!

Either way, this worked:

git clone -b clinker-1.4 [email protected]:Oshlack/Clinker.git
git branch
*clinker-1.4

I would delete your current output (it sucks to lose the alignment, but I think that this will be the most simplest way forward - the first few stages had some changes within this branch to update the annotations), clone the branch using the below statement, check the branch is active, change your CLINKERDIR to point this new location, and run it from the start.

Hopefully that resolves it all :).

@skanwal
Copy link
Author

skanwal commented Nov 16, 2018

Thanks Breon.

I have submitted a new job - following the above steps (fingers crossed).
I am still baffled how manually typing the git clone command worked (as it worked for me as well).

I'll keep you updated on the run progress.
And have a nice weekend :)

@breons
Copy link
Contributor

breons commented Nov 16, 2018

The only thing I can think of is a foreign character?

You too! Fingers crossed.

Breon.

@skanwal
Copy link
Author

skanwal commented Nov 18, 2018

Hi Breon,

The pipeline has choked on another issue again at the plotting stage.

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in read.table(locations$transcripts) : no lines available in input
Calls: prepare -> as.data.frame -> read.table
Execution halted
ERROR: Command failed with exit status = 1

Looking into fst_plot.r script, I see that you are reading transcripts info here:

transcripts_location <- paste(results_location, "annotation/transcripts.gtf", sep="/")

Checking annotaion/trasncripts.gtf file, it is empty.
Looking at the pipeline log, all the stages before this one look good.

Any idea why this file is empty?

@breons
Copy link
Contributor

breons commented Nov 18, 2018

Well, at least we have moved on!

I imagine this has got to do with the TSL parameter (i.e. too much has filtered!). Could you please delete references/fst*.fasta file and rerun the pipeline with the TSL parameter omitted?

You should get the first python stage again as well as the plotting stage. Once done, what is the contents (particularly the TSL numbers) of the transcripts.gtf file?

@skanwal
Copy link
Author

skanwal commented Nov 18, 2018

Now, it's back to the same old syntax error:

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

That's the one that you reproduced as well?

@breons
Copy link
Contributor

breons commented Nov 18, 2018

Does the transcripts.gtf have contents now?

I've only been able to reproduce the cloning error thus far, but we will get to the bottom of this :).

Here's a question, what happens when you run the test using the new branch?

pipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

@breons
Copy link
Contributor

breons commented Nov 18, 2018

Also try it with a high tsl filter value and see if there's a difference:

pipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p tsl=5 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

EDIT: Remember to change the -p out="" to different folders :).

@skanwal
Copy link
Author

skanwal commented Nov 19, 2018

Yes, transcripts.gtf has contents.

Re- your first suggestion, I get the same error again

===================================== Stage plot_fusion (test) =====================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result BCR:ABL1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result/alignment/test/BCR_ABL1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result/alignment/test test 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

Should I just re-run with a high tsl filter value i.e. without deleting any intermediate files?

@breons
Copy link
Contributor

breons commented Nov 19, 2018

No need, let me see if I can replicate that now. I could have sworn I tested that empty case. Most likely something I've done given we've resolved the branch issue. Thanks for your help :).

@breons
Copy link
Contributor

breons commented Nov 19, 2018

Hi @skanwal

I found the issue. For some reason the annotation update (resources/hg38_genCode24_st-sorted-exons.gtf) wasn't in the testing branch anymore! I must have overwritten it at some point, sorry!

It should be good to go now. All you need to do pull the change from the clinker-1.4 branch and try the test again. I've just cloned a fresh copy and ran through the test with and without the new parameter and got through the pipeline.

IF that all works well for you too, you're good to go with your samples. Just delete the reference/fst_reference.fasta file from your results location, rerun the pipeline, and you should get your results (finally!).

Let me know if you run into any problems :).

@skanwal
Copy link
Author

skanwal commented Nov 19, 2018

Hi Breon,

Thanks for your help and time.

Almost there:

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: HP:ALB"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
ERROR: Command failed with exit status = 137 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 HP:ALB /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/HP_ALB 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

Branch CCR170012 in stage Unknown reported message:

Command failed with exit status = 137 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 HP:ALB /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/HP_ALB 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

Use 'bpipe errors' to see output from failed commands.

slurmstepd: error: Detected 1 oom-kill event(s) in step 6936194.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Seems like a memory issue at the plotting stage. Does the pipeline (or I need to set it at any stage)?

Testing with the test data, it works fine (except for the syntax error), probably because there is just one fusion to plot? For my data it died on the second one.

===================================== Stage plot_fusion (test) =====================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."

======================================== Pipeline Succeeded ========================================

@breons
Copy link
Contributor

breons commented Nov 20, 2018

Hi @skanwal

I'll look into the syntax error, but as long as you're getting some result at this stage, we're on the right track!

In terms of the memory issue, I have only occasionally run into this and it usually means that the plotting stage is trying to print a whole bunch of stuff. Generally either there are way too many protein domains defined in those regions or there are just a tonne of transcripts.

So I suppose we have to courses of action here:

  • At line 636 of fst_plot.r, try changing it to list(coverage, gene_track), and rerunning the pipeline. Naturally this isn't a solution, but this will tell us whether those two tracks are the problem. Also try removing one at a time and rerunning.
  • Implement the memory.size() function in the fst_plot.r script. I can write a parameter in for this if you can verify it works :).

How was the first plot looking?

@skanwal
Copy link
Author

skanwal commented Nov 20, 2018

Did you mean changing line 611 which is. list(coverage, gene_track, domain_track, transcript_track) to list(coverage, gene_track),?

Also try removing one at a time and rerunning
Did you mean removing one fusion at a time from the .csv file?

@breons
Copy link
Contributor

breons commented Nov 21, 2018

Sorry for the delay @skanwal, busy day yesterday!

That's the line, yes. What I mean is trying all three of these options:
list(coverage, gene_track, domain_track,) and
list(coverage, gene_track, transcript_track) and
list(coverage, gene_track)

If none of them run, it has nothing to do with the printing size.
If the first one runs, the program is struggling with the amount of transcripts being printed.
If the second one runs, the program is struggling with the amount of protein domains being printed.

If the program is struggling, we can try increasing the memory available to R or I can try and get to the bottom of why there are so many X for that particular fusion. Even if we can print out the PDF, it might look too busy.

You can just rerun the pipeline without deleting anything :).

@skanwal
Copy link
Author

skanwal commented Nov 22, 2018

Hi Breon,

Thanks for getting back to this.

The pipeline does not like changing the number of options in the initial list.
Probably because these are being used/referred to in other parts of the code such as when creating sizes and trackList vector.

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
Error in .setupTextSize(expandedTrackList, sizes, title.width, spacing = innerMargin) :
  The 'sizes' vector has to match the size of the 'trackList'.
Calls: prepare ... eval -> eval -> eval -> plotTracks -> .setupTextSize
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

@breons
Copy link
Contributor

breons commented Nov 22, 2018

Ah yes! We will also need to adjust the sizes.

Can you please add the ratio parameter to your bpipe snippet? for example:

bpipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p tsl=5 **-p ratio=1,5,1,3,1,3** -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

For each track you remove (such as transcript or domain), just remove one of the values. For example, if you remove the transcript track, set -p ratio=1,5,1,3,3, if you remove both the protein domains and the transcript track, set ratio=1,5,1,3.

At the end, I'm sure at least one of the options will print all of your fusion pairs successfully. Then we will know where to look :).


EDIT: Alternatively... if you're able to supply your Clinker output to me, I can debug it for you :).

@skanwal
Copy link
Author

skanwal commented Nov 22, 2018

Wondering what is this ratio parameter referring to?
None of the above ratio options work and I get the same error:

Error in .setupTextSize(expandedTrackList, sizes, title.width, spacing = innerMargin) :
  The 'sizes' vector has to match the size of the 'trackList'.

Do I need to delete any intermediate files as well to enable pipeline pick-up this new parameter?

Re- supply Clinker output:
Which files/folders will you need to debug? I can email those to you.

Thanks.

@breons
Copy link
Contributor

breons commented Nov 23, 2018

So the ratio parameter adjusts the relative size of each track. Basically what's happening is we are specifying more/less tracks than there is.

So setting list(coverage, gene_track, domain_track) and having three ratio values, i.e. ratio=1,2,3 should resolve that error.

Re sending: I will need the alignments too, so it will be too big to email. Any chance of sending it through another means? I think this will be the quickest form of debugging. Alternatively, email over the zipped contents of the annotation folder :).

@maximillo
Copy link

Hi guys, I happen to work on the same problem, trying to get Clinker to work for pizzly output. Any chance you've had this resolved so I can have a working model to start with? Many thanks!

-Max

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants