Using Clinker with Pizzly output #6

skanwal · 2018-09-25T05:00:26Z

Hello,

Thanks for the great tool. We are very much interested in using Clinker to visualise fusions in one of the project I am currently working on @umccr.

I have used pizzly as a fusion caller and realized that clinker expects input in a specific format:
chrom1,base1,chrom2,base2

However the header of pizzly output looks like this:

geneA.name      geneA.id        geneB.name      geneB.id        paircount       splitcount      transcripts.list
VIM     ENSG00000026025 ALB     ENSG00000163631 4       4       ENST00000544301_0:1642_ENST00000509063_0:1911;ENST00000224237_0:1374_ENST00000509063_0:1911;ENST00000487938_0:1374_ENST00000509063_0:1911

Wondering if you ever tried using such format or how would you suggest converting it to something compatible with Clinker?
Any help appreciated, thanks!

The text was updated successfully, but these errors were encountered:

breons · 2018-09-25T11:39:37Z

Howdy,

Thanks for your kind words and for giving Clinker a go! Hopefully, I can help you get up and running.

By having the chromosome names, breakpoint coordinate, and the genome assembly (hg19/38 currently) as input, Clinker gets around the various gene naming conventions that exist.

So! How do we get the Pizzly output into what we need? Looks like someone has had a similar problem:

If that helps you out, great, if not, let me know and I can whip something up (would be nice to have as an extension to Clinker).

Cheers,
Breon.

skanwal · 2018-09-26T12:44:46Z

Thanks a lot for the prompt response.

We had come across the grolar script earlier and tried this today to get results (and are almost there).
Now, testing the clinker setup on our cluster (spartan).
I'll update you if that works or I have any further issues.

Highly appreciate your time to help make this run.

breons · 2018-09-26T12:50:27Z

No worries, happy to help!

skanwal · 2018-09-29T13:55:00Z

Hi Breon,

Almost there with the clinker pipeline. Getting a small issue at Stage prepare_plot. Seems to me that access request to BioMart service is for some reason failing.

Attaching a complete log file for you reference.

Thanks for your time and help sort this.

clinker-error.txt

skanwal · 2018-10-01T01:32:46Z

Hi Breon,

Solved this issue. I had to run my job on login node. Sorry for bugging you about this.

The pipeline finished successfully for three input fusions and created nice pdfs :)

For one of the fusions in my input, I get

C4A:ALB
------------------------------------------
filtering BAM file for fusion of interest
[main_samview] region "C4A:ALB" could not be parsed. Continue anyway.
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files

================================== Stage plot_fusion (CCR170012) ===================================
[1] "Plotting: C4A:ALB"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in read.table(locations$junctions) : no lines available in input
Error in read.table(locations$junctions) : no lines available in input
[1] "There are no fusion junctions found in this gene pair."
[1] "Terminating plotting for this fusion."
NULL
ERROR: Expected output file /data/cephfs/punim0010/projects/Kanwal_Clinker/Clinker/MH17T001P013/plots/CCR170012_MH17T001P013_S39/C4A_ALB.pdf could not be found

Any idea what could be the likely cause? As I have this fusion in the results and it is highly expressed as well?

breons · 2018-10-01T02:18:25Z

Hey! Glad to see you solved the other issue. Sorry, have not been around this lovely long weekend.

Could you please post the junctions.txt file that was generated for this fusion pair?

skanwal · 2018-10-01T02:21:36Z

I hope you had a nice weekend :)

Looking in the alignment folder, the junction file for this fusion is empty?

breons · 2018-10-01T03:30:14Z

Hmmm, OK. How many reads are in splice_junction_reads.bam and fusion_breakpoint_reads.bam?

A quick way to check this would be to load it into IGV and enable the sashimi plot. If that shows reads spanning/splitting across the gene boundaries, then we know that something went wrong with the pipeline. If nothing shows, then there might be a problem with the alignment.

Instructions here:
https://github.com/Oshlack/Clinker/wiki/4.-Running-the-pipeline

breons · 2018-10-01T03:31:00Z

Weekend was great! Hope yours was also :).

skanwal · 2018-10-11T02:38:09Z

Hi Breon,

So, I updated grolar.R script to work with fusion calls in tsv format and find genomic coordinates required for visualisation of fusion genes in clinker https://github.com/skanwal/Play/blob/master/fusions/grolar-tsv.R

I wanted to ask does clinker performs any filtering on the transcripts used for fusion genes or uses all ensemble transcripts?
Can we tune it to use only transcripts with supporting evidence such as TSL1 only?

Thanks a lot for your time.

Regards,
Sehrish

breons · 2018-10-11T11:11:03Z

Hi Sehrish,

Good to hear you got a suitable input with Grolar.

No filtering at present, you're getting everything! I'm sure I can write in a parameter for that, let me have a play over the weekend.

Cheers,
Breon.

skanwal · 2018-10-11T21:17:58Z

Thanks, Breon.

Let me know if you need any input from my side.
Happy to help always.

Regards,
Sehrish

breons · 2018-10-18T20:12:59Z

Hi Sehrish,

I haven't forgotten you, just having one of those weeks...

Thanks!
Breon.

breons · 2018-10-24T01:16:39Z

Hi Sehrish,

I've created a new parameter for filtering by TSL, but I want to do some more testing before I push it up. It shouldn't take much longer.

I'm conscious that you have have been waiting for a bit, so to get you on your way if you're pressed for time, I've added a new file in the resources folder called hg38_genCode24_st-sorted-exons_tsl.gtf.

This is essentially the same file as hg38_genCode24_st-sorted-exons.gtf but with the added TSL field married up from GencodeV24. If you wanted to do some manual filtering, all you would need to do is to run an AWK/Python script to remove the lines with levels you don't want, rename the file to hg38_genCode24_st-sorted-exons.gtf and run Clinker as per usual. It will give you the desired effect.

If you need a script to do that, I have something available, but up to you!

Alternatively if you're not pressed for time, I will let you know when I've uploaded the changes :).

Cheers,
Breon.

skanwal · 2018-10-25T00:18:42Z

Hi Breon,

I highly appreciate your efforts to get this working and thanks for keeping me in the loop as well.

I can definitely wait for the feature update from you. I am happy to give it a test drive then and update you on the results :)

Regards,
Sehrish

messersc · 2018-11-08T09:10:12Z

Hi @skanwal,

I am just researching this exact question, would you be able to make the code you use for this available somewhere? I would really appreciate it!

Thanks,
Clemens

breons · 2018-11-08T12:34:38Z

@skanwal thanks for being so patient!

https://github.com/Oshlack/Clinker/tree/clinker-1.4

This is an update to Clinker that I've added a tsl parameter to AND have removed the Biomart dependency for. I've done some testing and so far so good... but I thought I would get some real user testing as you suggested. Take it for a drive and let me know how it handles :).

All you need to do is clone that and run it with a parameter called tsl where you can specify the levels and above to filter for (like the below). For example, if you want transcripts with a tsl of only 1, just set tsl=1. If you would like all TSL under 5, set tsl=4.

bpipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=true -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p fusions=BCR:ABL1 -p tsl=2 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

breons · 2018-11-08T12:35:31Z

@messersc, I think @skanwal posted this earlier:
https://github.com/skanwal/Play/blob/master/fusions/grolar-tsv.R

Thanks for trying Clinker! Let me know if I can help in anyway.

skanwal · 2018-11-09T03:40:09Z

Hi Breon,

Thanks a lot. I really appreciate the work you have done. I wanted to suggest if you could remove biomart dependency because this was stopping us from running it completely on the cluster and I had to run the plot stage manually.

I have updated the code and started the analysis using one of our samples.
I'll update you once it finishes/completes till the end.

Regards,
Sehrish

messersc · 2018-11-09T12:53:11Z

Hi Sehrish and Breon,

I have a question regarding the grolar output and how to feed it into Clinker - in the examples, if one feeds fusion calls into Clinker, it's with coordinates of the breakpoints. Grolar on the other hand only gives the coordinates of the genes involved, but not the breakpoints. [It's possible to get the breakpoints from the json, but not easily as it seems it only gives coordinates on the transcript, not the genome.]

Now, it looks like the breakpoints are not strictly needed as it's possible to run Clinker without, but I was wondering about this. Does the breakpoint information have any influence on how the superTranscriptome is generated?

Hope I am making sense,
Clemens

breons · 2018-11-09T13:07:37Z

Hi Clemens, you are making sense!

Clinker only needs to know which two genes are involved in the fusion so it can then join the two relevant superTranscripts together.

To do this some coordinate is required that can be mapped to genes within either hg38_genCode24.txt or hg19_ucscGenes.txt. Typically breakpoints have been available in fusion caller output, so I've defaulted to using those, but I don't see any reason why it couldn't be the genes coordinates!

So... hopefully I am making sense! In short, nope, the breakpoints specifically don't have any influence.

breons · 2018-11-09T13:34:22Z

Also, just noticed that both hg38_genCode24.txt and hg19_ucscGenes.txt genes were 1bp off on the starting coordinate. Have fixed and uploaded.

You might want to pull these down if you're going to be using the gene coordinates, otherwise you might get unexpected superTranscripts fusing together :).

skanwal · 2018-11-12T00:32:37Z

Hi Breon,

Thanks for the fix, I'll run again with the update.

@messersc, I use the gene coordinates output from grolar and feed in these values to clinker.
@breons I am hoping using gene coordinates as input instead of breakpoints should not impact the program/plots?

breons · 2018-11-12T01:04:24Z

@skanwal you shouldn't run into any problems. You would likely run into an error during the plotting stage that would say something like "no split reads found spanning the fusion pair" or the like.

Did you get something like this when you ran it until completion? Just delete the output of Clinker and start again with that annotation.

Sorry about this... I didn't catch it given I was always using breakpoints (where a starting coordinate of +1 would have not been picked up).

Let me know if you have any issues :).

skanwal · 2018-11-12T03:04:24Z

I re-ran the pipeline with updated files and received the following error:

====================================================================================================
|                              Starting Pipeline at 2018-11-12 13:44                               |
====================================================================================================

======================================== Stage generate_fst ========================================
Traceback (most recent call last):
Traceback (most recent call last):
==============================================================
    fusions = fusiontools.createFusionList(fusion_results, pos, gene_list_location, st_genes, header, delimiter)
    fusions = fusiontools.createFusionList(fusion_results, pos, gene_list_location, st_genes, header, delimiter)
        Fusion Super Transcript Generator
        A fusion visualiser.
==============================================================
    chromosomes = geneList(gene_locations)
  File "/data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/fusion/fusiontools.py", line 124, in geneList
    gene = Gene(gene_line[1], int(gene_line[2]), int(gene_line[3]))
IndexError: list index out of range
ERROR: Command failed with exit status = 1 :

python /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/fusion/main.py -in /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013/pizzly.csv -out /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013 -pos 1,2,3,4 -del c -st /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/resources/ -genome 38 -competitive true -header true


========================================= Pipeline Failed ==========================================

Command failed with exit status = 1 :

The script I am using to run the pipeline is:

#!/bin/bash
cd /data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker
export CLINKERDIR=$PWD
module load Java
bpipe -p out=/data/cephfs/punim0010/projects/Kanwal_Clinker/update/Clinker/MH17T001P013 -p caller=$CLINKERDIR/MH17T001P013/pizzly.csv -p col=1,2,3,4 -p genome=38 -p print=true -p competitive=true -p header=true -p align_mem=40000000000 -p genome_mem=40000000000 -p fusions=ALB:APOA1,ALB:FGG,HP:ALB,C4A:ALB -p tsl=1 $CLINKERDIR/workflow/clinker.pipe /data/cephfs/punim0010/projects/Kanwal_Clinker/Clinker/test/fastq/P013/*.fastq.gz

breons · 2018-11-12T03:53:37Z

Hmmm, ok, I'll have a look tonight.

Any chance I can get the first few lines of the Pizzly.csv. You can email it to me if you like?

skanwal · 2018-11-12T03:56:53Z

Sure.

"chrom1","base1","chrom2","base2"
"chr4",73421412,"chr11",116837950
"chr4",73421412,"chr4",154604134
"chr16",72061055,"chr4",73421412
"chr6",32002681,"chr4",73421412
"chr17",50201632,"chr4",73421412
"chr4",71804041,"chr19",6730562

Let me know if you need more information.
Thanks.

breons · 2018-11-12T04:02:09Z

Thanks! I'll get to the bottom of it, thanks for your patience.

skanwal · 2018-11-15T02:26:51Z

Thanks Breon.

Previously I ran git clone https://github.com/Oshlack/Clinker/tree/clinker-1.4, which should theoretically pull exactly the same git branch?

breons · 2018-11-15T02:56:01Z

It certainly should have!

But I noticed that main.py in the current clinker-v1.4 branch has the following line:

fusiontools.createAnnotationFiles(fusions, st_genes, genomic_coordinates, annotation_folder)

Which would solve your error above. I imagine that for some reason or another, there's some wiggyness with that clone. Give it another go in a new folder and see if you come up against anything.

skanwal · 2018-11-15T03:11:51Z

Seems to have done the trick. It's running the alignment currently - implying we have passed the problematic stage.

I'll keep you updated of the run status.

Thanks.

breons · 2018-11-15T03:13:56Z

Huzzah! No idea what happened, but glad you're on track.

Keen to see the output :).

skanwal · 2018-11-15T23:18:09Z

Hi Breon,

The pipeline is chocking on the biomart dependency as I am running on the cluster.
You had mentioned it has been replaced with ensembl? Is my understanding correct?

I am getting

/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Request to BioMart web service failed.
The BioMart web service you're accessing may be down.
Check the following URL and see if this website is available:
http://www.ensembl.org:80/biomart/martservice?type=registry&requestid=biomaRt
Error in if (!grepl(x = registry, pattern = "^\n*<MartRegistry>")) { :
  argument is of length zero
Calls: prepare -> create -> prepAnnotation -> useMart -> listMarts
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

breons · 2018-11-16T00:09:43Z

So close!

Ok, so this file: https://github.com/Oshlack/Clinker/blob/clinker-1.4/plotit/fst_plot.r
has no requirement for the biomart library and shouldn't have any issues.

Could you compare your fst_plot.r to this one? I.e. if your copy has library(biomart), we've run into the wrong branch thing again.

Another thing it might be, have you been updating the CLINKERDIR variable with the new branch location?

skanwal · 2018-11-16T00:19:44Z

Yep, the biomart library is listed in the fst_plot.r script.

How are we still on the wrong branch though.. That's confusing.
The steps I have taken are:

Create a new directory named update2
Ran git clone https://github.com/Oshlack/Clinker.git
Create a folder in the cloned directory for my input sample.
Create a script in the above folder that has

cd /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker
export CLINKERDIR=$PWD

to make sure I am pointing to the new directory.

Wondering if something is off in these steps?

breons · 2018-11-16T00:26:33Z

I know right, my head is spinning too! We will get there, I'm sure of it.

So this step:

git clone https://github.com/Oshlack/Clinker.git

Maybe change to this to remove all doubt

git clone -b clinker-v1.4 https://github.com/Oshlack/Clinker.git

Other than that, the other steps are fine!

Otherwise, if you want to just figure out whether it's this branches code or the cloning process, you can always just wget the below, unzip it and then run it as above. Then we can determine whether its the software or something wacky with the cloning process :).
https://github.com/Oshlack/Clinker/archive/clinker-1.4.zip

skanwal · 2018-11-16T00:51:34Z

Thanks for your time, Breon.

Strangely, if I specify the branch name, I get

$ git clone -b clinker-v1.4 https://github.com/Oshlack/Clinker.git
Cloning into 'Clinker'...
fatal: Remote branch clinker-v1.4 not found in upstream origin
Unexpected end of command stream

I checked the which branch I am on (in the previous update2 folder that threw the biomart error) and no surprises, I was on master

$ git branch -r
  origin/HEAD -> origin/master
  origin/clinker-1.4
  origin/master

I have tried checking out to the updated branch and had some progress - a new error :)

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update2/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

breons · 2018-11-16T03:15:37Z

@skanwal

I reproduced your error while trying to clone the branch as well! I ended up manually typing it in and the error went away!

Either way, this worked:

git clone -b clinker-1.4 [email protected]:Oshlack/Clinker.git
git branch
*clinker-1.4

I would delete your current output (it sucks to lose the alignment, but I think that this will be the most simplest way forward - the first few stages had some changes within this branch to update the annotations), clone the branch using the below statement, check the branch is active, change your CLINKERDIR to point this new location, and run it from the start.

Hopefully that resolves it all :).

skanwal · 2018-11-16T03:29:24Z

Thanks Breon.

I have submitted a new job - following the above steps (fingers crossed).
I am still baffled how manually typing the git clone command worked (as it worked for me as well).

I'll keep you updated on the run progress.
And have a nice weekend :)

breons · 2018-11-16T03:42:43Z

The only thing I can think of is a foreign character?

You too! Fingers crossed.

Breon.

skanwal · 2018-11-18T22:26:54Z

Hi Breon,

The pipeline has choked on another issue again at the plotting stage.

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in read.table(locations$transcripts) : no lines available in input
Calls: prepare -> as.data.frame -> read.table
Execution halted
ERROR: Command failed with exit status = 1

Looking into fst_plot.r script, I see that you are reading transcripts info here:

transcripts_location <- paste(results_location, "annotation/transcripts.gtf", sep="/")

Checking annotaion/trasncripts.gtf file, it is empty.
Looking at the pipeline log, all the stages before this one look good.

Any idea why this file is empty?

breons · 2018-11-18T22:46:53Z

Well, at least we have moved on!

I imagine this has got to do with the TSL parameter (i.e. too much has filtered!). Could you please delete references/fst*.fasta file and rerun the pipeline with the TSL parameter omitted?

You should get the first python stage again as well as the plotting stage. Once done, what is the contents (particularly the TSL numbers) of the transcripts.gtf file?

skanwal · 2018-11-18T23:40:55Z

Now, it's back to the same old syntax error:

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

That's the one that you reproduced as well?

breons · 2018-11-18T23:50:44Z

Does the transcripts.gtf have contents now?

I've only been able to reproduce the cloning error thus far, but we will get to the bottom of this :).

Here's a question, what happens when you run the test using the new branch?

pipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

breons · 2018-11-18T23:53:23Z

Also try it with a high tsl filter value and see if there's a difference:

pipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p tsl=5 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

EDIT: Remember to change the -p out="" to different folders :).

skanwal · 2018-11-19T00:06:06Z

Yes, transcripts.gtf has contents.

Re- your first suggestion, I get the same error again

===================================== Stage plot_fusion (test) =====================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
Error in `[.data.frame`(transcripts_filtered, , c(1, 4, 5, 7, 13, 16,  :
  undefined columns selected
Calls: prepare -> [ -> [.data.frame
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result BCR:ABL1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result/alignment/test/BCR_ABL1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/test/test_result/alignment/test test 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

Should I just re-run with a high tsl filter value i.e. without deleting any intermediate files?

breons · 2018-11-19T00:23:21Z

No need, let me see if I can replicate that now. I could have sworn I tested that empty case. Most likely something I've done given we've resolved the branch issue. Thanks for your help :).

breons · 2018-11-19T11:03:45Z

Hi @skanwal

I found the issue. For some reason the annotation update (resources/hg38_genCode24_st-sorted-exons.gtf) wasn't in the testing branch anymore! I must have overwritten it at some point, sorry!

It should be good to go now. All you need to do pull the change from the clinker-1.4 branch and try the test again. I've just cloned a fresh copy and ran through the test with and without the new parameter and got through the pipeline.

IF that all works well for you too, you're good to go with your samples. Just delete the reference/fst_reference.fasta file from your results location, rerun the pipeline, and you should get your results (finally!).

Let me know if you run into any problems :).

skanwal · 2018-11-19T23:52:44Z

Hi Breon,

Thanks for your help and time.

Almost there:

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: HP:ALB"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
ERROR: Command failed with exit status = 137 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 HP:ALB /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/HP_ALB 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

Branch CCR170012 in stage Unknown reported message:

Command failed with exit status = 137 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 HP:ALB /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/HP_ALB 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2

Use 'bpipe errors' to see output from failed commands.

slurmstepd: error: Detected 1 oom-kill event(s) in step 6936194.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Seems like a memory issue at the plotting stage. Does the pipeline (or I need to set it at any stage)?

Testing with the test data, it works fine (except for the syntax error), probably because there is just one fusion to plot? For my data it died on the second one.

===================================== Stage plot_fusion (test) =====================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."

======================================== Pipeline Succeeded ========================================

breons · 2018-11-20T00:13:16Z

Hi @skanwal

I'll look into the syntax error, but as long as you're getting some result at this stage, we're on the right track!

In terms of the memory issue, I have only occasionally run into this and it usually means that the plotting stage is trying to print a whole bunch of stuff. Generally either there are way too many protein domains defined in those regions or there are just a tonne of transcripts.

So I suppose we have to courses of action here:

At line 636 of fst_plot.r, try changing it to list(coverage, gene_track), and rerunning the pipeline. Naturally this isn't a solution, but this will tell us whether those two tracks are the problem. Also try removing one at a time and rerunning.
Implement the memory.size() function in the fst_plot.r script. I can write a parameter in for this if you can verify it works :).

How was the first plot looking?

skanwal · 2018-11-20T01:27:07Z

Did you mean changing line 611 which is. list(coverage, gene_track, domain_track, transcript_track) to list(coverage, gene_track),?

Also try removing one at a time and rerunning
Did you mean removing one fusion at a time from the .csv file?

breons · 2018-11-21T19:40:34Z

Sorry for the delay @skanwal, busy day yesterday!

That's the line, yes. What I mean is trying all three of these options:
list(coverage, gene_track, domain_track,) and
list(coverage, gene_track, transcript_track) and
list(coverage, gene_track)

If none of them run, it has nothing to do with the printing size.
If the first one runs, the program is struggling with the amount of transcripts being printed.
If the second one runs, the program is struggling with the amount of protein domains being printed.

If the program is struggling, we can try increasing the memory available to R or I can try and get to the bottom of why there are so many X for that particular fusion. Even if we can print out the PDF, it might look too busy.

You can just rerun the pipeline without deleting anything :).

skanwal · 2018-11-22T00:10:11Z

Hi Breon,

Thanks for getting back to this.

The pipeline does not like changing the number of options in the initial list.
Probably because these are being used/referred to in other parts of the code such as when creating sizes and trackList vector.

================================== Stage plot_fusion (CCR170012) ===================================
/usr/share/lmod/lmod/init/R: line 2: syntax error near unexpected token `('
/usr/share/lmod/lmod/init/R: line 2: `module <- function(...){'
[1] "Plotting: ALB:APOA1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
Error in .setupTextSize(expandedTrackList, sizes, title.width, spacing = innerMargin) :
  The 'sizes' vector has to match the size of the 'trackList'.
Calls: prepare ... eval -> eval -> eval -> plotTracks -> .setupTextSize
Execution halted
ERROR: Command failed with exit status = 1 :

Rscript /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/plotit/fst_plot.r /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013 ALB:APOA1 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39/ALB_APOA1 16 9 1,3,1,2,4,2 /data/cephfs/punim0010/projects/Kanwal_Clinker/update3/Clinker/MH17T001P013/alignment/CCR170012_MH17T001P013_S39 CCR170012_MH17T001P013_S39 6e65ad,3983AA,2b749a,f05f3b,a1d16e,f2ffe4,215A4A,D7D4E4,6e65ad,ff6d6d axis,coverage,gene,domain,transcript,sashimi 2


========================================= Pipeline Failed ==========================================

breons · 2018-11-22T11:27:16Z

Ah yes! We will also need to adjust the sizes.

Can you please add the ratio parameter to your bpipe snippet? for example:

bpipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=false -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p tsl=5 **-p ratio=1,5,1,3,1,3** -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

For each track you remove (such as transcript or domain), just remove one of the values. For example, if you remove the transcript track, set -p ratio=1,5,1,3,3, if you remove both the protein domains and the transcript track, set ratio=1,5,1,3.

At the end, I'm sure at least one of the options will print all of your fusion pairs successfully. Then we will know where to look :).

EDIT: Alternatively... if you're able to supply your Clinker output to me, I can debug it for you :).

skanwal · 2018-11-22T23:12:10Z

Wondering what is this ratio parameter referring to?
None of the above ratio options work and I get the same error:

Error in .setupTextSize(expandedTrackList, sizes, title.width, spacing = innerMargin) :
  The 'sizes' vector has to match the size of the 'trackList'.

Do I need to delete any intermediate files as well to enable pipeline pick-up this new parameter?

Re- supply Clinker output:
Which files/folders will you need to debug? I can email those to you.

Thanks.

breons · 2018-11-23T02:50:30Z

So the ratio parameter adjusts the relative size of each track. Basically what's happening is we are specifying more/less tracks than there is.

So setting list(coverage, gene_track, domain_track) and having three ratio values, i.e. ratio=1,2,3 should resolve that error.

Re sending: I will need the alignments too, so it will be too big to email. Any chance of sending it through another means? I think this will be the quickest form of debugging. Alternatively, email over the zipped contents of the annotation folder :).

maximillo · 2019-02-14T18:14:34Z

Hi guys, I happen to work on the same problem, trying to get Clinker to work for pizzly output. Any chance you've had this resolved so I can have a working model to start with? Many thanks!

-Max

breons added enhancement question labels Nov 9, 2018

Using Clinker with Pizzly output #6

Using Clinker with Pizzly output #6

Comments

skanwal commented Sep 25, 2018

breons commented Sep 25, 2018 • edited Loading

skanwal commented Sep 26, 2018

breons commented Sep 26, 2018

skanwal commented Sep 29, 2018

skanwal commented Oct 1, 2018 • edited Loading

breons commented Oct 1, 2018

skanwal commented Oct 1, 2018

breons commented Oct 1, 2018 • edited Loading

breons commented Oct 1, 2018

skanwal commented Oct 11, 2018

breons commented Oct 11, 2018

skanwal commented Oct 11, 2018

breons commented Oct 18, 2018

breons commented Oct 24, 2018

skanwal commented Oct 25, 2018

messersc commented Nov 8, 2018

breons commented Nov 8, 2018

breons commented Nov 8, 2018

skanwal commented Nov 9, 2018

messersc commented Nov 9, 2018

breons commented Nov 9, 2018

breons commented Nov 9, 2018

skanwal commented Nov 12, 2018

breons commented Nov 12, 2018

skanwal commented Nov 12, 2018

breons commented Nov 12, 2018

skanwal commented Nov 12, 2018

breons commented Nov 12, 2018

skanwal commented Nov 15, 2018

breons commented Nov 15, 2018 • edited Loading

skanwal commented Nov 15, 2018

breons commented Nov 15, 2018

skanwal commented Nov 15, 2018

breons commented Nov 16, 2018

skanwal commented Nov 16, 2018

breons commented Nov 16, 2018 • edited Loading

skanwal commented Nov 16, 2018 • edited Loading

breons commented Nov 16, 2018

skanwal commented Nov 16, 2018

breons commented Nov 16, 2018

skanwal commented Nov 18, 2018 • edited Loading

breons commented Nov 18, 2018

skanwal commented Nov 18, 2018

breons commented Nov 18, 2018

breons commented Nov 18, 2018 • edited Loading

skanwal commented Nov 19, 2018 • edited Loading

breons commented Nov 19, 2018

breons commented Nov 19, 2018

skanwal commented Nov 19, 2018 • edited Loading

breons commented Nov 20, 2018

skanwal commented Nov 20, 2018 • edited Loading

breons commented Nov 21, 2018

skanwal commented Nov 22, 2018

breons commented Nov 22, 2018 • edited Loading

skanwal commented Nov 22, 2018

breons commented Nov 23, 2018

maximillo commented Feb 14, 2019

breons commented Sep 25, 2018 •

edited

Loading

skanwal commented Oct 1, 2018 •

edited

Loading

breons commented Oct 1, 2018 •

edited

Loading

breons commented Nov 15, 2018 •

edited

Loading

breons commented Nov 16, 2018 •

edited

Loading

skanwal commented Nov 16, 2018 •

edited

Loading

skanwal commented Nov 18, 2018 •

edited

Loading

breons commented Nov 18, 2018 •

edited

Loading

skanwal commented Nov 19, 2018 •

edited

Loading

skanwal commented Nov 19, 2018 •

edited

Loading

skanwal commented Nov 20, 2018 •

edited

Loading

breons commented Nov 22, 2018 •

edited

Loading