Using Clinker with Pizzly output #6
Howdy, Thanks for your kind words and for giving Clinker a go! Hopefully, I can help you get up and running. By having the chromosome names, breakpoint coordinate, and the genome assembly (hg19/38 currently) as input, Clinker gets around the various gene naming conventions that exist. So! How do we get the Pizzly output into what we need? Looks like someone has had a similar problem:
If that helps you out, great, if not, let me know and I can whip something up (would be nice to have as an extension to Clinker). Cheers, |
Thanks a lot for the prompt response. We had come across the grolar script earlier and tried this today to get results (and are almost there). Highly appreciate your time to help make this run. |
No worries, happy to help! |
Hi Breon, Almost there with the Clinker pipeline. Getting a small issue at Attaching a complete log file for your reference. Thanks for your time and help in sorting this out. |
Hi Breon, Solved this issue. I had to run my job on login node. Sorry for bugging you about this. The pipeline finished successfully for three input fusions and created nice pdfs :) For one of the fusions in my input, I get
Any idea what could be the likely cause? As I have this fusion in the results and it is highly expressed as well? |
Hey! Glad to see you solved the other issue. Sorry, have not been around this lovely long weekend. Could you please post the junctions.txt file that was generated for this fusion pair? |
I hope you had a nice weekend :) Looking in the |
Hmmm, OK. How many reads are in splice_junction_reads.bam and fusion_breakpoint_reads.bam? A quick way to check this would be to load it into IGV and enable the sashimi plot. If that shows reads spanning/splitting across the gene boundaries, then we know that something went wrong with the pipeline. If nothing shows, then there might be a problem with the alignment. Instructions here: |
Weekend was great! Hope yours was also :). |
Hi Breon, So, I updated I wanted to ask: does Clinker perform any filtering on the transcripts used for fusion genes, or does it use all Ensembl transcripts? Thanks a lot for your time. Regards, |
Hi Sehrish, Good to hear you got a suitable input with Grolar. No filtering at present, you're getting everything! I'm sure I can write in a parameter for that, let me have a play over the weekend. Cheers, |
Thanks, Breon. Let me know if you need any input from my side. Regards, |
Hi Sehrish, I haven't forgotten you, just having one of those weeks... Thanks! |
Hi Sehrish, I've created a new parameter for filtering by TSL, but I want to do some more testing before I push it up. It shouldn't take much longer. I'm conscious that you have been waiting for a bit, so to get you on your way if you're pressed for time, I've added a new file in the resources folder called hg38_genCode24_st-sorted-exons_tsl.gtf. This is essentially the same file as hg38_genCode24_st-sorted-exons.gtf but with the added TSL field married up from GencodeV24. If you wanted to do some manual filtering, all you would need to do is run an AWK/Python script to remove the lines with levels you don't want, rename the file to hg38_genCode24_st-sorted-exons.gtf and run Clinker as per usual. It will give you the desired effect. If you need a script to do that, I have something available, but up to you! Alternatively, if you're not pressed for time, I will let you know when I've uploaded the changes :). Cheers, |
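The manual filtering described above could look roughly like the sketch below. It is not the script mentioned in the comment. It assumes the TSL is stored GENCODE-style in the attribute column as `transcript_support_level "N";` (the actual field name in hg38_genCode24_st-sorted-exons_tsl.gtf is not shown in this thread), and it drops lines with no numeric TSL such as "NA". The function names and the `max_tsl` cutoff are illustrative only.

```python
import re

# Assumption: GENCODE-style attribute, e.g. transcript_support_level "1";
TSL_PATTERN = re.compile(r'transcript_support_level "([^"]+)"')


def keep_line(line, max_tsl):
    """Return True if a GTF line should be kept for the given TSL cutoff."""
    if line.startswith("#"):
        return True  # keep header/comment lines untouched
    match = TSL_PATTERN.search(line)
    if match is None or not match.group(1).isdigit():
        return False  # no numeric TSL (e.g. "NA"): dropped in this sketch
    return int(match.group(1)) <= max_tsl


def filter_gtf(in_path, out_path, max_tsl):
    """Copy lines with TSL <= max_tsl to out_path; return the number of records kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if keep_line(line, max_tsl):
                dst.write(line)
                if not line.startswith("#"):
                    kept += 1
    return kept
```

Calling `filter_gtf("hg38_genCode24_st-sorted-exons_tsl.gtf", "hg38_genCode24_st-sorted-exons.gtf", max_tsl=2)` inside the resources folder would then produce the renamed file in one step. Back up the original annotation first.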
Hi Breon, I highly appreciate your efforts to get this working and thanks for keeping me in the loop as well. I can definitely wait for the feature update from you. I am happy to give it a test drive then and update you on the results :) Regards, |
Hi @skanwal, I am just researching this exact question, would you be able to make the code you use for this available somewhere? I would really appreciate it! Thanks, |
@skanwal thanks for being so patient! https://github.com/Oshlack/Clinker/tree/clinker-1.4 This is an update to Clinker to which I've added a tsl parameter AND from which I've removed the Biomart dependency. I've done some testing and so far so good... but I thought I would get some real user testing as you suggested. Take it for a drive and let me know how it handles :).
All you need to do is clone that and run it with a parameter called tsl, which sets the highest TSL value to keep (as in the command below). For example, if you want transcripts with a TSL of only 1, just set tsl=1. If you would like all TSL under 5, set tsl=4.
bpipe -p out=/path/to/your/results/folder -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=true -p header=true -p align_mem=4000000000 -p genome_mem=4000000000 -p fusions=BCR:ABL1 -p tsl=2 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz |
@messersc, I think @skanwal posted this earlier: Thanks for trying Clinker! Let me know if I can help in any way. |
Hi Breon, Thanks a lot. I really appreciate the work you have done. I wanted to suggest if you could remove biomart dependency because this was stopping us from running it completely on the cluster and I had to run the I have updated the code and started the analysis using one of our samples. Regards, |
Hi Sehrish and Breon, I have a question regarding the grolar output and how to feed it into Clinker - in the examples, if one feeds fusion calls into Clinker, it's with coordinates of the breakpoints. Grolar on the other hand only gives the coordinates of the genes involved, but not the breakpoints. [It's possible to get the breakpoints from the json, but not easily as it seems it only gives coordinates on the transcript, not the genome.] Now, it looks like the breakpoints are not strictly needed as it's possible to run Clinker without, but I was wondering about this. Does the breakpoint information have any influence on how the superTranscriptome is generated? Hope I am making sense, |
Hi Clemens, you are making sense! Clinker only needs to know which two genes are involved in the fusion so it can then join the two relevant superTranscripts together. To do this, some coordinate is required that can be mapped to genes within either hg38_genCode24.txt or hg19_ucscGenes.txt. Typically breakpoints have been available in fusion caller output, so I've defaulted to using those, but I don't see any reason why it couldn't be the gene coordinates! So... hopefully I am making sense! In short, nope, the breakpoints specifically don't have any influence. |
Also, just noticed that both hg38_genCode24.txt and hg19_ucscGenes.txt genes were 1bp off on the starting coordinate. Have fixed and uploaded. You might want to pull these down if you're going to be using the gene coordinates, otherwise you might get unexpected superTranscripts fusing together :). |
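To illustrate the lookup described above, and why a start coordinate that is 1 bp off matters when gene coordinates are used instead of breakpoints, here is a minimal sketch. It is not Clinker's code. The `Gene` tuple, the `find_genes` helper and all coordinates are toy placeholders, and the real layout of hg38_genCode24.txt and hg19_ucscGenes.txt is not shown in this thread.

```python
from collections import namedtuple

# Hypothetical record type; intervals are treated as 1-based and inclusive.
Gene = namedtuple("Gene", "chrom start end name")


def find_genes(genes, chrom, pos):
    """Return the names of all genes whose interval contains chrom:pos."""
    return [g.name for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


# Toy annotation with the correct start for GENE_B.
correct = [Gene("chr1", 100, 200, "GENE_A"), Gene("chr1", 201, 300, "GENE_B")]
# Same annotation with GENE_B's start shifted by +1.
shifted = [Gene("chr1", 100, 200, "GENE_A"), Gene("chr1", 202, 300, "GENE_B")]

# Using the gene start itself as the input coordinate:
print(find_genes(correct, "chr1", 201))  # ['GENE_B']
print(find_genes(shifted, "chr1", 201))  # []
# A coordinate well inside the gene still resolves with either table.
print(find_genes(shifted, "chr1", 250))  # ['GENE_B']
```

A coordinate inside the gene body resolves to the same gene in both tables, which fits the observation that the offset went unnoticed while breakpoints were the only input.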
@skanwal you shouldn't run into any problems. You would likely run into an error during the plotting stage that would say something like "no split reads found spanning the fusion pair" or the like. Did you get something like this when you ran it until completion? Just delete the output of Clinker and start again with that annotation. Sorry about this... I didn't catch it given I was always using breakpoints (where a starting coordinate of +1 would not have been picked up). Let me know if you have any issues :). |
I re-ran the pipeline with updated files and received the following error:
The script I am using to run the pipeline is:
|
Hmmm, ok, I'll have a look tonight. Any chance I can get the first few lines of the Pizzly.csv. You can email it to me if you like? |
Sure.
Let me know if you need more information. |
Thanks! I'll get to the bottom of it, thanks for your patience. |
Thanks Breon. Previously I ran |
It certainly should have! But I noticed that main.py in the current clinker-v1.4 branch has the following line:
Which would solve your error above. I imagine that for some reason or another, there's some wiggyness with that clone. Give it another go in a new folder and see if you come up against anything. |
Seems to have done the trick. It's running the alignment currently - implying we have passed the problematic stage. I'll keep you updated of the run status. Thanks. |
Huzzah! No idea what happened, but glad you're on track. Keen to see the output :). |
Hi Breon, The pipeline is choking on the biomart dependency as I am running on the cluster. I am getting
|
So close! Ok, so this file: https://github.com/Oshlack/Clinker/blob/clinker-1.4/plotit/fst_plot.r Could you compare your fst_plot.r to this one? I.e. if your copy has library(biomart), we've run into the wrong branch thing again. Another thing it might be, have you been updating the CLINKERDIR variable with the new branch location? |
Yep, the How are we still on the wrong branch though.. That's confusing.
to make sure I am pointing to the new directory. Wondering if something is off in these steps? |
I know right, my head is spinning too! We will get there, I'm sure of it. So this step:
Maybe change to this to remove all doubt
Other than that, the other steps are fine! Otherwise, if you want to just figure out whether it's this branch's code or the cloning process, you can always just wget the below, unzip it and then run it as above. Then we can determine whether it's the software or something wacky with the cloning process :). |
Thanks for your time, Breon. Strangely, if I specify the branch name, I get
I checked which branch I am on (in the previous
I have tried checking out to the updated branch and had some progress - a new error :)
|
I reproduced your error while trying to clone the branch as well! I ended up manually typing it in and the error went away! Either way, this worked:
I would delete your current output (it sucks to lose the alignment, but I think that this will be the simplest way forward - the first few stages had some changes within this branch to update the annotations), clone the branch using the below statement, check the branch is active, change your CLINKERDIR to point to this new location, and run it from the start. Hopefully that resolves it all :). |
Thanks Breon. I have submitted a new job - following the above steps (fingers crossed). I'll keep you updated on the run progress. |
The only thing I can think of is a foreign character? You too! Fingers crossed. Breon. |
Hi Breon, The pipeline has choked on another issue again at the plotting stage.
Looking into
Checking Any idea why this file is empty? |
Well, at least we have moved on! I imagine this has got to do with the TSL parameter (i.e. too much has been filtered!). Could you please delete the references/fst*.fasta file and rerun the pipeline with the TSL parameter omitted? You should get the first python stage again as well as the plotting stage. Once done, what are the contents (particularly the TSL numbers) of the transcripts.gtf file? |
Now, it's back to the same old syntax error:
That's the one that you reproduced as well? |
Does the transcripts.gtf have contents now? I've only been able to reproduce the cloning error thus far, but we will get to the bottom of this :). Here's a question, what happens when you run the test using the new branch?
|
Also try it with a high tsl filter value and see if there's a difference:
EDIT: Remember to change the -p out="" to different folders :). |
Yes, Re- your first suggestion, I get the same error again
Should I just re-run with a high tsl filter value i.e. without deleting any intermediate files? |
No need, let me see if I can replicate that now. I could have sworn I tested that empty case. Most likely something I've done given we've resolved the branch issue. Thanks for your help :). |
Hi @skanwal I found the issue. For some reason the annotation update (resources/hg38_genCode24_st-sorted-exons.gtf) wasn't in the testing branch anymore! I must have overwritten it at some point, sorry! It should be good to go now. All you need to do is pull the change from the clinker-1.4 branch and try the test again. I've just cloned a fresh copy and run through the test with and without the new parameter and got through the pipeline. If that all works well for you too, you're good to go with your samples. Just delete the reference/fst_reference.fasta file from your results location, rerun the pipeline, and you should get your results (finally!). Let me know if you run into any problems :). |
Hi Breon, Thanks for your help and time. Almost there:
Seems like a memory issue at the plotting stage. Does the pipeline (or I need to set it at any stage)? Testing with the test data, it works fine (except for the syntax error), probably because there is just one fusion to plot? For my data it died on the second one.
|
Hi @skanwal I'll look into the syntax error, but as long as you're getting some result at this stage, we're on the right track! In terms of the memory issue, I have only occasionally run into this and it usually means that the plotting stage is trying to print a whole bunch of stuff. Generally either there are way too many protein domains defined in those regions or there are just a tonne of transcripts. So I suppose we have two courses of action here:
How was the first plot looking? |
Did you mean changing Also try removing one at a time and rerunning |
Sorry for the delay @skanwal, busy day yesterday! That's the line, yes. What I mean is trying all three of these options: If none of them run, it has nothing to do with the printing size. If the program is struggling, we can try increasing the memory available to R or I can try and get to the bottom of why there are so many X for that particular fusion. Even if we can print out the PDF, it might look too busy. You can just rerun the pipeline without deleting anything :). |
Hi Breon, Thanks for getting back to this. The pipeline does not like changing the number of options in the initial list.
|
Ah yes! We will also need to adjust the sizes. Can you please add the ratio parameter to your bpipe snippet? for example:
For each track you remove (such as transcript or domain), just remove one of the values. For example, if you remove the transcript track, set At the end, I'm sure at least one of the options will print all of your fusion pairs successfully. Then we will know where to look :). EDIT: Alternatively... if you're able to supply your Clinker output to me, I can debug it for you :). |
Wondering what is this ratio parameter referring to?
Do I need to delete any intermediate files as well to enable pipeline pick-up this new parameter? Re- supply Clinker output: Thanks. |
So the ratio parameter adjusts the relative size of each track. Basically what's happening is we are specifying more or fewer tracks than there are. So setting Re sending: I will need the alignments too, so it will be too big to email. Any chance of sending it through another means? I think this will be the quickest form of debugging. Alternatively, email over the zipped contents of the annotation folder :). |
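The rule above (one ratio value per plotted track, so removing a track means removing its value too) can be sketched as a small helper. This is only an illustration of the bookkeeping, not Clinker's implementation. The `drop_track` function, the placeholder track name `track_x` and the ratio numbers are hypothetical; only the transcript and domain track names come from this thread.

```python
def drop_track(tracks, ratios, name):
    """Remove one track together with its ratio so both lists stay the same length."""
    if len(tracks) != len(ratios):
        raise ValueError("need exactly one ratio value per track")
    if name not in tracks:
        raise ValueError("unknown track: " + name)
    idx = tracks.index(name)
    return tracks[:idx] + tracks[idx + 1:], ratios[:idx] + ratios[idx + 1:]


# Toy values only; "track_x" stands in for whatever other tracks are plotted.
tracks = ["track_x", "domain", "transcript"]
ratios = [1, 2, 6]

tracks, ratios = drop_track(tracks, ratios, "transcript")
print(tracks)  # ['track_x', 'domain']
print(ratios)  # [1, 2]
```

Checking the two lengths before building the bpipe command is a cheap way to catch a mismatch between the number of tracks and the number of ratio values.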
Hi guys, I happen to work on the same problem, trying to get Clinker to work for pizzly output. Any chance you've had this resolved so I can have a working model to start with? Many thanks! -Max |
Hello,
Thanks for the great tool. We are very much interested in using Clinker to visualise fusions in one of the projects I am currently working on @umccr.
I have used pizzly as a fusion caller and realized that clinker expects input in a specific format:
chrom1,base1,chrom2,base2
However the header of pizzly output looks like this:
Wondering if you ever tried using such format or how would you suggest converting it to something compatible with Clinker?
Any help appreciated, thanks!
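One possible shape for such a conversion, once each fusion has a chromosome and a coordinate for both partners, is sketched below. The input keys (`chrom_a`, `start_a`, `chrom_b`, `start_b`) are hypothetical placeholders, since the pizzly header is not reproduced here, and the coordinates are toy values. Only the output header chrom1,base1,chrom2,base2 comes from the format quoted above.

```python
import csv
import io


def to_clinker_rows(records):
    """Yield (chrom1, base1, chrom2, base2) tuples from fusion records.

    Each record is a dict using the hypothetical keys
    chrom_a, start_a, chrom_b and start_b.
    """
    for rec in records:
        yield (rec["chrom_a"], int(rec["start_a"]), rec["chrom_b"], int(rec["start_b"]))


def write_clinker_csv(records, handle):
    """Write a header line plus one CSV line per fusion; return the row count."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["chrom1", "base1", "chrom2", "base2"])
    count = 0
    for row in to_clinker_rows(records):
        writer.writerow(row)
        count += 1
    return count


# Toy record; the coordinates are made up for illustration.
example = [{"chrom_a": "chr22", "start_a": "1000", "chrom_b": "chr9", "start_b": "2000"}]
buffer = io.StringIO()
write_clinker_csv(example, buffer)
print(buffer.getvalue())
```

Replacing the `io.StringIO()` buffer with an opened file gives a CSV that could be passed to Clinker as the caller file, with the column and header options set to match.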