extract transcript sequences from clinker output #15
Comments
Hi Max,
Good one! I think this should be fairly straightforward. You will notice that in the … The fusion_locations.bed file gives you the coordinates where the two superTranscripts have been concatenated together, and junctions.txt gives you the read count for each junction found. With this information you should be able to get a list of all fusion variants: entries that span the fusion boundary indicate a fusion breakpoint. Once you have that list, you should be able to just go to …
An example: if you have a GENE1:GENE2 fusion, where GENE1 and GENE2 are each 1000 bases long and junctions.txt indicates a junction between 200 and 1400, you can go to fst_reference.fasta and take the first 200 bases of GENE1 and the last 600 bases of GENE2. Concatenate those together and you should have your result. What do you think?
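A minimal Python sketch of the worked example above, for illustration only. It assumes the fused superTranscript is stored as a single concatenated record (GENE1 followed by GENE2), that the boundary position comes from fusion_locations.bed, and that junction coordinates are 0-based offsets into that record. None of these conventions is confirmed in this thread, so check them against your own clinker output. The names `read_fasta` and `fusion_variant` are hypothetical helpers, not part of clinker. In the example, position 1400 of the concatenated sequence lies 400 bases into GENE2, which leaves its last 600 bases.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict of {header: sequence}."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:].strip(), []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def fusion_variant(seq, boundary, left, right):
    """Join seq[:left] and seq[right:] for one junction.

    seq      -- concatenated GENE1 + GENE2 sequence (assumed layout)
    boundary -- length of GENE1, i.e. where GENE2 starts (assumed 0-based)
    left     -- junction position inside GENE1
    right    -- junction position inside GENE2, in concatenated coordinates
    """
    if not (0 < left <= boundary <= right < len(seq)):
        raise ValueError("junction does not span the fusion boundary")
    return seq[:left] + seq[right:]


# Toy data matching the example: two 1000-base genes, junction 200 -> 1400.
gene1, gene2 = "A" * 1000, "C" * 1000
variant = fusion_variant(gene1 + gene2, boundary=1000, left=200, right=1400)
print(len(variant))  # 800: first 200 bases of GENE1 + last 600 of GENE2
```

With real data, the sequence would come from `read_fasta("fst_reference.fasta")` and the function would be called once per junction in junctions.txt that spans the boundary. As the follow-up comments note, this always yields transcripts that run from the start of GENE1 to the end of GENE2.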
Hi Breon,
Thank you so much for your detailed instructions! This definitely explains a lot. But this way all transcripts start at position 0 of GENE1, "break" at the junction point in GENE1, continue from the junction point in GENE2, and stop at the end of GENE2, meaning all transcripts have the same start and end? Looking at the result for one of our samples (attached), it's obvious that some transcripts don't start at the very beginning of GENE1 or end at the very end of GENE2. This made me wonder how the start and end of these transcripts (in the Transcripts track) were computed, or whether they were simply extracted from existing reference databases. Can you please comment on this? I probably missed some critical link here. Sorry if this sounds completely dumb :)
On a side note, I tried using pizzly to call the fusions as well. In the results, pizzly gives all possible variants of fusions in the format of, for instance:
Max
Hi Max,
Apologies! I misunderstood. Yes, the fusion transcripts could certainly start and end at different points as well. Also, the transcript track is a representation of an existing reference database, which has its own visual benefit too. This is a bit out of scope for Clinker, but let me have a chat with one of my colleagues, as I think there may be something that can help you do this.
Cheers,
Hi Breon,
Sounds great, I really appreciate your effort in helping me out!!!
Max
Hi Breon,
This might not be quite relevant to your development, but I'd really appreciate some input from you. Clinker weighs the different variants of a fusion by associating the number of split reads with the corresponding junction. I wonder whether there is a way to extract the full transcript sequence of each "variant" of the fusion from the clinker output files. Can you please advise? Thanks a lot!
Max