Extracting introns and their supporting read counts from a SAM file using base Python (no external libraries). Grade: A2
Input required:
- A standard SAM file.
- A comma-separated file listing the gene_ID, transcript_ID and genomic location of every gene on the chromosome that is aligned in the SAM file.
What the code does:
- Parses the SAM file and identifies what each alignment line contains: matches, insertions, deletions, splits or a junction (a candidate intron).
- For each junction identified, counts how many reads support it.
- Checks whether each junction lies inside a gene listed in the second input file.
- If the junction is inside a gene, it is marked as an intron and written to an output file that gives the gene ID, the start and end of the intron, and the number of reads supporting it.
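
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself. It assumes that junctions are read from the `N` (skipped region) operation of the CIGAR string in SAM column 6, that intron coordinates are 1-based and inclusive, and that genes have already been loaded as `(gene_id, start, end)` tuples for a single chromosome. The function names `junctions_from_cigar`, `count_junctions` and `introns_in_genes` are hypothetical.

```python
# Sketch only: function names and coordinate conventions are assumptions,
# not taken from the original script.

def junctions_from_cigar(pos, cigar):
    """Return (start, end) of each skipped region (CIGAR 'N') in one alignment.

    pos is the 1-based leftmost mapped position (SAM column 4). start and end
    are the first and last skipped reference bases, 1-based inclusive.
    """
    if cigar == "*":            # CIGAR unavailable for this record
        return []
    junctions = []
    ref = pos                   # reference coordinate of the next operation
    digits = ""
    for ch in cigar:
        if ch.isdigit():
            digits += ch
            continue
        length = int(digits)
        digits = ""
        if ch == "N":
            junctions.append((ref, ref + length - 1))
        if ch in "MDN=X":       # only these operations consume the reference
            ref += length
    return junctions


def count_junctions(sam_lines):
    """Map each (start, end) junction to the number of reads containing it."""
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):            # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:                 # blank or malformed line
            continue
        pos, cigar = int(fields[3]), fields[5]
        for junction in junctions_from_cigar(pos, cigar):
            counts[junction] = counts.get(junction, 0) + 1
    return counts


def introns_in_genes(counts, genes):
    """Keep junctions lying fully inside a gene; return (gene_id, start, end, reads)."""
    introns = []
    for (start, end), reads in sorted(counts.items()):
        for gene_id, gene_start, gene_end in genes:
            if gene_start <= start and end <= gene_end:
                introns.append((gene_id, start, end, reads))
    return introns


# Made-up example records (tab-separated SAM fields; sequence and quality as "*").
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M50N10M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t105\t60\t5M50N15M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t900\t60\t10M20N10M\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t300\t60\t20M\t*\t0\t0\t*\t*",
]
example_genes = [("geneA", 50, 500)]   # hypothetical gene

for row in introns_in_genes(count_junctions(example_sam), example_genes):
    print(*row, sep="\t")              # prints: geneA, 110, 159, 2 (tab-separated)
```

In this example, reads r1 and r2 support the same junction, so it is reported once with a count of 2. The junction from r3 falls outside the gene and is dropped, and r4 has no junction at all.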