Ideas
Serghei Mangul edited this page Aug 5, 2017 · 4 revisions
      
    - We decided not to merge contigs
 - Top priority for Mohamed: finish the plots. (1) Sort genomes by coverage and report the total number of reads; (2) color reads according to fidelity
 - Use mock datasets and subsample them to obtain genomes covered by only a few reads
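
The plotting and subsampling steps above could be prototyped along these lines. This is only a minimal sketch: the record fields (`name`, `coverage`, `reads`) and the function names are hypothetical, chosen for illustration, and are not taken from the project's code.

```python
import random


def sort_by_coverage(genomes):
    """Return genome records ordered by coverage, highest first."""
    return sorted(genomes, key=lambda g: g["coverage"], reverse=True)


def total_reads(genomes):
    """Total number of reads across all genome records."""
    return sum(g["reads"] for g in genomes)


def subsample_reads(reads, n, seed=0):
    """Draw n reads without replacement (all reads if fewer than n exist).

    A fixed seed keeps the subsample reproducible between runs.
    """
    if n >= len(reads):
        return list(reads)
    return random.Random(seed).sample(reads, n)
```

Subsampling a mock dataset down to a handful of reads per genome is one way to test how the pipeline behaves on genomes with very sparse support.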
 
- Prepare the database by mapping bacterial substrings (sliding window) onto the fungal genomes. Also map all of RefSeq other than fungi onto the fungal genomes, and use the hits to mask the fungal genomes.
 - If a read maps entirely within a masked region, ignore it. If it spans both masked and non-masked sequence, keep it only if at least 30 bp(?) overlap the non-masked part
 - Does it make sense to apply this masking within the database as well, i.e. between viruses, fungi, and plasmids?
 - Consider LCA (lowest common ancestor) assignment instead of simply assigning multi-mapped reads (possibly for a future release, when we add bacteria)
 - With stringent masking we can trust even a few reads and detect rare organisms. Is this not available now?
 - Make an interactive graph
 - Properties of the graph: show only reads that are UNIQ, have a certain fidelity, etc.
 - Explore all technical parameters
 - Report % genome coverage separately for UNIQ, multi-mapped within, and multi-mapped across
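
The read-filtering rule for masked regions described above can be sketched as follows. This assumes half-open `[start, end)` coordinates on a single genome and a list of masked intervals; the function names and the default of 30 bp (still an open question in the notes) are illustrative assumptions, not the project's implementation.

```python
def unmasked_bases(start, end, masked):
    """Count bases of the read [start, end) that fall outside all masked intervals."""
    covered = 0
    cursor = start  # bases before the cursor are already counted as masked
    for m_start, m_end in sorted(masked):
        lo = max(m_start, cursor)
        hi = min(m_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return (end - start) - covered


def keep_read(start, end, masked, min_unmasked=30):
    """Apply the masking rule to one aligned read.

    - no overlap with any masked region: keep
    - entirely inside masked regions: drop
    - spanning both: keep only if at least min_unmasked bases are non-masked
    """
    free = unmasked_bases(start, end, masked)
    if free == end - start:
        return True
    return free >= min_unmasked
```

Tracking a cursor while walking the sorted intervals avoids double-counting when masked intervals overlap each other, which may happen if bacterial and other non-fungal hits are masked independently.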
 
- Formulate Uniformity of coverage
 - Fidelity of reads
 - UNIQ, multi-mapped within
 - PE information
 - Anything else?
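
Uniformity of coverage has not been formulated yet. One possible starting point, offered purely as an assumption for discussion, is to summarize the per-base depth profile by its breadth (fraction of positions covered at least once) and its coefficient of variation (lower means more even coverage). The function names and the half-open `(start, end)` alignment format below are hypothetical.

```python
from statistics import mean, pstdev


def depth_profile(genome_length, alignments):
    """Per-base read depth from half-open (start, end) alignments."""
    depth = [0] * genome_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def coverage_summary(depth):
    """Breadth, mean depth, and coefficient of variation of a depth profile."""
    if not depth:
        raise ValueError("depth profile is empty")
    breadth = sum(1 for d in depth if d > 0) / len(depth)
    avg = mean(depth)
    # CV is undefined for an uncovered genome; report infinity in that case.
    cv = pstdev(depth) / avg if avg > 0 else float("inf")
    return {"breadth": breadth, "mean_depth": avg, "cv": cv}
```

The same summary could be computed separately for each read class (UNIQ, multi-mapped within, multi-mapped across) by passing only that class's alignments to `depth_profile`.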