Mitch Peer Review #4

Open
mrezzoni opened this issue Oct 27, 2018 · 0 comments

Helena "Don't Google Me" Klein,

I like how you check the legitimacy of the UMI, but think about moving this check higher up to conserve resources. Consider using samtools to sort by chromosome rather than position, as there are significantly fewer of the former. Maybe use a set instead of numpy arrays, because sets are much faster at checking whether an item is already in them.
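Here is a rough sketch of what I mean by checking the UMI early against a set. I am assuming the UMI is the last ':'-separated field of QNAME, which may not match your data, and the names `load_known_umis`, `umi_of`, and `has_valid_umi` are made up for illustration.

```python
def load_known_umis(lines):
    """Build a set of valid UMIs from an iterable of text lines (one UMI per line)."""
    return {line.strip() for line in lines if line.strip()}


def umi_of(sam_line):
    """Return the UMI of a SAM record.

    Assumption: the UMI is the last ':'-separated field of QNAME.
    """
    qname = sam_line.split("\t", 1)[0]
    return qname.rsplit(":", 1)[-1]


def has_valid_umi(sam_line, known_umis):
    """Set membership is a hash lookup, so this stays cheap as the UMI list grows."""
    return umi_of(sam_line) in known_umis


# Toy data, invented for this example.
known = load_known_umis(["AACGCCAT\n", "TTCGCCTA\n"])
good = "NS500451:154:HWKTMBGXX:1:11101:1:1:AACGCCAT\t0\t2\t100\t36\t10M\t*\t0\t0\tACGT\tIIII"
bad = "NS500451:154:HWKTMBGXX:1:11101:1:2:GGGGGGGG\t0\t2\t100\t36\t10M\t*\t0\t0\tACGT\tIIII"

print(has_valid_umi(good, known))
print(has_valid_umi(bad, known))
```

Running this test first means a record with an unknown UMI is dropped before any CIGAR parsing or position adjustment is done on it.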

If you have moved past all the unique alignments on a particular chromosome and have progressed to the next chromosome, consider re-initializing whatever data structure you were using to store your unique alignments. This will help save memory, assuming the unique alignments have already been written out to a file.
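A minimal sketch of the reset idea, assuming the records arrive grouped by chromosome. The `(chrom, key, line)` tuple layout and the function name `dedupe_by_chromosome` are hypothetical, and `key` stands in for whatever you use to identify a duplicate.

```python
def dedupe_by_chromosome(records):
    """Yield the first record seen for each key.

    Assumes records are grouped by chromosome. Each record is a
    (chrom, key, line) tuple; key is whatever identifies a duplicate
    (hypothetically: UMI, strand, adjusted start position).
    """
    seen = set()
    current_chrom = None
    for chrom, key, line in records:
        if chrom != current_chrom:
            seen = set()  # forget the previous chromosome's keys
            current_chrom = chrom
        if key not in seen:
            seen.add(key)
            yield line


# Toy records, invented for this example.
records = [
    ("1", ("AACG", "+", 100), "read_a"),
    ("1", ("AACG", "+", 100), "read_b"),  # duplicate of read_a
    ("2", ("AACG", "+", 100), "read_c"),  # same key, different chromosome
]
unique = list(dedupe_by_chromosome(records))
print(unique)
```

Because each kept line is yielded as soon as it is seen, it can be written out immediately and the set never holds more than one chromosome's worth of keys.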

Good job taking the size of your sliding window into account. Make sure it is capable of handling chromosomes for species from exoplanets.

Nice high-level functions. Based on your use of the term "start position" rather than "left-most position", it seems like you have considered the implications of strandedness. Will soft_clip be capable of handling other CIGAR ops like I or D? Maybe create a function to search for N's only after your alignments have passed other tests such as UMI legitimacy, so that you avoid spending time on alignments that might be discarded anyway.
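To illustrate the question about I and D, here is one possible way to compute a strand-aware start position from the CIGAR string. This is only a sketch under my own assumptions, not a claim about how your soft_clip works: the function name `five_prime_position` is made up, hard clips are ignored, and the reverse-strand result is reported as the 1-based position of the last base.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_position(pos, cigar, reverse):
    """Return the soft-clip-adjusted 5' position of an alignment.

    pos is the 1-based left-most mapped position (SAM POS).
    Hard clips (H) are dropped because they are not part of the stored read.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not reverse:
        # Forward strand: only a leading soft clip shifts the start.
        if ops and ops[0][1] == "S":
            return pos - ops[0][0]
        return pos
    # Reverse strand: walk to the right end of the alignment.
    # M, D, N, = and X consume reference bases; I does not.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trailing_clip = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return pos + ref_len + trailing_clip - 1


print(five_prime_position(100, "3S10M", False))
print(five_prime_position(100, "10M2I5M3D4M2S", True))
```

The point is that on the reverse strand a D or N moves the position while an I does not, so two reads with the same POS but different CIGAR strings can have different start positions.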

Please let me know if you'd like any elaboration on my feedback. Good luck!
Mitch
