Better description of how to run the tool would be helpful #4
Comments
I should add that when I run the example, it reports a proportion of 1.0 for every contig. That seems really strange; it would be a strange example. How can I know whether this is expected, or whether it indicates that something is wrong with my installation of the program?
In the current readme, the threshold output by discoverY.py is described as "proportion_shared_with_female". But I think it is really "proportion_NOT_shared_with_female". Thus values closer to 1 mean a contig is more likely to be from Y.
Agree! The output of DiscoverY in README.md should be corrected to "proportion_NOT_shared_with_female", because after running DiscoverY, the contig file has the following header: '>Sc0000000 7492748 0.012910911534003885 102.0'; while the printed results in the terminal are:
Another correction that should be made is the description of how to calculate k-mers from male reads, because it indicates:
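One way to check this, sketched below in Python, is to pull the proportion out of every annotated header and summarize the values. This is not part of DiscoverY. It assumes the header layout reported elsewhere in this thread ('>Sc0000000 7492748 0.012910911534003885 102.0') and assumes the third whitespace-separated field is the proportion; the function names are made up for illustration.

```python
import io


def header_proportions(handle):
    """Yield the proportion from each '>' header line of a FASTA-style stream.

    Assumption: the proportion is the third whitespace-separated field,
    as in '>Sc0000000 7492748 0.012910911534003885 102.0'.
    """
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            yield float(fields[2])


def summarize(values):
    """Return count, min, max, and whether every value is exactly 1.0."""
    values = list(values)
    if not values:
        return None
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "all_one": all(v == 1.0 for v in values),
    }


# Tiny in-memory example; the records here are invented, not real output.
example = io.StringIO(">c1 100 1.0 5.0\nACGT\n>c2 200 0.25 7.0\nGGCC\n")
print(summarize(header_proportions(example)))
```

To run it on a real result, replace the in-memory example with `open("proportion_annotated_contigs.fastq")`. If `all_one` comes back true for the bundled example, that at least confirms the symptom described above in a reproducible way, though it does not say whether the cause is the example data or the installation.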
The current readme doesn't clearly describe how the user can use the tool to solve the problem it is intended to solve. If I have an assembly, and I want to identify Y-specific contigs, how do I do that?
My best guess, from trying to run the example in the repo, is that the info about which contigs are Y-specific is encoded in the headers of proportion_annotated_contigs.fastq. But this information is not described in the readme. Nor is any step mentioned that will separate Y contigs from the input contigs.
Note that this conclusion is based on the fact that, for me, the output of discoverY.py (proportion_annotated_contigs.fastq) is identical to the input (data/male_contigs.fasta), except that annotation has been added.
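If that guess is right, the missing separation step could look roughly like the Python sketch below. It is not taken from DiscoverY. It assumes the annotated file is FASTA-formatted with headers like '>Sc0000000 7492748 0.012910911534003885 102.0', that the third field is the proportion, and that higher values mean a contig is more likely to be from Y (as suggested in the comments). The function names and the threshold value are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def proportion_from_header(header):
    # Assumption: the third whitespace-separated field is the proportion.
    return float(header.split()[2])


def select_candidates(handle, threshold):
    """Keep contigs whose proportion is at or above the threshold."""
    return [
        (header, seq)
        for header, seq in read_fasta(handle)
        if proportion_from_header(header) >= threshold
    ]


# Invented records for illustration; 0.5 is an arbitrary threshold.
example = io.StringIO(">c1 4 0.9 5.0\nACGT\n>c2 4 0.1 3.0\nGG\nTT\n")
for header, seq in select_candidates(example, threshold=0.5):
    print(">" + header)
    print(seq)
```

On real data, the in-memory example would be replaced with `open("proportion_annotated_contigs.fastq")`. How to pick the threshold is exactly what the readme leaves open, so the value above is only a placeholder.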
The command I ran was the one shown as "a typical run":
python discoverY.py --female_bloom --mode female+male
But it is not clear whether this is the appropriate command to run for the example. Based on the files provided, and after digging through the code to see which options would cause all the provided files to be used, that was the command I came up with. This would be made clearer by having a "tutorial" section in the readme that showed the command to be run.
It would also be helpful to provide, as part of the example, the expected output. As it stands, I don't know whether my run of discoverY.py worked. It's possible that it is not working and that this has fed into my misunderstanding of how it is supposed to be used.
It's also possible that I don't understand what the example is intended to demonstrate.
The discussion of 'best mode' and the jupyter notebook stuff should clarify whether this step is intended as part of the typical usage pipeline or not. After having a lot of difficulty with the notebook, looking at it in more detail, and realizing that it doesn't read the output from discoverY.py, my best guess is that this is a pre-computing step, to be run before discoverY.py, to guide the choice of threshold. However (assuming that is true), there's nothing that indicates how the resulting threshold would be used.
To recap, as the tool is currently described, quite a bit of insight, digging, and guesswork is required on the part of the user.