Better description of how to run the tool would be helpful #4

rsharris · 2019-07-12T15:56:18Z

The current readme doesn't clearly describe how the user can use the tool to solve the problem it is intended to solve. If I have an assembly, and I want to identify Y-specific contigs, how do I do that?

My best guess, from trying to run the example in the repo, is that the info about which contigs are Y-specific is encoded in the headers of proportion_annotated_contigs.fastq. But this information is not described in the readme. Nor is any step mentioned that will separate Y contigs from the input contigs.

Note that this conclusion is based on the fact that, for me, the output of discoverY.py (proportion_annotated_contigs.fastq) is identical to the input (data/male_contigs.fasta), except that annotation has been added.
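For anyone who wants to repeat that comparison, here is a minimal sketch of how it could be checked. It assumes both files are FASTA-style (headers starting with ">") and that the contig ID is the first whitespace-separated token of each header. The function names and the toy records are hypothetical and are not part of DiscoverY.

```python
def read_fasta(lines):
    """Return a dict mapping contig ID (first header token) to its sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the ID, so annotation appended to the header
            # does not affect the comparison.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def same_sequences(input_lines, output_lines):
    """True if both inputs hold the same contig IDs with the same sequences."""
    return read_fasta(input_lines) == read_fasta(output_lines)


# Toy records standing in for the real files (made-up data).
original = [">contig1", "ACGT", "ACGT", ">contig2", "GGCC"]
annotated = [">contig1 8 1.0 3.0", "ACGTACGT", ">contig2 4 1.0 2.0", "GGCC"]
result = same_sequences(original, annotated)
print(result)
```

To run it on real files, pass open file handles instead of lists, e.g. `same_sequences(open("data/male_contigs.fasta"), open("proportion_annotated_contigs.fastq"))`.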

The command I ran was the one shown as "a typical run":
python discoverY.py --female_bloom --mode female+male
But it is not clear whether this is the appropriate command to run for the example. Based on the files provided, and after digging through the code to see which options would cause all the provided files to be used, that was the command I came up with. This would be made clearer by having a "tutorial" section in the readme that showed the command to be run.

It would also be helpful to provide, as part of the example, the expected output. As it stands, I don't know whether my run of discoverY.py worked. It's possible that it is not working and that this has fed into my misunderstanding of how it is supposed to be used.

It's also possible that I don't understand what the example is intended to demonstrate.

The discussion of 'best mode' and the Jupyter notebook stuff should clarify whether this step is intended as part of the typical usage pipeline or not. After having a lot of difficulty with the notebook, looking at it in more detail, and realizing that it doesn't read the output from discoverY.py, my best guess is that this is a pre-computing step, to be run before discoverY.py, to guide the choice of threshold. However (assuming that is true), there's nothing that indicates how the resulting threshold would be used.

To recap, as the tool is currently described, quite a bit of insight, digging, and guesswork is required on the part of the user.

rsharris · 2019-07-12T16:06:46Z

I should add that when I run the example, it reports a proportion of 1.0 for each and every contig. That seems really strange; it would be a strange example. How can I know whether this is expected or whether it's an indication that something's wrong with my installation of the program?

rsharris · 2019-07-16T16:43:48Z

In the current readme, the threshold output by discoverY.py is described as "proportion_shared_with_female". But I think it is really "proportion_NOT_shared_with_female".

Thus values closer to 1 mean a contig is more likely to be from Y.
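If that reading is right, separating candidate Y contigs could be a small post-processing step on the annotated output. Below is a minimal sketch of that idea, not something the tool provides. It assumes FASTA-style records whose header has the proportion as the third whitespace-separated field; the function names, the toy records, and the 0.5 cutoff are all hypothetical, and a real threshold would have to come from the authors' guidance.

```python
def iter_records(lines):
    """Yield (header, sequence) pairs from FASTA-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_y_candidates(lines, threshold):
    """Keep contigs whose proportion NOT shared with female is >= threshold."""
    selected = []
    for header, sequence in iter_records(lines):
        # Assumption: the third header field holds the proportion.
        proportion = float(header.split()[2])
        if proportion >= threshold:
            selected.append((header, sequence))
    return selected


# Made-up records; 0.5 is an arbitrary cutoff used only for illustration.
annotated = [
    ">contigA 8 0.95 40.0", "ACGTACGT",
    ">contigB 4 0.01 102.0", "GGCC",
]
picked = select_y_candidates(annotated, threshold=0.5)
picked_ids = [header.split()[0][1:] for header, _ in picked]
print(picked_ids)
```

Under this reading contigA (mostly unshared k-mers) is kept and contigB is dropped.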

deilepaita · 2020-12-17T21:34:38Z

Agree!

The output of DiscoverY described in README.md should be corrected to "proportion_NOT_shared_with_female", because after running DiscoverY, the contig file has the following header: '>Sc0000000 7492748 0.012910911534003885 102.0'; while the printed results in the terminal are:
'No. of contigs seen so far: 1
Current contig ID is : Sc0000000
Median is: 102.0
Total No. of k-mers from this contig: 7492732
No. of k-mers not shared with female: 96738
Proportion is: 0.012910911534003885'
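The arithmetic behind this can be checked directly from the numbers quoted above. A minimal sketch follows; the field order in the header (ID, length, proportion, median) is an assumption inferred from this single example, and parse_header is a hypothetical helper, not a DiscoverY function.

```python
def parse_header(header):
    """Split an annotated header into (contig_id, length, proportion, median)."""
    # Assumed field order, inferred from the one header quoted in this thread.
    contig_id, length, proportion, median = header.lstrip(">").split()
    return contig_id, int(length), float(proportion), float(median)


contig_id, length, proportion, median = parse_header(
    ">Sc0000000 7492748 0.012910911534003885 102.0"
)

# Counts copied from the terminal output quoted in this comment.
not_shared_with_female = 96738
total_kmers = 7492732
recomputed = not_shared_with_female / total_kmers

# The header value agrees with not_shared / total, i.e. it is the
# proportion NOT shared with female.
matches = abs(recomputed - proportion) < 1e-9
print(contig_id, proportion, recomputed, matches)
```

So the third header field is the count of k-mers not shared with female divided by the total k-mer count for the contig.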

Another correction that should be made is to the description of how to calculate k-mers from male reads, because it indicates:
"cd dependency
ln -s ../data/female.fasta #make sure the correct reads file is provided to DSK
./run_dsk_Linux.sh r1.fastq 25"
which is misleading for new users. Why does the user need to soft link female.fasta into dependency if it is completely unnecessary for running DSK?
