Discrepancy between metadata search results & piped fetch results #125
Comments
Update: I see that the list of samples found in the metadata search and the list of samples in the downloaded biom table do match, but the biom table seems to have subdivided the samples. For example, "13114.palenik.42.s001" in the sample list corresponds to the sample IDs "13114.palenik.42.s001.134469" and "13114.palenik.42.s001.134523" in the biom table. The sample IDs in the metadata table match the list of sample IDs in the biom table, but all the metadata values are identical within each sample "grouping", e.g. "13114.palenik.42.s001.134469" and "13114.palenik.42.s001.134523" have exactly the same metadata. Is there documentation about how and why that subdivision was done? I guess I can combine sample replicates (if that's what they are).
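To see how many table IDs belong to each searched sample before deciding how to combine them, a minimal sketch along these lines may help. It assumes that the trailing `.<digits>` component is the only thing distinguishing the table IDs, as in the two IDs quoted above; `count_replicates` is a hypothetical helper name, and the third ID in the demo is made up for illustration.

```shell
# Hypothetical helper: read sample IDs on stdin, drop the final ".<digits>"
# component, and print each base sample name with the number of IDs sharing it.
# Caveat: an ID whose last component is already all digits would be shortened too.
count_replicates() {
    sed -E 's/\.[0-9]+$//' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Demo: the first two IDs come from the thread; the third is invented.
printf '%s\n' \
    13114.palenik.42.s001.134469 \
    13114.palenik.42.s001.134523 \
    13114.palenik.42.s002.134469 | count_replicates
```

Base names with a count above 1 are the ones that appear more than once in the table.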
@nvpatin, thank you for the question and update. I think @justinshaffer might be able to answer your question.
Hi @nvpatin, sorry for the brief delay, I was OOO the last few days. For (1), that is an excellent idea and is not currently something that is exposed to the user, but would be a great addition. I would be happy to propose a suggestion to do this via a bash script or Python as a stopgap. For (2), the issue is that the same physical sample has been sequenced multiple times. The command shown is correct, but each individual sequencing run is differentiated. These "ambiguities" are expressed in the resulting ambiguity map. You can get around this by specifying If you haven't seen it, there is a longer tutorial on use on the QIIME 2 forum.
Thank you @wasade, that's very helpful! I will check back for future functionality that provides contexts associated with samples in the metadata search results.
I am trying to download a set of samples based on metadata information. When I search with my parameters, I find a certain number of samples; but when I pipe those results into 'redbiom fetch' (with a particular context) it downloads a different number of samples. I think there is a similar problem when I pipe the search results into 'redbiom summarize contexts'; it shows a list of contexts, some of which are associated with my samples but some of which are not, and I have to guess which one I have to use for fetching. So I have two questions: 1) How can I see the contexts associated only with my searched samples? and 2) How can I only fetch the samples associated with my metadata search? See below for the problems associated with question 2.
Looking for marine water samples within the EMP
% redbiom search metadata "where qiita_study_id == 13114 and empo_4 == 'Water (saline)'" | wc -l
39
Defining a context based on previous search results (it took several attempts to find one that worked)
% echo $CTX
Deblur_2021.09-Illumina-16S-V4-150nt-ac8c0b
Fetching samples based on metadata and context
% redbiom search metadata "where qiita_study_id == 13114 and empo_4 == 'Water (saline)'" | redbiom fetch samples --context $CTX --output EMP_marine_samples.biom
38 sample ambiguities observed. Writing ambiguity mappings to: EMP_marine_samples.biom.ambiguities
Data summary shows many more samples than metadata search originally found
% biom summarize-table -i EMP_marine_samples.biom | head
Num samples: 97
Num observations: 16,547
Total count: 1,354,853
Table density (fraction of non-zero values): 0.030
Counts/sample summary:
Min: 4,111.000
Max: 38,769.000
Median: 12,268.000
Mean: 13,967.557
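One way to confirm that the extra table IDs are repeats of the searched samples, rather than unrelated samples, is to strip the trailing `.<digits>` tag from the table IDs and check that every searched ID is still covered. The sketch below assumes both ID lists have been saved one per line; `missing_from_table` and the file contents in the demo are hypothetical (only the `s001` IDs appear in this thread).

```shell
# Hypothetical helper: print searched IDs that have no counterpart in the table
# once the final ".<digits>" tag is removed from the table IDs.
#   $1: file of IDs from the metadata search, one per line
#   $2: file of sample IDs from the BIOM table, one per line
missing_from_table() {
    awk 'FILENAME == ARGV[1] { sub(/\.[0-9]+$/, ""); seen[$0] = 1; next }
         !($0 in seen)' "$2" "$1"
}

# Demo with temporary files; 13114.palenik.42.s099 is an invented ID.
search_ids="$(mktemp)"
table_ids="$(mktemp)"
printf '%s\n' 13114.palenik.42.s001 13114.palenik.42.s099 > "$search_ids"
printf '%s\n' 13114.palenik.42.s001.134469 13114.palenik.42.s001.134523 > "$table_ids"
missing_from_table "$search_ids" "$table_ids"   # prints 13114.palenik.42.s099
rm -f "$search_ids" "$table_ids"
```

In practice the first file could come from redirecting the `redbiom search metadata` output shown above, and the second from the table itself, for example with `biom table-ids -i EMP_marine_samples.biom` if that subcommand is available in your biom-format install. Empty output means every searched sample is represented in the table.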