Issue in grouped attribute search #49
Comments
Thank you for reporting @anantjiit2026! We apologize for the inconvenience. This is indeed odd behavior, and it relates to an idiosyncrasy of both our API service and the Python package. For most of our search attributes, this type of subquery grouping should work as expected. However, something a little different happens for the ...this may sound confusing, so it's probably easier to show you some examples.
First of all, here is what's happening vs. what's desired:
You'll notice that the only difference between the two is that in the Python-generated query all the "AND" attributes are flattened to the same group level, whereas in the properly formed query the ...
All that said, you've highlighted a current limitation of our Python package: it currently flattens groups whenever possible. We are looking into a fix and will update you when it's available (hopefully within the next week). In the meantime, if you have any follow-up questions about this, please let us know.
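To make the flattening described above concrete, here is a minimal sketch in plain Python. It is not the package's actual implementation. The dictionaries follow the general shape of Search API query JSON (group nodes with a `logical_operator` and `nodes`, terminal nodes with `parameters`), and the helper names `terminal`, `make_group`, and `flatten` are hypothetical.

```python
def terminal(attribute, operator, value=None):
    """Build one text-service terminal node (one attribute condition)."""
    params = {"attribute": attribute, "operator": operator}
    if value is not None:
        params["value"] = value
    return {"type": "terminal", "service": "text", "parameters": params}


def make_group(operator, nodes):
    """Build a group node that combines child nodes with 'and' or 'or'."""
    return {"type": "group", "logical_operator": operator, "nodes": list(nodes)}


def flatten(node):
    """Merge child groups into a parent group that uses the same operator."""
    if node["type"] != "group":
        return node
    merged = []
    for child in map(flatten, node["nodes"]):
        same_op = (
            child["type"] == "group"
            and child["logical_operator"] == node["logical_operator"]
        )
        if same_op:
            merged.extend(child["nodes"])  # the sub-group boundary is lost here
        else:
            merged.append(child)
    return make_group(node["logical_operator"], merged)


q1 = terminal("rcsb_binding_affinity.type", "exact_match", "EC50")
q2 = terminal("rcsb_binding_affinity.value", "equals", 2.0)
q3 = terminal("rcsb_entry_info.selected_polymer_entity_types", "exists")
q4 = terminal(
    "rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", "exists"
)

# Desired shape: the two binding-affinity conditions stay in their own sub-group.
nested = make_group("and", [make_group("and", [q1, q2]), q3, q4])
# Flattened shape: all four conditions end up at the same level.
flat = flatten(nested)

print(len(nested["nodes"]), len(flat["nodes"]))  # 3 4
```

Logically, a nested "and" inside an "and" looks redundant, which is presumably why flattening seems safe; the point of this thread is that for the paired binding-affinity attributes the sub-group boundary changes the result.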
Thank you for the detailed explanation. I have two more questions.
Also, I was working with the attributes from the advanced query builder (https://www.rcsb.org/docs/search-and-browse/advanced-search/attribute-details) and translating them to Search API attributes using the "Attribute" entry. For nested attributes, this required a lookup for the value of the nested attribute.
The screenshot you shared should ideally have worked, but because of the current behavior of the Python package, that query is automatically flattened into a single group, which is why you are still getting 320 results instead of 133. We are working on a fix and hope to have it ready within the next week or so. In the meantime, there is a way to force this nesting, though it is much less intuitive:
from rcsbapi.search.search_query import Group
from rcsbapi.search import AttributeQuery
q1 = AttributeQuery("rcsb_binding_affinity.type", "exact_match", "EC50")
q2 = AttributeQuery("rcsb_binding_affinity.value", "equals", 2.0)
q3 = AttributeQuery("rcsb_entry_info.selected_polymer_entity_types", "exists")
q4 = AttributeQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", "exists")
q = Group("and", (q1 & q2, *[q3, q4]))
results = list(q())
Nice job noticing the difference in how those nested-type attributes are identified! We don't have existing code to do this for you yet, but we plan to come up with a way to do so in the future. I am curious what code you're using to do that, though, if you don't mind sharing (here is perfectly fine, as long as you're OK with that). Perhaps related, though: we do have code to help search through attributes, e.g.:
from rcsbapi.search import search_attributes as attrs
matching_attrs = attrs.search("rcsb_binding_affinity")
# print out all details for each matching attribute
for attr in matching_attrs:
    print(attr)
# print out just the attribute names
for attr in matching_attrs:
    print(attr.attribute)
Also, you can view our schema directly at: https://search.rcsb.org/rcsbsearch/v2/metadata/schema. There, you can identify which attributes will need to be grouped in pairs based on whether ...
But as I mentioned, we are looking into ways to make this easier and more automated for users (or, at a minimum, providing a warning message about the need to group those attributes together). These improvements will likely take longer than a week or two, though, just to set expectations.
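As a rough illustration of scanning a schema for attributes that need this kind of paired grouping, here is a small sketch that walks a JSON Schema style dictionary and collects the paths of nodes carrying a given marker key. The marker name `rcsb_nested_indexing` is an assumption and should be checked against the real schema at the URL above; `find_flagged_paths` and the toy schema are hypothetical and only mimic the general layout.

```python
def find_flagged_paths(schema, flag="rcsb_nested_indexing", prefix=""):
    """Return dotted paths of schema nodes where the marker key is truthy."""
    found = []
    if isinstance(schema, dict):
        if schema.get(flag):
            found.append(prefix)
        # Descend into object properties, extending the dotted path.
        for name, child in schema.get("properties", {}).items():
            child_prefix = f"{prefix}.{name}" if prefix else name
            found.extend(find_flagged_paths(child, flag, child_prefix))
        # Descend into array items, which share the path of the array itself.
        if "items" in schema:
            found.extend(find_flagged_paths(schema["items"], flag, prefix))
    return found


# Toy stand-in for the real schema (assumed layout, not real data).
toy_schema = {
    "type": "object",
    "properties": {
        "rcsb_binding_affinity": {
            "type": "array",
            "rcsb_nested_indexing": True,
            "items": {
                "type": "object",
                "properties": {
                    "type": {"type": "string"},
                    "value": {"type": "number"},
                },
            },
        },
        "rcsb_entry_info": {
            "type": "object",
            "properties": {
                "selected_polymer_entity_types": {"type": "string"},
            },
        },
    },
}

print(find_flagged_paths(toy_schema))  # ['rcsb_binding_affinity']
```

To run this against the live schema, the downloaded JSON could be passed in place of `toy_schema`, after confirming which key actually marks the paired attributes.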
Thanks for the clarification and the links. I had been looking for something like the schema file. For the lookup, I ran a search in the advanced query builder and looked at the corresponding Search API query, then built a local lookup table for the values of the nested attributes:
nested_lookup = {
    "Accession Code(s) - UniProt": "UniProt",
    "Accession Code(s) - GenBank": "GenBank",
    "Accession Code(s) - NORINE": "NORINE",
    "Global Quality Score - pLDDT": "pLDDT",
    "Identifier - Pfam Protein Family": "Pfam",
    "Name - Pfam Protein Family": "Pfam",
    # out of order for checking
    "Count Per Polymer Entity - Modified chemical component": "modified_monomer",
    "Binding Affinity Value - EC50": "EC50",
    "Component Identifier - Investigated Molecule": "SUBJECT_OF_INVESTIGATION",
    "Component Identifier - Has Covalent Linkage": "Has_Covalent_Linkage",
    "Component Identifier - Has No Covalent Linkage": "Has_No_Covalent_Linkage",
    "Component Identifier - Has Metal Coordination": ""
}
and if my query attribute had a nested attribute, I just added it with the corresponding value from the table:
if 'Nested Attribute' in lookup[attr_op_val.attribute]:
    val_nested = nested_lookup[attr_op_val.attribute]
    if q is None:
        q = AttributeQuery(lookup[attr_op_val.attribute]['Nested Attribute'], "exact_match", val_nested)
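For readers following along, the lookup idea above can be sketched in a self-contained way. This is only an illustration, not the code used in this thread: the table layout and the `paired_conditions` helper are hypothetical, and only the binding-affinity attribute names and the `EC50` value come from the queries shown earlier.

```python
# Hypothetical table: query-builder label -> (type attribute, value attribute, type value)
NESTED_PAIRS = {
    "Binding Affinity Value - EC50": (
        "rcsb_binding_affinity.type",
        "rcsb_binding_affinity.value",
        "EC50",
    ),
}


def paired_conditions(label, operator, value, table=NESTED_PAIRS):
    """Return the two (attribute, operator, value) conditions for a nested label."""
    type_attr, value_attr, type_value = table[label]
    return [
        (type_attr, "exact_match", type_value),  # selects the nested context
        (value_attr, operator, value),           # the condition the user asked for
    ]


for condition in paired_conditions("Binding Affinity Value - EC50", "equals", 2.0):
    print(condition)
```

Each tuple matches the positional order used with `AttributeQuery` elsewhere in this thread, so the pair could be turned into two queries and kept together in their own group.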
Hi @anantjiit2026, thank you for sharing your approach with us. We're glad that strategy works for you for these edge cases for now. We do hope to implement a more automated "behind-the-scenes" solution in the future, but that is looking like a more involved task than initially hoped.
Nonetheless, we are excited to let you know that we have introduced a much simpler and more intuitive way to force the nested grouping of those special kinds of attributes (addressed in #51, thanks to @ivana-truong!). Now, to handle the particular case you originally mentioned in your issue, you can simply do:
from rcsbapi.search import group
q = group(q1 & q2) & q3 & q4
So, in full, your code would be:
from rcsbapi.search import group
from rcsbapi.search import AttributeQuery
q1 = AttributeQuery("rcsb_binding_affinity.type", "exact_match", "EC50")
q2 = AttributeQuery("rcsb_binding_affinity.value", "equals", 2.0)
q3 = AttributeQuery("rcsb_entry_info.selected_polymer_entity_types", "exists")
q4 = AttributeQuery("rcsb_nonpolymer_entity_container_identifiers.nonpolymer_comp_id", "exists")
q = group(q1 & q2) & q3 & q4
results = list(q())
print("count", len(results))
Please note that before you can do this, you will of course need to upgrade your version of the package:
pip install rcsb-api --upgrade
We hope this offers a helpful mechanism to address your issue. We do still hope to automate that grouping based on the specific types of attributes requested (so that users don't have to think about it and manually group them separately), but that will be a longer-term objective.
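If it is unclear whether the upgrade took effect in the active environment, a quick check like the following may help. It is a small sketch using only the standard library; `installed_version` is a hypothetical helper, and no particular minimum version number is assumed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints None if rcsb-api is not installed in the active environment.
print("rcsb-api:", installed_version("rcsb-api"))
```

Running this inside the same conda environment used for the queries shows which release is actually being imported.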
I was trying the Search API and I ran into this error.
Context on the image:
I made two queries, the first one having extra constraints, but the result count of the first one came out higher. I then searched using the query builder, which gave 133 results, the same as the second one.
I am using a conda environment with Python 3.13.2 on Ubuntu 24.04.1 LTS x86_64. I created a new conda environment and installed rcsb-api to get this.