Predict DOID mappings to UMLS, MeSH, and EFO #68
There is an important problem with this PR, which I noticed because it just caused issues in INDRA: the script doesn't take into account the mappings that DOID already provides to MeSH, and it adds redundant predictions for them. @cthoyt could you look into this and remove these predictions?
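To illustrate the kind of filter being asked for, here is a minimal sketch. It assumes mappings are available as `(source CURIE, target CURIE)` pairs; the function name `drop_already_mapped` and all identifiers in the example are hypothetical and not taken from the actual script or from predictions.tsv.

```python
def drop_already_mapped(predictions, existing):
    """Drop predictions whose source already has a curated mapping
    into the same target resource (for example, DOID -> MeSH)."""
    # Record which (source, target prefix) pairs are already covered.
    covered = {(src, tgt.split(":", 1)[0]) for src, tgt in existing}
    return [
        (src, tgt)
        for src, tgt in predictions
        if (src, tgt.split(":", 1)[0]) not in covered
    ]


# Hypothetical identifiers, for illustration only.
existing = [("doid:1234", "mesh:D000001")]
predictions = [
    ("doid:1234", "mesh:D000001"),   # duplicate of a curated mapping
    ("doid:1234", "mesh:D000002"),   # conflicts with a curated mapping
    ("doid:1234", "umls:C0000001"),  # different target resource
    ("doid:5678", "mesh:D000003"),   # source has no curated mapping
]
filtered = drop_already_mapped(predictions, existing)
```

Filtering on the target prefix rather than on the exact pair also drops predictions that would contradict a curated mapping, not only exact duplicates.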
Another issue is that many one-to-many mappings are added as predicted exact matches. I think it would be better to leave these out, or to improve the script so that it takes more features into account when deciding which mappings to propose. For instance, here (for convenience, this is not taken directly from predictions.tsv but from a derived table):
there is an exact match at the name level, so that single mapping could be proposed.
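One possible way to handle the one-to-many case is sketched below. It assumes each candidate carries the source and target names; the record layout, the function name `resolve_one_to_many`, and the example entries are hypothetical. This is not the script's current logic.

```python
from collections import defaultdict


def resolve_one_to_many(candidates):
    """Keep unambiguous candidates; for one-to-many groups, keep a candidate
    only if it is the single exact (case-insensitive) name match."""
    groups = defaultdict(list)
    for cand in candidates:
        target_prefix = cand["target_id"].split(":", 1)[0]
        groups[(cand["source_id"], target_prefix)].append(cand)

    kept = []
    for group in groups.values():
        if len(group) == 1:
            kept.extend(group)
            continue
        exact = [
            c for c in group
            if c["source_name"].casefold() == c["target_name"].casefold()
        ]
        # Propose a mapping only when the name match is unique;
        # otherwise leave the whole group out.
        if len(exact) == 1:
            kept.extend(exact)
    return kept


# Hypothetical records, for illustration only.
candidates = [
    {"source_id": "doid:1", "source_name": "example disease",
     "target_id": "mesh:D1", "target_name": "Example Disease"},
    {"source_id": "doid:1", "source_name": "example disease",
     "target_id": "mesh:D2", "target_name": "Example Disease, Familial"},
    {"source_id": "doid:2", "source_name": "other disease",
     "target_id": "mesh:D3", "target_name": "Other Disorder"},
    {"source_id": "doid:2", "source_name": "other disease",
     "target_id": "mesh:D4", "target_name": "Another Disorder"},
]
resolved = resolve_one_to_many(candidates)
```

In this example only the `doid:1` to `mesh:D1` candidate survives; both `doid:2` candidates are left out because neither matches at the name level.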
I think this is caused by the redundant prefixes in the identifiers. A potential solution would be to start standardizing identifiers in the main files and then provide an export that uses Identifiers.org rules for import into INDRA, since the current non-general solution requires writing custom handling in many places.
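A rough sketch of that split between a standardized internal form and an export form follows. The `EMBEDDED_PREFIX` table, both function names, and the uppercase namespace convention in the export are assumptions for illustration and would need to be checked against the actual Identifiers.org patterns and INDRA's expectations.

```python
# Assumption: resources whose exported identifiers embed the prefix,
# e.g. "DOID:1234". Only an illustrative entry is listed here.
EMBEDDED_PREFIX = {"doid": "DOID:"}


def standardize(prefix, identifier):
    """Return a lowercase prefix and an identifier without a redundant prefix."""
    prefix = prefix.lower()
    embedded = EMBEDDED_PREFIX.get(prefix)
    if embedded and identifier.upper().startswith(embedded):
        identifier = identifier[len(embedded):]
    return prefix, identifier


def export_with_embedded_prefix(prefix, identifier):
    """Re-add the embedded prefix for consumers that expect it."""
    prefix, identifier = standardize(prefix, identifier)
    return prefix.upper(), EMBEDDED_PREFIX.get(prefix, "") + identifier


# Hypothetical identifiers, for illustration only.
internal = standardize("DOID", "DOID:1234")
exported = export_with_embedded_prefix("doid", "1234")
```

Keeping the redundant prefix out of the main files means the custom handling lives in one export function rather than in every consumer.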
That's an excellent idea. It seems obvious that one mapping is better than the others. I think the current logic outputs all mappings returned by Gilda, but if the scored match object contains an "exact match" somewhere, that is definitely good enough to keep only that one.
These issues haven't been resolved yet; I will try to do something about them now.
@allenbaron @lschriml there is a tiny issue with the way some of the entries are normalized, but this is the outline of generating the DOID mappings