Add function to flag similar strings #75
Comments
Quick and dirty example:
Interestingly, one of the more prevalent issues appears to be leading/trailing whitespace, probably from older manual copy-pasting. Pairs with a j-w (Jaro-Winkler) distance above 0.2 seem to be truly distinct, whereas pairs below 0.2 seem to deserve closer inspection.
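Since whitespace seems to account for many of the near-duplicates, a cheap first pass could group values that become identical once whitespace is normalized. This is only a sketch in plain Python: the function name `whitespace_variants` and the sample names are hypothetical, and it takes a list of strings rather than reading the Studies table directly.

```python
from collections import defaultdict


def whitespace_variants(values):
    """Group raw values that differ only in whitespace.

    Returns a dict mapping each normalized string to the sorted list of
    raw variants, keeping only groups with more than one variant.
    """
    groups = defaultdict(set)
    for value in values:
        # Strip leading/trailing whitespace and collapse internal runs.
        normalized = " ".join(value.split())
        groups[normalized].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    # Made-up names, for illustration only.
    names = ["Jane Doe", " Jane Doe", "Jane  Doe", "John Roe"]
    for normalized, variants in whitespace_variants(names).items():
        print(repr(normalized), "<-", variants)
```

Anything this pass catches can probably be fixed without manual review, which leaves fewer pairs for the fuzzy comparison.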
Similar for institutions:
yields:
However, this isn't as easy to scan manually, because the many high-similarity "University of ..." matches hide some of the true matches/values that need correction. Can you spot them here? ;)
PI names in the screenshot above have been standardized. I picked whichever one was more recent as the "standard."
We currently do not standardize PI or institution names. It would be helpful to do this on a semi-regular basis.
It would be great if we could have a function that flags similar strings in the Studies table, and run it as, say, a weekly or quarterly job. Actually fixing the data would probably still require manual intervention.
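A rough sketch of what such a function might look like, in pure Python with no dependencies. Several things here are assumptions rather than anything settled in this issue: the function names are hypothetical, the input is a plain list of strings (how the Studies table gets read is left out), the 0.2 cutoff is borrowed from the j-w observation in the comments, and the prefix weight of 0.1 is the usual Winkler default, which may differ from what the quick example used.

```python
from itertools import combinations


def jaro_similarity(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused matching character within the window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_distance(s1, s2, prefix_weight=0.1):
    """1 - Jaro-Winkler similarity; 0.0 means identical."""
    sim = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):  # common prefix, capped at 4 chars
        if a != b:
            break
        prefix += 1
    return 1.0 - (sim + prefix * prefix_weight * (1.0 - sim))


def flag_similar(values, threshold=0.2):
    """Return (a, b, distance) for distinct raw values whose
    whitespace-normalized forms are closer than `threshold`."""
    unique = sorted(set(values))
    flagged = []
    for a, b in combinations(unique, 2):
        dist = jaro_winkler_distance(" ".join(a.split()), " ".join(b.split()))
        if dist < threshold:
            flagged.append((a, b, round(dist, 3)))
    return sorted(flagged, key=lambda pair: pair[2])


if __name__ == "__main__":
    # Made-up PI names, for illustration only.
    for a, b, dist in flag_similar(["Jane Doe", "Jane Doe ", "Jane Do", "Zed Quux"]):
        print(f"{dist:.3f}  {a!r}  ~  {b!r}")
```

The comparison is quadratic in the number of distinct values, which should be acceptable for a weekly or quarterly job over a modest table. The output is only a list of candidate pairs for a human to review; nothing is rewritten automatically.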