Support custom postnominals #10
Comments
Hi @JamoCA, that's a good idea, I think. I'll have to look at the library code to refresh my memory, but maybe we could have separate sets of suffixes for things like IT, medical, and engineering suffixes. What do you think? Do you have any suggestions on how that should be implemented? Do you know of a list of these suffixes? Here's the website of a local clinic that lists its medical staff: https://www.connectmed.co.nz/practice/pitt-street-medical. Are the suffixes in your data similar to values like "MB ChB, Dip Obst, DCH, M, F RNZCGP"?
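One way the "separate sets of suffixes" idea could look is sketched below. This is only an illustration, not the library's API: the class `SuffixCatalog`, the `Category` enum, and the method `suffixesFor` are hypothetical names, and the sample suffixes are a tiny illustrative selection rather than a real list.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in HumanNameParser.java. */
final class SuffixCatalog {

    enum Category { MEDICAL, ENGINEERING, IT }

    // One suffix set per category. The entries are illustrative samples only.
    private static final Map<Category, Set<String>> SETS = new EnumMap<>(Category.class);

    static {
        SETS.put(Category.MEDICAL, Set.of("MD", "DCH", "FAAA"));
        SETS.put(Category.ENGINEERING, Set.of("PE"));
        SETS.put(Category.IT, Set.of("CISSP"));
    }

    /** Returns the union of the suffix sets for the requested categories. */
    static Set<String> suffixesFor(Category... categories) {
        Set<String> combined = new LinkedHashSet<>();
        for (Category category : categories) {
            combined.addAll(SETS.get(category));
        }
        return combined;
    }
}
```

A caller would opt in to only the categories that match its data, for example `SuffixCatalog.suffixesFor(SuffixCatalog.Category.MEDICAL)`, which keeps short suffixes from unrelated fields out of the match set.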
I'm not entirely sure how best to implement it, but adding support for some of the more popular ones which can't be confused with last names would be beneficial. I'm not very adept with Java. (My primary development language is ColdFusion, which uses Java to perform JIT compilation of CFML to class files.) A lot of the industry-specific data that I was provided with was inconsistent and had different capitalization, spacing, and period usage. I was initially just title-casing the "full name" value, which also contained titles and suffixes, but then realized that I needed to parse it and remembered that this library was available. I added HNP to the import process, but it was incorrectly parsing some of the suffixes as last names. I wrote a CFML-based wrapper component to:
I also wanted to ensure standardization among the suffixes: while "MD" was identified as a suffix, "M.D." was parsed as the last name. (Standard abbreviations usually have periods. The APA Publication Manual recommends not using periods with degrees, while other reference manuals recommend using them.)

For suffix identification via regex, I've been using something like …

In mixed-case usage, I'm reformatting suffixes like …

I found some other rules here which state that all sources advise against using titles before and after a name at the same time. An example of this is using …

"FAAA" was one of the suffixes that I came across. I didn't know what it meant and went to https://www.acronymfinder.com/ to find out. Apparently it's "Fellow of the American Academy of Audiology". The site lists many acronyms that aren't related to names, and it doesn't appear to have a category dedicated solely to names.
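The "MD" versus "M.D." mismatch described above can be avoided by comparing tokens on a normalized key instead of the raw text. The sketch below assumes a hypothetical helper class `SuffixKeys` with a small sample set; it is not how the library itself works, only one way to make period and case differences irrelevant during lookup.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; not part of HumanNameParser.java. */
final class SuffixKeys {

    // Sample keys only, stored without periods and in upper case.
    private static final Set<String> KNOWN = Set.of("MD", "PHD", "AUD", "FAAA");

    /** Strips periods and surrounding whitespace, then upper-cases the token. */
    static String key(String token) {
        return token.replace(".", "").trim().toUpperCase(Locale.ROOT);
    }

    /** True when the token matches a known suffix regardless of periods or case. */
    static boolean isKnownSuffix(String token) {
        return KNOWN.contains(key(token));
    }
}
```

With this, `isKnownSuffix("M.D.")` and `isKnownSuffix("md")` both resolve to the key `MD`, while an ordinary surname such as "Smith" is not matched.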
I also encountered a nuanced sort order regarding how some postnominals are displayed in the industry "medical; audiology" (i.e., …).

Here's a short example of the rules I'm using so far. (NOTE: I had to write a couple of different rules for Au.D. so that it didn't match too early and leave an extra period.)

rules = [
["Ph.D.", ["(?i)\bPh\.*D\.*\b"]],
["M.A.", ["(?i)\bM\.*A\.*\b"]],
["Au.D.", ["(?i)\bAU\.D\.", "(?i)\bAUD\b", "(?i)\bAU\.D\b"]],
["M.S.", ["(?i)\bM\.*S\.*\b"]],
["M.D.", ["(?i)\bM\.*D\.*\b"]],
["CCC-A", ["(?i)\bCCC\-A"]],
["MSCCCA", ["(?i)\bMSCCCA\b"]],
["FAAA", ["(?i)\bFAAA\b"]]
];
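For comparison, here is a rough Java sketch of the same rule-table idea. The class `PostnominalRules` and its `normalize` method are hypothetical and not part of the library. It assumes the input is only the suffix portion of a name (otherwise a case-insensitive "M.A." rule could hit a surname such as "Ma"), and it treats "CCC-A" as the intended canonical form. Each pattern ends in `\b\.?`, so an optional trailing period is consumed as part of the match, which is one way to avoid the leftover period mentioned in the note and to fold the Au.D. variants into a single pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of HumanNameParser.java. */
final class PostnominalRules {

    // Canonical form -> pattern, kept in insertion order so rules run predictably.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    private static void rule(String canonical, String regex) {
        RULES.put(canonical, Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    static {
        // "\\.?" allows an optional period; the final "\\b\\.?" swallows a trailing one.
        rule("Ph.D.", "\\bPh\\.?D\\b\\.?");
        rule("M.A.", "\\bM\\.?A\\b\\.?");
        rule("Au.D.", "\\bAu\\.?D\\b\\.?");
        rule("M.S.", "\\bM\\.?S\\b\\.?");
        rule("M.D.", "\\bM\\.?D\\b\\.?");
        rule("CCC-A", "\\bCCC-A\\b");
        rule("MSCCCA", "\\bMSCCCA\\b");
        rule("FAAA", "\\bFAAA\\b");
    }

    /** Rewrites every recognized suffix in the given suffix text to its canonical form. */
    static String normalize(String suffixText) {
        String result = suffixText;
        for (Map.Entry<String, Pattern> entry : RULES.entrySet()) {
            String replacement = Matcher.quoteReplacement(entry.getKey());
            result = entry.getValue().matcher(result).replaceAll(replacement);
        }
        return result;
    }
}
```

For example, `PostnominalRules.normalize("phd, AUD, ccc-a")` yields `Ph.D., Au.D., CCC-A`, and an already canonical value such as `M.D.` passes through unchanged.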
… whilst writing them
I've cut a 0.2 release with the new builder. @JamoCA, feel free to re-open this or open a new issue if it doesn't completely solve your use case. Thanks!
Could support be added to define custom postnominals?
I'm parsing some real-world data and encountering some medical suffixes for which I'm having to write extra rules outside of this library.
Here are samples of what I've identified so far in a small sample of about 100 records provided by a client.
Thanks.