Remove fake subjects from Works #2107
Comments
@tfmoorris Perhaps these tags should be supplanted by a new entry under The Physical Object / Format prior to deletion, or is there a better way?
We should also remove this code, which looks like it is trying to protect these subjects:
These subjects should no longer be used, as there are other ways to get the information they were trying to convey.
@hornc
Checking ol_dump_editions_2019-12-31.txt and ol_dump_works_2019-12-31.txt
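To reproduce counts like these from the dumps, here is a minimal Python sketch. It assumes each dump line has five tab-separated columns (type, key, revision, last_modified, JSON) and that subjects are stored as a list of strings under a `subjects` key. The `NOISE_SUBJECTS` set, the function names, and the script name `count_noise.py` are hypothetical, not part of the Open Library codebase.

```python
import gzip
import json
import sys
from collections import Counter

# Hypothetical noise list, copied from this issue; not an Open Library constant.
NOISE_SUBJECTS = {
    "Accessible book",
    "Protected DAISY",
    "In library",
    "Lending library",
    "Internet Archive Wishlist",
}


def count_noise_subjects(lines):
    """Count how many records carry each noise term in their 'subjects' list.

    Assumes each dump line is: type, key, revision, last_modified, JSON,
    separated by tabs. A record tagged twice with the same term counts once.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) < 5:
            continue  # skip malformed or truncated lines
        record = json.loads(parts[4])
        raw = record.get("subjects") or []
        subjects = {s for s in raw if isinstance(s, str)}
        for subject in subjects & NOISE_SUBJECTS:
            counts[subject] += 1
    return counts


def open_dump(path):
    """Open a dump file, transparently handling gzip compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, encoding="utf-8")


if __name__ == "__main__":
    # Usage (hypothetical script name): python count_noise.py ol_dump_works_2019-12-31.txt
    for path in sys.argv[1:]:
        with open_dump(path) as fh:
            for subject, n in count_noise_subjects(fh).most_common():
                print(f"{path}\t{subject}\t{n}")
```

Counting records rather than raw occurrences keeps the numbers comparable with per-subject search result counts, since a search returns each work at most once.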
Subject searches for quicker testing of the current numbers:
NB: the Solr indexing of these subjects seems way off. Many more items show up with these subjects at the URLs than appear in the data dumps.
Description
Three of the top five "subjects" are not subjects at all:
- Accessible book (2.5 million)
- Protected DAISY (1.2 million)
- In library (0.5 million)
There are also some lower-frequency noise terms, like "Lending library" and "Internet Archive Wishlist", but the three above represent the bulk of the noise.
Expectation
The subject list should contain things which are actually subjects of the work.
Proposal & Constraints
Remove the three subjects above. If they're needed to provide functionality, move them to a hidden portion of the Solr index where they don't pollute the UI.
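One possible reading of the "hidden portion of the Solr index" idea, sketched in Python under stated assumptions: at indexing time, split each work's subjects into those shown in the UI and noise terms kept in a separate field. The field names `subject` and `hidden_flags`, and the helper names, are hypothetical and do not describe the actual Open Library Solr schema or indexer.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical list of non-subject terms, taken from this issue.
NOISE_SUBJECTS = frozenset({
    "Accessible book",
    "Protected DAISY",
    "In library",
    "Lending library",
    "Internet Archive Wishlist",
})


def split_subjects(subjects: Iterable[str]) -> tuple[list[str], list[str]]:
    """Partition subjects into (visible, hidden), preserving input order."""
    visible: list[str] = []
    hidden: list[str] = []
    for subject in subjects:
        (hidden if subject in NOISE_SUBJECTS else visible).append(subject)
    return visible, hidden


def build_solr_doc(work: dict) -> dict:
    """Build a simplified search document for a work record.

    The 'subject' and 'hidden_flags' field names are hypothetical; the real
    Open Library Solr schema may differ.
    """
    visible, hidden = split_subjects(work.get("subjects") or [])
    doc = {"key": work["key"], "subject": visible}
    if hidden:
        # Kept for functionality (e.g. filtering), but never rendered as subjects.
        doc["hidden_flags"] = hidden
    return doc


if __name__ == "__main__":
    work = {"key": "/works/OL1W", "subjects": ["Fiction", "Accessible book", "Protected DAISY"]}
    print(build_solr_doc(work))
```

Keeping the terms in a separate field would let any filter that still depends on them query that field, while the subject list shown to users contains only real subjects.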