-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to manually import annotations from analyst CSVs #487
Add script to manually import annotations from analyst CSVs #487
Conversation
I'm putting this up for review so that we can iron out any initial issues and then decide on paths for the merging/deduping and author-setting issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a fantastic start! All the comments I made inline are mostly small technical issues — overall, this seems like a good approach.
Would you move this to the scripts/
folder?
doesn't set the annotation author correctly, instead uses the API call initator
Might be good to include that as just another field in the annotation for now, until we figure out the best solution for this.
doesn't include any metadata about sheet "schema"
Let’s just add a field for this with the name annotation_schema
and the value edgi_analyst_v2
.
doesn't access sheets via the google drive API, just parses CSVs
I think that’s an OK limitation for now. Can address in a separate PR.
as noted, no merging or deduping
I think the plan we landed on for the API side of this is that multiple posts by the same user for the same change would just update a single annotation, so that should mitigate this. Of course, I need to check whether that’s what we’re actually doing in practice. :P
Thanks for all the comments! I'll get around to most of this tomorrow, but:
I think https://github.com/edgi-govdata-archiving/web-monitoring-db/blob/42bb8c7020e4d139fa278a95e0947cf9c8329659/app/models/change.rb#L82 looks like it is indeed finding any extant annotation for that change by that author, but then overwriting its data. |
Ahhhhh, so it is. I guess you’d need to find it and read it here first, though it might be simpler if posting an annotation took a parameter for update vs. replace. Hmmm. Update: see edgi-govdata-archiving/web-monitoring-db#375 (comment) for discussion. |
scripts/annotations_import.py
Outdated
# change to the API | ||
|
||
bool_keys = [ | ||
'Language alteration', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good to me, but the one thing I’m wondering is whether we should attempt to map these names to something a little less prone to change, e.g. “Content change/addition/removal” → “content_changed”.
I’m not sure how often the titles of these columns has been altered without changing their meaning in the past, but I’m fairly sure it’s happened a few times (fixing typos, slight rewordings, etc.). Even though doing a mapping like this means any display needs to also know what text to show, it seems like it would make a more reliable system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, please take a look to see if the structure looks alright, and if you'd like to change any of the new json keys
…o correspond to multiple csv column names which might change over time.
…oceeding with a malformed sheet - Minor cleanup of attribute info initializatino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good now. Noted a couple minor nits inline, but neither of them are major problems.
All our scripts don't have extensions and are marked as executable, and this does the same with `annotations_import`. Also makes some minor whitespace and import tweaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍 👍
Is this good to merge now, or was there more you wanted to do? |
This is good to go, thanks! |
THEN IN IT GOES |
This is a precursor to any automated importing, and does not include any de-duping or merging, so don't run it on the same csv multiple times!
Known issues (to either be addressed in this PR or subsequent ones, your call):