Ingest the legacy uuids from the old versionista outputter #61

Open
danielballan opened this issue May 31, 2017 · 8 comments

Comments

@danielballan
Contributor

It would be nice to have a way to associate legacy Annotations, which I assume will be subjected to a lot of analysis, with Versions in our app. Somehow getting the UUIDs generated by the old outputter (and now stored only in Google Sheets, I think) sounds slightly painful but possible and useful.

@Mr0grog
Member

Mr0grog commented Jun 1, 2017

Some notes for this:

Since rows in those spreadsheets are really versions, this should probably match on Versionista version IDs. We can extract them from:

  • In the spreadsheets: the Last Two - Side by Side column is always https://versionista.com/{site}/{page}/{version_id}:0 (version_id is universally unique within Versionista, so all the other fields can be ignored)
  • In the DB: version_record.source_metadata.version_id
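
As a rough sketch of the spreadsheet side, the version ID could be pulled out of the Last Two - Side by Side URL like this. The function name `extract_version_id` is hypothetical, and the code only assumes the `{site}/{page}/{version_id}:0` path shape described above; it makes no assumption about what the ID itself looks like.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_version_id(side_by_side_url: str) -> Optional[str]:
    """Return the Versionista version ID from a side-by-side URL.

    Expects a path shaped like /{site}/{page}/{version_id}:0 and returns
    None when the URL does not have that shape.
    """
    path = urlparse(side_by_side_url.strip()).path
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) != 3:
        return None
    # The last segment is "{version_id}:0"; keep only the part before ":".
    version_id, separator, _ = segments[2].partition(":")
    if not separator or not version_id:
        return None
    return version_id
```

Rows whose URL does not parse come back as `None`, so they can be counted as skipped instead of being matched to the wrong Version.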

The naive way someone could do this now would be to page through all the results of https://web-monitoring-db.herokuapp.com/api/v0/versions?source_type=versionista
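
A minimal sketch of that naive approach, building a lookup from Versionista version ID to our Version UUID. Everything about the response shape here is an assumption and not confirmed in this thread: that each page is JSON with a `data` list, that each record has a `uuid` and a `source_metadata` object, and that `links.next` holds the next page URL (or is missing/null on the last page). `build_version_index` and `fetch_json_over_http` are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, Dict

VERSIONS_URL = (
    "https://web-monitoring-db.herokuapp.com/api/v0/versions"
    "?source_type=versionista"
)


def fetch_json_over_http(url: str) -> dict:
    """Fetch one page of results and decode it as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def build_version_index(
    fetch_json: Callable[[str], dict] = fetch_json_over_http,
    start_url: str = VERSIONS_URL,
) -> Dict[str, str]:
    """Map Versionista version_id (as a string) to our Version UUID."""
    index: Dict[str, str] = {}
    url = start_url
    while url:
        page = fetch_json(url)
        for record in page.get("data", []):
            # ASSUMPTION: records expose `uuid` and `source_metadata`.
            metadata = record.get("source_metadata") or {}
            version_id = metadata.get("version_id")
            if version_id is not None:
                index[str(version_id)] = record["uuid"]
        # ASSUMPTION: the next page URL lives at links.next.
        url = (page.get("links") or {}).get("next")
    return index
```

Passing the fetcher in as an argument keeps the paging logic testable without hitting the network. IDs are stored as strings so they compare cleanly with IDs parsed out of spreadsheet URLs.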

If we wanted to better support this, we could:

Alternatively, a different, probably easier approach might be to create an API endpoint for uploading analyst annotation CSVs. It’s kinda messy, but might be the easiest and quickest way to achieve this.

@danielballan
Contributor Author

Using the versionista ID is good enough to support the ad hoc analysis I want to do right now. Once we transition away from versionista, perhaps we should do a one-time update to the database to ingest all these legacy UUIDs and associated Annotations.

@Mr0grog
Member

Mr0grog commented Jun 1, 2017

👍

@weatherpattern
Contributor

As we move forward with having different differs as well, an annotation imported from sheets should have a field indicating where it came from (Versionista, Scanner, possibly others in the future).

@danielballan
Contributor Author

I don't think we'll need to worry about that when importing sheets of annotations. Each annotation (row in the sheets) already refers to a Version in our database, either by its Versionista ID or by web-monitoring-db UUID or both, and each Version already knows where it came from.

Mr0grog added a commit that referenced this issue Feb 23, 2018
Use the import_annotations_from_sheet rake task to import all the annotations an analyst has created in a given Google Sheet. This can be used to solve #61.

Arguments are:
1. Google sheet ID, e.g. 1-Rq-AclS2GI_yxLmkYVY7FvTfN21KoJtxXtOXXXXXX
2. E-mail of user to attribute the annotation to
3. (optional) Name of spreadsheet tabs to import (comma-separated). If unset, all tabs will be imported.
4. (optional) Row to start at (defaults to 7)
5. (optional) Row to end at. If unset, reads all rows.

When done, it'll output summary information of how many rows were added, skipped, or errored across how many tabs.
@Mr0grog Mr0grog self-assigned this Feb 23, 2018
@Mr0grog
Member

Mr0grog commented Mar 2, 2018

Note: the tooling for this was added in #233. Solving this is mainly a matter of executing that rake task regularly (or, more complex: setting up a job that does that work on a schedule).

@Mr0grog Mr0grog added ready and removed in progress labels Mar 19, 2018
@stale

stale bot commented Jan 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

@stale stale bot added the stale label Jan 10, 2019
@Mr0grog
Member

Mr0grog commented Jan 10, 2019

This will be a requirement for migrating away from Google sheets for important changes.

@stale stale bot removed the stale label Jan 10, 2019
@Mr0grog Mr0grog added this to Ready in Web Monitoring May 23, 2019
@stale stale bot added the stale label Jul 9, 2019
@stale stale bot closed this as completed Jul 16, 2019
Web Monitoring automation moved this from Ready to Done! Jul 16, 2019
@edgi-govdata-archiving edgi-govdata-archiving deleted a comment from stale bot Aug 1, 2019
@Mr0grog Mr0grog reopened this Aug 1, 2019
Web Monitoring automation moved this from Done! to Ready Aug 1, 2019
@stale stale bot removed the stale label Aug 1, 2019