-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep a record of what pages/versions an import touched and how #191
Comments
How is it unrenderable??? Regardless of whatever craziness is happening in the importer, it shouldn’t be possible for I guess I can see some value in linking a version/page to an import, but it doesn’t seem (to me, at least) like a great solution to the problem you’re describing. |
So -- I definitely agree that we need to understand and fix the UI problem on its own. Nothing we do with imports should ever be able to break the |
See edgi-govdata-archiving/web-monitoring-ui#174 for the UI bug. |
See #192 for the DB bug about the specific issue that precipitated this. |
Re-titled this to get at how @danielballan and I have talked about this issue. It doesn’t make sense to store information about what import created a page/version because multiple imports could modify a page or version (if you use Instead, we probably want to log what each import did. Some possible approaches here:
There are plenty of ways to approach this one; almost any of them would probably be good, though I’d shoot for something more structured here. I think we are also aiming to not create database records for this because of extremely high overhead. |
Prompted to revisit this because of #277, since we’re really going to want this info when a line is skipped because a page doesn’t exist and I’m going to throw my lot in for (2) above with an extra property about which row the log is associated with, e.g: {"object": "page", "operation": "created", "uuid": "<uuid>", "at": "2018-03-01T00:01:52Z", "row": 0}
{"object": "version", "operation": "created", "uuid": "<uuid>", "at": "2018-03-01T00:01:52Z", "row": 0}
{"object": "page", "operation": "updated", "uuid": "<uuid>", "at": "2018-03-01T00:01:52Z", "row": 1}
{"object": "version", "operation": "created", "uuid": "<uuid>", "at": "2018-03-01T00:01:52Z", "row": 1}
{"object": "version", "operation": "updated", "uuid": "<uuid>", "at": "2018-03-01T00:01:53Z", "row": 2}
// etc. I think that’s a little easier to manage than (4), where log lines for a row have to be collected together, but I’m feeling like either one of those is the best of what I’d proposed above. Any thoughts, @danielballan? |
This strikes the correct balance between ease/efficiency of writing and completeness. Since we can re-create other forms from this one, I think we should go with this. |
Note from #278: we should make sure these get gzipped when they are saved. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions. |
Update: this was closed by the work in #551, which does not create a special log file for each import. Instead, we looked back at the problems that precipitated this issue and at our current needs, and decided that just logging to the same place we do all our normal logs (STDOUT/STDERR → AWS CloudWatch in production and If you want to see just these logs, you can use # In development:
> tail -f log/development.log | grep '\[import'
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36] Started Import 36
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=0] Created Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=0] Created Version 21d3494c-e92c-4e4c-accf-9cd9f1dcf1f8
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=1] Found Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=1] Created Version af617815-e120-4d64-9a15-944694a688bd
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=2] Found Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=2] Created Version 987083e2-5ca4-4d07-8828-8d82d2a8189d
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=3] Found Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=3] Created Version 2a9a5a1b-c5c4-43ef-9be3-eb2782eae98d
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=4] Found Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=4] Created Version 333fea36-2549-4718-b12b-fdc1e634bac8
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=5] Found Page 9b302e5a-fe72-466f-800a-ec26f1d09674
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36][row=5] Created Version 6e9ce120-2352-4219-aaef-0b6c0a730dd8
[ActiveJob] [ImportVersionsJob] [04b425c1-e80d-40a7-9bb3-7d82ba92092f] [import=36] Finished Import 36 |
I have seen a (yet unresolved) bug where an errored import job causes the UI
/pages
endpoint to become un-renderable. For issues like this, it would be useful to be able to identify and potentially remove records generated by a specific Import.The text was updated successfully, but these errors were encountered: