Roadmap for the rest of 2020 #158

Mr0grog · 2020-11-11T09:23:19Z

Near the start of this year, I wrote about the issues with maintaining the Web Monitoring project, and have slowly worked to ramp down some of the development. Ultimately, however, EDGI’s Web Monitoring team is continuing to use this software, and that means they need some path forward where this code requires less (ideally none) of my (@Mr0grog) day-to-day involvement, and where there’s some reasonable hope that another person can maintain it. I’m hoping to focus my efforts between now and the end of the year on that, and the roadmap here will help keep things as on-target as possible.

Major Goals

Extract/abstract and publish packages for parts of the code that are more narrowly focused and useful for other people. It’s at least possible for other people to maintain these, and doing so creates some more broadly useful artifacts whether or not Web Monitoring continues meaningfully into the future.
Make it possible for someone other than @Mr0grog to maintain the project. The biggest blocker here is the variety of languages and frameworks used; web-monitoring-db, in particular, is the major odd one out.
Increase stability/decrease necessary technical oversight. The platform currently needs a lot of TLC to keep it humming along, and that’s a real problem. The most obvious issue here is generating weekly task sheets, which are no longer automated.
Clean up the data/follow up on some basic get-our-ducks-in-a-row issues. I know this is a weird and vague goal. It’s partially in service to the above points (if some of the basics are more cleaned up, they’re easier for others to grok and maintain), and partially about ensuring that, in a possible future where we do ramp down this system, the data is still accessible and useful.

It’s worth noting that some of these are in tension with each other. Extracting more small packages creates more to manage, but it also makes it more possible for other people to take on that management. Making big changes to frameworks or languages will necessarily introduce bugs and reduce stability, but will also make it more possible for a single human to maintain everything.

Specific Actions

So how do we accomplish the above?

These are not the only important things between now and the end of 2020. In particular, needed fixes and enhancements in the packages we’ve extracted (wayback and web-monitoring-diff) must continue to be worked on. All else being equal, though, stuff here will take priority.

Update 2020-11-13: Changed effort estimates, removed edgi-govdata-archiving/web-monitoring-processing#661 from roadmap.

Update 2020-12-10: Canceled “Rewrite web-monitoring-db in Python (sort of).” See comments below for more.


Mr0grog · 2020-11-11T22:23:25Z

Had a good discussion with @danielballan on this. The list mostly feels good, but we will not do the automated export stuff.

We both think the -db rewrite is dicey, but is probably important if the project is to have a maintainable and more stable future. Narrowing its focus to serving the -ui project (instead of having a generic API) is critical to making it feasible.

Mr0grog · 2020-11-13T07:39:05Z

Updated to add two items about optimizing the import script that I discussed with Dan yesterday:

Import script should upload bodies directly to S3 web-monitoring-processing#663
Optionally pre-check known versions in wm import ia-known-pages web-monitoring-processing#664

gretchengehrke · 2020-11-22T19:39:37Z

Thanks for laying this out so clearly @Mr0grog! This looks like a ton of work.

I'm way behind on working on getting volunteer capacity to help you. I have on my to-do list this week to create some tech-specific volunteer recruitment flyers and other postings (for listservs, or maybe Idealist? Or advertisement on Code for America?). I figure it'll take time to find people, but I'm guessing you wouldn't want any volunteers coming on board until most of the work you've listed out here is completed. Is that true?

One comment on what you've listed here: For generating task sheets, I wonder if that could be done monthly instead of weekly. I understand that making it automated would be much, much more sustainable, but in the meantime, I think it would be okay to be less frequent. Since it takes us so long to actually write reports, etc., it wouldn't be the end of the world to see a change a few weeks later than we otherwise would. Would that help things?

Mr0grog · 2020-11-23T19:48:18Z

> I'm guessing you wouldn't want any volunteers coming on board until most of the work you've listed out here is completed. Is that true?

Yeah, that's accurate. It would be extremely hard for someone totally new to the project to contribute productively to these kinds of tasks.

> could [generating task sheets] be done monthly instead of weekly… would that help things?

I’m happy to change the schedule in whatever way is useful, but I don’t think this makes an especially huge difference on the technical end:
+ It’ll free up a little bit of my time. (The critical trade-off here is that it takes longer to analyze a larger timeframe, but I’d be doing it less often—overall a net savings).
- It’s still a fundamental problem that this requires me to be around and do it manually, and that this is another disjoint, poorly integrated part of the system that needs remediation.

Mr0grog · 2020-11-25T21:33:46Z

Test Klaxon instance is up and running at https://edgi-klaxon-test-v2.herokuapp.com/

I’ve put a few pages in, and should do some more plus invite other users — at least @gretchengehrke and possibly other analysts.

Mr0grog · 2020-12-10T21:36:29Z

Update: at the current pace, and after talking with @danielballan, I don’t think the “Rewrite web-monitoring-db in Python (sort of)” item is feasible for this roadmap anymore. I’m going to focus more on trying to improve documentation and organization around it instead.

Ultimately, that one was always a high-effort, risky task, and while I think it would have gone a long way towards making it more feasible for other people to maintain things, it’s just not going to happen on a short time frame, and doesn’t make sense to sprint on right now. Longer term, it’s also not the only viable direction: Dan is interested in focusing on improving the tools for people who want to do more general (and technical) data analysis (i.e. -diff and wayback) that don’t depend on a live, running service like -db. Seeing if a tool like Klaxon would like to adopt some of these pieces might be a good path, too.

Mr0grog self-assigned this Nov 11, 2020

Mr0grog added [priority-★★★] coordination documentation labels Nov 11, 2020

Mr0grog mentioned this issue Nov 16, 2020

Import script should follow more of a pipeline style edgi-govdata-archiving/web-monitoring-processing#669

Open

Mr0grog pinned this issue Dec 10, 2020

Mr0grog closed this as completed Jan 17, 2023

Mr0grog unpinned this issue Jan 17, 2023
