@tw4l tw4l commented Dec 10, 2025

Fixes #3011

  • Fix logic that previously incremented a workflow's crawlSuccessfulCount for every crawl, regardless of whether it succeeded or failed
  • Ensure that crawl files are only added to a workflow's size when the crawl completes successfully
  • Fix handling of crawl deletion to account for whether the deleted crawl was successful
  • Add a migration to recalculate stats for all workflows
  • Fix stats_recompute_all so that it works as expected: it previously failed because it called get_running_crawl on the crawl_configs Mongo collection rather than on CrawlConfigOps
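The core rule behind these fixes is that every crawl counts toward the workflow's crawl count, but only successful crawls count toward crawlSuccessfulCount and the workflow's size. A minimal Python sketch of a from-scratch recompute, assuming a hypothetical in-memory Crawl record and a hypothetical SUCCESSFUL_STATES set (the real backend reads crawls from MongoDB, and its exact state names may differ):

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical set of crawl states treated as successful; the actual
# Browsertrix state names and their grouping may differ.
SUCCESSFUL_STATES = {"complete", "stopped_by_user"}


@dataclass
class Crawl:
    id: str
    state: str
    # Sizes in bytes of the WACZ files uploaded for this crawl
    file_sizes: List[int] = field(default_factory=list)


@dataclass
class WorkflowStats:
    crawl_count: int = 0
    crawl_successful_count: int = 0  # stands in for crawlSuccessfulCount
    total_size: int = 0


def recompute_stats(crawls: Iterable[Crawl]) -> WorkflowStats:
    """Rebuild a workflow's stats from its crawls (hypothetical helper)."""
    stats = WorkflowStats()
    for crawl in crawls:
        # Every crawl counts toward the total, whatever its outcome
        stats.crawl_count += 1
        if crawl.state in SUCCESSFUL_STATES:
            stats.crawl_successful_count += 1
            # Files from failed or canceled crawls (e.g. WACZs uploaded
            # before a pause) are not added to the workflow size
            stats.total_size += sum(crawl.file_sizes)
    return stats
```

For example, one complete crawl with 300 bytes of files plus one failed and one canceled crawl yields a crawl count of 3, a successful count of 1, and a size of 300. Recomputing from the crawl list, as the migration does, also avoids carrying forward errors from earlier incremental updates.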

Testing

  1. Spin up a local instance of Browsertrix on the main branch
  2. Create a workflow and run several crawls. Make some of them succeed and others fail or be canceled, including at least one canceled after pausing, once WACZ files have already been uploaded
  3. Open the browser's Network tab and verify that the workflow's statistics are incorrect: all crawls are counted as successful, and the size includes WACZs from paused crawls that later failed or were canceled
  4. Switch to this branch and run the new migration
  5. Go back to the Network tab and verify that the workflow's statistics now match what we expect

@ikreymer ikreymer left a comment

Looks good! Tested locally and added additional clearing for when there are no crawls. I found a case (without this branch) where crawlSuccessfulCount was -1 but there were no crawls, so the stats were not being cleared. Now they are!
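The deletion fix and the extra clearing described above can be sketched as follows. This is a hypothetical Python helper, not the merged code: the Crawl and WorkflowStats records and SUCCESSFUL_STATES are assumptions, and the clamping at zero is an extra defensive guard added for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of successful states; real state names may differ.
SUCCESSFUL_STATES = {"complete", "stopped_by_user"}


@dataclass
class Crawl:
    id: str
    state: str
    size: int  # total bytes of this crawl's files


@dataclass
class WorkflowStats:
    crawl_count: int = 0
    crawl_successful_count: int = 0
    total_size: int = 0


def apply_crawl_deletion(
    stats: WorkflowStats, deleted: Crawl, remaining: List[Crawl]
) -> WorkflowStats:
    """Return updated stats after deleting a crawl (hypothetical helper)."""
    if not remaining:
        # No crawls left: reset everything instead of decrementing, so
        # stale values such as a successful count of -1 get cleared
        return WorkflowStats()
    was_successful = deleted.state in SUCCESSFUL_STATES
    return WorkflowStats(
        crawl_count=max(stats.crawl_count - 1, 0),
        # Only a successful crawl was ever counted here or added to size
        crawl_successful_count=max(
            stats.crawl_successful_count - (1 if was_successful else 0), 0
        ),
        total_size=max(
            stats.total_size - (deleted.size if was_successful else 0), 0
        ),
    )
```

Deleting a failed crawl therefore leaves the successful count and size untouched, while deleting the last remaining crawl always returns zeroed stats.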

@ikreymer ikreymer merged commit e5fc0ec into main Dec 11, 2025
24 checks passed
@ikreymer ikreymer deleted the issue-3011-workflow-stats-fix branch December 11, 2025 23:37

Development

Successfully merging this pull request may close these issues.

[Bug]: Crawl workflow statistics are a bit off from expected