@deanq deanq commented Jun 15, 2025

Fixes #428 #427

Since Heartbeat moved to multiprocess execution, persisting JobsProgress has become tricky. That move fixed the production issue of pings being blocked when a CPU is busy for a long time, but the previous release broke local development. This PR changes JobsProgress to persist at the file level, using a pickled .runpod_jobs.pkl store. The changes are covered by unit and integration tests.
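
For readers unfamiliar with the approach, here is a minimal sketch of file-level persistence with pickle. It is not the code from this PR: the class name `FileBackedJobs` and its methods are hypothetical, and only the `.runpod_jobs.pkl` filename comes from the description above. The idea is that every read goes back to the file, so a separate heartbeat process sees what the worker process wrote.

```python
import os
import pickle
import tempfile
from pathlib import Path


class FileBackedJobs:
    """Hypothetical stand-in for JobsProgress; not the runpod implementation."""

    def __init__(self, path=".runpod_jobs.pkl"):
        self.path = Path(path)

    def _load(self):
        # A missing or empty file is treated as "no jobs in progress".
        try:
            with self.path.open("rb") as f:
                return pickle.load(f)
        except (FileNotFoundError, EOFError):
            return set()

    def _save(self, jobs):
        # Write to a temp file in the same directory, then swap it in,
        # so a reader never observes a half-written pickle.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(jobs, f)
        os.replace(tmp, self.path)

    def add(self, job_id):
        jobs = self._load()
        jobs.add(job_id)
        self._save(jobs)

    def remove(self, job_id):
        jobs = self._load()
        jobs.discard(job_id)
        self._save(jobs)

    def job_ids(self):
        # Always re-read from disk; no state is cached in memory.
        return sorted(self._load())
```

This sketch does no locking, so two processes that write at the same time can lose an update; it assumes one writer with other processes only reading.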

This PR also introduces a simulator for debugging. See commit 1e06e5cca31529563e1e4e0dff9182fc0d7aa3bf

@deanq deanq merged commit e58d519 into main Jun 15, 2025
8 checks passed
@deanq deanq deleted the fix-multiprocess-bug branch June 15, 2025 23:59
justinwlin added a commit that referenced this pull request Aug 4, 2025

Development

Successfully merging this pull request may close these issues.

Unable to start local API server in version 1.7.11