Open
Labels
bug: Something isn't working
Bug summary
Description
We are experiencing intermittent schedule duplication issues where deployments randomly create duplicate scheduled runs with identical expected start times.
Observed Behavior
- Scheduled runs are duplicated with the same expected_start_time
- Issue occurs sporadically across different deployments
- Multiple flow runs are triggered for the same schedule time
- Problem appears to resolve temporarily when schedules are toggled (deactivated/reactivated)
Version info
Prefect Server: v3.4.15
Prefect Workers: v3.4.15
Kubernetes deployment using Helm charts
Additional context
Suspected Triggers
- System actions or maintenance operations on the Kubernetes cluster
- Flow redeployment processes
- Potential race conditions during schedule updates
Current Workaround
We've implemented a cleanup script that detects duplicate scheduled runs and clears them by toggling each schedule's active state:
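from prefect import flow, get_run_logger
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import DeploymentFilter, DeploymentFilterTags


# Wrapper assumed: get_run_logger() requires a flow or task run context.
@flow
async def restart_duplicated_schedules():
    logger = get_run_logger()
    async with get_client() as client:
        # Only production deployments (tagged "prd") are checked.
        deployments = await client.read_deployments(
            deployment_filter=DeploymentFilter(tags=DeploymentFilterTags(all_=["prd"]))
        )
        for d in deployments:
            logger.info(f"Checking deployment: {d.name} deployment id: {d.id} flow id: {d.flow_id}")
            scheduled_runs = await client.get_scheduled_flow_runs_for_deployments([d.id])
            # A repeated expected_start_time means the schedule has been duplicated.
            scheduled_run_times = set()
            has_double_schedule = False
            for i in scheduled_runs:
                if i.expected_start_time in scheduled_run_times:
                    has_double_schedule = True
                    logger.info(f"Double schedule detected for {d.name} at {i.expected_start_time}")
                    break
                scheduled_run_times.add(i.expected_start_time)
            if has_double_schedule:
                # Toggle each schedule off and on again to clear the duplicates.
                schedules = await client.read_deployment_schedules(d.id)
                for schedule in schedules:
                    logger.info(f"Restarting schedule: {schedule.id}")
                    await client.update_deployment_schedule(d.id, schedule.id, active=False)
                    await client.update_deployment_schedule(d.id, schedule.id, active=True)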
Impact
- Unnecessary resource consumption from duplicate flow runs
- Potential data processing inconsistencies
- Manual intervention required to clean up duplicates
- Operational overhead from monitoring and cleanup processes
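To reduce the manual cleanup, one option is to decide programmatically which duplicate runs are surplus before deleting or cancelling them. The sketch below is not built-in Prefect behaviour. `surplus_run_ids` is a hypothetical helper that keeps the earliest-created run for each expected start time and returns the ids of the rest, assuming each run has `id`, `expected_start_time`, and `created` attributes.

```python
from collections import defaultdict
from types import SimpleNamespace


def surplus_run_ids(runs):
    """Keep the earliest-created run per expected_start_time; return ids of the others."""
    by_time = defaultdict(list)
    for run in runs:
        by_time[run.expected_start_time].append(run)
    surplus = []
    for group in by_time.values():
        group.sort(key=lambda r: r.created)
        surplus.extend(r.id for r in group[1:])
    return surplus


# Illustrative stand-ins; plain integers and strings replace real timestamps and UUIDs.
sample = [
    SimpleNamespace(id="a", expected_start_time=9, created=1),
    SimpleNamespace(id="b", expected_start_time=9, created=2),
    SimpleNamespace(id="c", expected_start_time=10, created=3),
]
to_remove = surplus_run_ids(sample)
```

The returned ids could then be passed to whatever delete or cancel call fits the environment. Which run to keep is a policy choice, and keeping the earliest is only one possibility.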
Questions
- Root Cause Analysis: What could be causing schedule duplication in Prefect 3.4.15?
- Flow Redeployment: Are there known issues with schedule duplication during deployment updates?
- Database Consistency: Could this be related to PostgreSQL transaction handling or race conditions?
- Prevention: Are there configuration settings or best practices to prevent schedule duplication?
- Detection: Is there a built-in mechanism to detect and prevent duplicate schedules?