Skip to content

Intermittent schedule duplication causing double flow runs #18894

@Spichon

Description

@Spichon

Bug summary

Description

We are experiencing intermittent schedule duplication issues where deployments randomly create duplicate scheduled runs with identical expected start times.

Observed Behavior

  • Scheduled runs are duplicated with the same expected_start_time
  • Issue occurs sporadically across different deployments
  • Multiple flow runs are triggered for the same schedule time
  • Problem appears to resolve temporarily when schedules are toggled (deactivated/reactivated)

Version info

Prefect Server: v3.4.15
Prefect Workers: v3.4.15
Kubernetes deployment using Helm charts

Additional context

Suspected Triggers

  • System actions or maintenance operations on kube
  • Flow redeployment processes
  • Potential race conditions during schedule updates

Current Workaround

We've implemented a cleanup script that detects and fixes duplicate schedules by toggling the automation state

deployments = await client.read_deployments(
    deployment_filter=DeploymentFilter(tags=DeploymentFilterTags(all_=["prd"]))
)
logger = get_run_logger()

for d in deployments:
    logger.info(f"Checking deployment: {d.name} deployment id: {d.id} flow id: {d.flow_id}")
    scheduled_runs = await client.get_scheduled_flow_runs_for_deployments([d.id])
    
    scheduled_run_times = []
    has_double_schedule = False
    
    for i in scheduled_runs:
        if i.expected_start_time in scheduled_run_times:
            has_double_schedule = True
            logger.info(f"Double schedule detected for {d.name} at {i.expected_start_time}")
            break
        else:
            scheduled_run_times.append(i.expected_start_time)
    
    if has_double_schedule:
        schedules = await client.read_deployment_schedules(d.id)
        for schedule in schedules:
            print(f"Restarting schedule: {schedule.id}")
            await client.update_deployment_schedule(d.id, schedule.id, active=False)
            await client.update_deployment_schedule(d.id, schedule.id, active=True)

Impact

  • Unnecessary resource consumption from duplicate flow runs
  • Potential data processing inconsistencies
  • Manual intervention required to clean up duplicates
  • Operational overhead from monitoring and cleanup processes

Questions

  • Root Cause Analysis: What could be causing schedule duplication in Prefect 3.4.15?
  • Flow Redeployment: Are there known issues with schedule duplication during deployment updates?
  • Database Consistency: Could this be related to PostgreSQL transaction handling or race conditions?
  • Prevention: Are there configuration settings or best practices to prevent schedule duplication?
  • Detection: Is there a built-in mechanism to detect and prevent duplicate schedules?

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions