Skip to content

MSSQL Bulk Loader De-duplication Bug: Duplicate Records with Identical _airbyte_raw_id Values #65938

@cad2040

Description

@cad2040

Connector Name

destination-mssql

Connector Version

2.2.12

What step the error happened?

During the sync

Relevant information

The MSSQL bulk loader is creating duplicate records in destination tables even when using "Incremental | Append + Dedup" mode. The issue occurs specifically with bulk insert operations and results in multiple rows with identical _airbyte_raw_id values.

  • Expected Behavior:
    • Records with the same _airbyte_raw_id should be de-duplicated
    • Only one record per _airbyte_raw_id should exist in the destination table
    • The MERGE statement should properly handle de-duplication based on _airbyte_raw_id
      Actual Behavior:
    • Multiple records with identical _airbyte_raw_id values are inserted
    • The logs show "Deduplication completed for temp table" and "MERGE completed successfully" but duplicates still exist
    • This only occurs with bulk insert mode; insert mode works correctly
      Root Cause:
      The issue is in MSSQLBulkLoader.kt in the handleDedup method (lines 48-49). The de-duplication logic uses the business primary key (Id) instead of _airbyte_raw_id as the deduplication key:
      val primaryKey =
      if (importType.primaryKey.isNotEmpty()) {
      importType.primaryKey.flatten() // Uses business PK (e.g., "Id")
      } else {
      importType.cursor
      }
      This means de-duplication happens based on business keys + cursor, not on _airbyte_raw_id, which is Airbyte's internal unique identifier.
Image

Relevant log output

Successful De-duplication Messages (Contradictory):
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoadHandler(bulkLoadAndUpsertForDedup):56 Starting bulk insert into table: Airbyte.Campaign from file: blob/Campaign/2025/09/04/1756657454230/0
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoadHandler(bulkLoadAndUpsertForDedup):67 Deduplication completed for temp table
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoadHandler(bulkLoadAndUpsertForDedup):78 MERGE completed successfully into table: Airbyte.Campaign

Stream Configuration (Showing Dedup Mode):
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoader(handleDedup):48 Deduplicating blob/Campaign/2025/09/04/1756657454230/0 into table for Descriptor(namespace=null, name=Campaign)

Bulk Load Process (Showing Bulk Insert Mode):
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoader(handleAppendOverwrite):83 Loading blob/Campaign/2025/09/04/1756657454230/0 into table for Descriptor(namespace=null, name=Campaign)
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoadHandler(bulkLoadForAppendOverwrite):56 Starting bulk insert into table: Airbyte.Campaign from file: blob/Campaign/2025/09/04/1756657454230/0

Stream Status Updates (Showing Incremental Mode):
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoader(handleDedup):48 Stream status TRACE received of status: COMPLETE for stream edw_crm:Campaign
2025-09-04T06:34:25.4306142Z INFO DefaultDispatcher-worker-2 i.a.i.d.m.v.MSSQLBulkLoader(handleDedup):48 Stream Status Update Received: edw_crm:Campaign - COMPLETE

Contribute

  • Yes, I want to contribute

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions