Conversation

@Adamyuanyuan
Contributor

Purpose of this pull request

Motivation

  • Problem: In Kafka->Hive streaming, partitioning by CURRENT_DATE()/CURRENT_TIMESTAMP() misplaces records during replay, and parsing a create_date field is brittle because of dirty or mixed formats.

  • Goal: Reuse SeaTunnel's metadata mechanism to inject Kafka ConsumerRecord.timestamp as EventTime, then let users materialize it via the Metadata transform for SQL/partitioning.

Design

  • In KafkaRecordEmitter: capture ConsumerRecord.timestamp for each record; in OutputCollector.collect, if the record is a SeaTunnelRow and the timestamp is >= 0, call MetadataUtil.setEventTime(row, ts) (see the sketch after this list).

  • No schema change and no mandatory new options; injection is on by default. Users opt in to materializing the value via the Metadata transform (e.g., mapping EventTime to kafka_ts).
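A minimal sketch of the injection point under the assumptions above; apart from MetadataUtil.setEventTime, which this PR relies on, the method and parameter names here are illustrative, not the exact connector code:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.seatunnel.api.source.Collector;
import org.apache.seatunnel.api.table.type.SeaTunnelRow;

// Illustrative injection point: invoked once per polled Kafka record.
// MetadataUtil is the metadata helper referenced in this PR.
void emitWithEventTime(ConsumerRecord<byte[], byte[]> consumerRecord,
                       Object deserializedRecord,
                       Collector<Object> output) {
    long ts = consumerRecord.timestamp();  // epoch millis from broker/producer
    if (deserializedRecord instanceof SeaTunnelRow && ts >= 0) {
        // Attach the Kafka timestamp as EventTime row metadata (no schema change).
        MetadataUtil.setEventTime((SeaTunnelRow) deserializedRecord, ts);
    }
    output.collect(deserializedRecord);
}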

Does this PR introduce any user-facing change?

Yes. Users can configure, for example:

transform {
  Metadata {
    source_table_name = "result_table"
    result_table_name = "result_with_meta"
    metadata_fields = { EventTime = "kafka_ts" }
  }
  Sql {
    source_table_name = "result_with_meta"
    result_table_name = "source_table"
    query = "select ..., FROM_UNIXTIME(kafka_ts/1000, 'yyyy-MM-dd', 'Asia/Shanghai') as pt from result_with_meta where kafka_ts >= 0"
  }
}

to drive partitioning (the pt column) and downstream transforms; kafka_ts is the Kafka record timestamp in epoch milliseconds, hence the division by 1000 before FROM_UNIXTIME.

How was this patch tested?

Yes, with unit tests (UT) and E2E tests.

Check list

@Adamyuanyuan Adamyuanyuan changed the title [Feature][Kafka source connector] Inject Kafka record timestamp as EventTime metadata [Feature][Kafka source] Inject Kafka record timestamp as EventTime metadata Oct 29, 2025
@Adamyuanyuan
Contributor Author

Adamyuanyuan commented Nov 3, 2025

This PR runs successfully on Flink, but the E2E run on Spark fails. The Spark engine path loses row options during the conversion between SeaTunnelRow and Spark Row, so the Metadata transform cannot retrieve EventTime; the output event_time_ms is therefore empty, and the assertion fails.

I need to confirm whether to fix this issue in this PR or open a separate PR to address metadata loss in the Spark translation layer.
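A hypothetical illustration of the failure mode, assuming stand-in converters toSparkRow/toSeaTunnelRow for the real Spark translation code (the actual converter classes are not shown in this PR):

// Hypothetical sketch: toSparkRow/toSeaTunnelRow stand in for the real
// Spark translation code; the point is only the loss of row options.
SeaTunnelRow in = new SeaTunnelRow(new Object[] {"payload"});
MetadataUtil.setEventTime(in, 1730592000000L);       // EventTime attached as row metadata

org.apache.spark.sql.Row sparkRow = toSparkRow(in);  // copies field values only...
SeaTunnelRow out = toSeaTunnelRow(sparkRow);         // ...so the options are gone

// out no longer carries EventTime, so the Metadata transform emits an
// empty event_time_ms and the Spark E2E assertion fails.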
