[SUPPORT] Got NoSuchElementException while using hudi 0.10.0 and Flink (COW) #4583

@york-yu-ctw

Tips before filing an issue

  • Have you gone through our FAQs? Yes

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced
While using Hudi 0.10.0 with Flink to write data to S3, the job sometimes fails with the error below:

Caused by: java.util.NoSuchElementException: No value present in Option
        at org.apache.hudi.common.util.Option.get(Option.java:88)
        at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:115)
        at org.apache.hudi.io.FlinkMergeHandle.<init>(FlinkMergeHandle.java:70)
        at org.apache.hudi.client.HoodieFlinkWriteClient.getOrCreateWriteHandle(HoodieFlinkWriteClient.java:497)
        at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:143)
        at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$1(StreamWriteFunction.java:183)
        at org.apache.hudi.sink.StreamWriteFunction.lambda$flushRemaining$7(StreamWriteFunction.java:460)
        at java.util.LinkedHashMap$LinkedValues.forEach(LinkedHashMap.java:608)
        at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:453)
        at org.apache.hudi.sink.StreamWriteFunction.snapshotState(StreamWriteFunction.java:130)
        at org.apache.hudi.sink.common.AbstractStreamWriteFunction.snapshotState(AbstractStreamWriteFunction.java:150)
        at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
        at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
        at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:89)
        at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:218)

https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java#L115
It seems the fileId was not present in the partition.
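To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It assumes the failing line performs an unconditional `get()` on the result of a base-file lookup by fileId; I have not confirmed that against the 0.10.0 source, and the link above points at master, where line numbers may differ. `java.util.Optional` stands in for Hudi's `Option`, and the class, method names, and file path are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

public class MissingBaseFileSketch {
    // Hypothetical stand-in for a partition's file view: fileId -> base file path.
    private final Map<String, String> baseFilesByFileId = new HashMap<>();

    void addBaseFile(String fileId, String path) {
        baseFilesByFileId.put(fileId, path);
    }

    // Returns an empty Optional when the fileId has no base file in this view.
    Optional<String> latestBaseFile(String fileId) {
        return Optional.ofNullable(baseFilesByFileId.get(fileId));
    }

    // Assumed failing pattern: get() without checking that a value is present.
    String openForMerge(String fileId) {
        return latestBaseFile(fileId).get();
    }

    // Defensive variant: the caller can tell "no base file" apart from a crash.
    Optional<String> tryOpenForMerge(String fileId) {
        return latestBaseFile(fileId);
    }

    public static void main(String[] args) {
        MissingBaseFileSketch view = new MissingBaseFileSketch();
        view.addBaseFile("file-1", "dt=2022-01-01/file-1.parquet");

        System.out.println(view.openForMerge("file-1"));

        try {
            // "file-2" was assigned to a record but has no base file in the view.
            view.openForMerge("file-2");
        } catch (NoSuchElementException e) {
            System.out.println("merge failed: " + e.getMessage());
        }

        System.out.println(view.tryOpenForMerge("file-2").isPresent());
    }
}
```

If this is what happens, the question becomes why the writer is asked to update a fileId that the partition's file view does not contain (for example, stale index state pointing at a file that no longer exists). That is a guess, not a diagnosis.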

Here is my config:

CREATE TABLE data1 (
  schema STRING,
  type STRING,
  uuid STRING,
  `time` STRING,
  page STRING,
  payload STRING,
  dt STRING
)
PARTITIONED BY (`dt`)
WITH (
  'connector' = 'hudi',
  'path' = 's3a://xxxx/data',
  'hoodie.embed.timeline.server' = 'true',
  'write.precombine.field' = 'time',
  'hoodie.parquet.max.file.size' = '62914560',
  'index.bootstrap.enabled' = 'true',
  'hoodie.parquet.block.size' = '62914560',
  'hoodie.metadata.enable' = 'false',
  'hoodie.datasource.write.recordkey.field' = 'uuid',
  'write.tasks' = '4',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'index.state.ttl' = '1.5D',
  'write.bucket_assign.tasks' = '1',
  'read.streaming.enabled' = 'false',
  'table.type' = 'COPY_ON_WRITE',
  'index.global.enabled' = 'false'
)

To Reproduce

Steps to reproduce the behavior:
It is hard to reproduce, since it only happens every several days.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version : 0.10.0

  • Spark version :

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : No

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.
