[SUPPORT] OOM occurs a few days after the Flink jobmanager restarts #7407

@1032851561

Describe the problem you faced

A Flink job writes to a Hudi table, but the JobManager always runs out of memory (OOM) after a period of time.

Application application_1669378398064_0078 failed 1 times (global limit =2; local limit is =1) due to AM Container for 
appattempt_1669378398064_0078_000001 exited with exitCode: -104

Failing this attempt.Diagnostics: [2022-12-08 13:19:49.156]Container 
[pid=39129,containerID=container_e07_1669378398064_0078_01_000001] is running beyond physical memory limits. Current usage: 
4.0 GB of 4 GB physical memory used; 7.0 GB of 8.4 GB virtual memory used. Killing container.
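
To make the numbers in the diagnostics easier to compare, here is a minimal Python sketch that pulls the usage figures out of the message and computes how full each limit is. It is not part of the Flink job; the helper name `memory_usage` and the regular expression are illustrative assumptions, and the diagnostic text is simply the message above joined onto one line.

```python
import re

# Diagnostic text copied from the YARN message above, joined onto one line.
diagnostics = (
    "Container [pid=39129,containerID=container_e07_1669378398064_0078_01_000001] "
    "is running beyond physical memory limits. Current usage: "
    "4.0 GB of 4 GB physical memory used; 7.0 GB of 8.4 GB virtual memory used. "
    "Killing container."
)

# Matches "<used> GB of <limit> GB <kind> memory used".
USAGE = re.compile(r"([\d.]+) GB of ([\d.]+) GB (physical|virtual) memory used")


def memory_usage(text):
    """Return {kind: (used_gb, limit_gb, used_fraction)} for each memory kind found.

    Hypothetical helper for illustration only.
    """
    usage = {}
    for used, limit, kind in USAGE.findall(text):
        used_gb, limit_gb = float(used), float(limit)
        usage[kind] = (used_gb, limit_gb, used_gb / limit_gb)
    return usage


for kind, (used_gb, limit_gb, fraction) in memory_usage(diagnostics).items():
    print(f"{kind}: {used_gb} GB of {limit_gb} GB ({fraction:.0%})")
```

For this message the physical figure is at its limit while the virtual figure is still below its limit, which matches the "running beyond physical memory limits" wording.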

My table options:

  'read.streaming.enabled' = 'true',
  'path' = 'hdfs://xxx/hudi-warehouse/mytable',
  'hive_sync.enable' = 'true',
  'connector' = 'hudi',
  'read.streaming.check-interval' = '30',
  'hoodie.datasource.write.hive_style_partitioning' = 'true',
  'index.state.ttl' = '3650',
  'hoodie.datasource.write.partitionpath.field' = 'create_day',
  'changelog.enabled' = 'true',
  'clean.retain_commits' = '300',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.mode' = 'hms',


Environment Description

  • Hudi version : 0.11.1

  • Flink version : 1.14.5

My Questions

  1. Is this normal?
  2. Why does jobmanager use so much memory?
  3. Should I increase the memory or use other tuning methods?

    Labels

    engine:flink (Flink integration), priority:high (Significant impact; potential bugs)
