@aokolnychyi
Contributor

@aokolnychyi aokolnychyi commented Aug 28, 2024

This PR adds a benchmark for appending data. As shown below, Iceberg is currently very slow when an operation contains many new data files. I'll follow up with a fix separately.

Benchmark                    (fast)  (numFiles)  Mode  Cnt   Score   Error  Units
AppendBenchmark.appendFiles    true      500000    ss    5   7.451 ± 0.184   s/op
AppendBenchmark.appendFiles    true     1000000    ss    5  14.646 ± 0.371   s/op
AppendBenchmark.appendFiles    true     2500000    ss    5  36.853 ± 0.798   s/op
AppendBenchmark.appendFiles   false      500000    ss    5   7.556 ± 0.627   s/op
AppendBenchmark.appendFiles   false     1000000    ss    5  14.869 ± 0.286   s/op
AppendBenchmark.appendFiles   false     2500000    ss    5  37.495 ± 1.247   s/op
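
One way to read the table above is to normalize each score by the number of files and to compare the two modes directly. The sketch below does only that arithmetic: the scores are copied from the benchmark output, while the dictionary layout and the helper names (`micros_per_file`, `merge_to_fast_ratio`) are illustrative and not part of the PR.

```python
# Scores in seconds per operation, copied from the benchmark output above.
# Keys are (fast, num_files). The structure and names are illustrative only.
scores = {
    (True, 500_000): 7.451,
    (True, 1_000_000): 14.646,
    (True, 2_500_000): 36.853,
    (False, 500_000): 7.556,
    (False, 1_000_000): 14.869,
    (False, 2_500_000): 37.495,
}


def micros_per_file(fast: bool, num_files: int) -> float:
    """Average commit cost per appended file, in microseconds."""
    return scores[(fast, num_files)] / num_files * 1e6


def merge_to_fast_ratio(num_files: int) -> float:
    """How much slower the non-fast run is than the fast run for one size."""
    return scores[(False, num_files)] / scores[(True, num_files)]


for num_files in (500_000, 1_000_000, 2_500_000):
    print(
        f"{num_files:>9} files: "
        f"fast={micros_per_file(True, num_files):.2f} us/file, "
        f"merge={micros_per_file(False, num_files):.2f} us/file, "
        f"ratio={merge_to_fast_ratio(num_files):.3f}"
    )
```

On these numbers the per-file cost stays at roughly 15 microseconds across all three sizes, so the total time grows about linearly with the file count, and the non-fast runs are within about 2% of the fast ones.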

@github-actions github-actions bot added the core label Aug 28, 2024
@aokolnychyi aokolnychyi force-pushed the fast-append-benchmark branch from 6fae0f9 to fab5980 Compare August 28, 2024 00:47
@aokolnychyi aokolnychyi changed the title Core: Add benchmark for FastAppend Core: Add benchmark for adding files Aug 28, 2024
Contributor

@dramaticlly dramaticlly left a comment

Looks like for an unpartitioned table there's not much difference between fast and merge append in the benchmark. Looking forward to your optimization fix.

@aokolnychyi
Contributor Author

aokolnychyi commented Aug 28, 2024

Correct, we write new metadata differently in fast and merge APIs but the root cause is the same.

@aokolnychyi aokolnychyi merged commit 6c79640 into apache:main Aug 28, 2024
@aokolnychyi
Contributor Author

Thanks for reviewing, @dramaticlly @danielcweeks!

jenbaldwin pushed a commit to Teradata/iceberg that referenced this pull request Sep 17, 2024
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
3 participants