Spark <> Iceberg bug integration test #482

kevinjqliu · 2024-02-29T02:37:31Z

Apache Iceberg version

None

Please describe the bug 🐞

While working on #444, I ran into a weird bug with the Spark integration test.

Specifically, here:
https://github.com/apache/iceberg-python/compare/main...kevinjqliu:iceberg-python:kevinjqliu/weird-spark-bug?expand=1#diff-ae89704e133e5eb800112d7a84557f2976819b2c5d989a62af97bf922865631bR459

The current snapshot contains 10 data files, as verified by the assert statement just above that line:

assert tbl.current_snapshot().summary['added-data-files'] == '10'

But the Spark metadata table still returns only 1 file.

The issue goes away completely if I use a new table identifier in L452.
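
The fix referenced on this issue (#501) is titled "Disable Spark Catalog caching for integration tests", which suggests the stale file count may come from Spark serving cached table metadata for a reused identifier. Below is a minimal sketch of how catalog caching could be turned off, assuming an Iceberg Spark catalog registered under the hypothetical name `integration`. The helper `iceberg_catalog_conf` is illustrative only and is not taken from the test suite; `cache-enabled` is the Iceberg Spark catalog property that controls catalog caching.

```python
def iceberg_catalog_conf(catalog_name: str, cache_enabled: bool = False) -> dict[str, str]:
    """Build Spark conf entries for an Iceberg catalog (hypothetical helper).

    Setting `cache-enabled` to "false" asks the Iceberg Spark catalog not to
    cache table instances, so each lookup reloads the table metadata.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.cache-enabled": str(cache_enabled).lower(),
    }


# "integration" is an assumed catalog name, not necessarily the one used in the tests.
conf = iceberg_catalog_conf("integration")
for key, value in conf.items():
    print(f"{key}={value}")
```

Each pair could then be passed to `SparkSession.builder.config(key, value)` when the test session is created. Whether this matches the exact change made in #501 is not confirmed here.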

This was referenced Mar 6, 2024

Disable Spark Catalog caching for integration tests #501

Merged

Bin-pack Writes Operation into multiple parquet files, and parallelize writing WriteTasks #444

Merged

amogh-jahagirdar closed this as completed in #501 Mar 7, 2024
