Conversation

@RussellSpitzer
Member

Because we use LocalIterator in ExpireSnapshotAction, every partition
runs its own Spark job, and almost all of those jobs are completely
empty. This adds a lot of overhead that we don't need in the test
suite. Setting shuffle parallelism to 1 (from the default of 200)
greatly reduces the test runtime.
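For context, the parallelism in question is Spark's `spark.sql.shuffle.partitions` setting, which defaults to 200. A test suite can pin it to 1 when building the session; this is an illustrative sketch (the `local[2]` master and builder call here are assumptions, not the actual Iceberg test code):

```java
// Configuration sketch: one shuffle partition avoids ~200 near-empty
// per-partition jobs when the result is consumed via a local iterator.
SparkSession spark = SparkSession.builder()
    .master("local[2]")
    .config("spark.sql.shuffle.partitions", "1")
    .getOrCreate();
```

The same setting can also be supplied via `spark-defaults.conf` or `--conf` on `spark-submit`.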

@rdblue
Contributor

rdblue commented Aug 20, 2020

Thanks! That looks much better.

@rdblue rdblue merged commit 1af5a8f into apache:master Aug 20, 2020
@rdblue
Contributor

rdblue commented Aug 20, 2020

@RussellSpitzer, @aokolnychyi, if using the local iterator causes a job per task to be submitted to Spark, should we avoid using it?

If every file to delete takes up 500 bytes in memory, then the driver can hold 4 million files in 2GB. That seems reasonable to me, so we may be over-optimizing by using the iterator instead of just collecting the data back.
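A quick back-of-the-envelope check of the numbers above (using decimal GB; the 500-byte per-file footprint is the estimate from the comment, not a measured value):

```python
BYTES_PER_FILE = 500            # assumed in-memory footprint of one file path
DRIVER_BUDGET = 2 * 10**9       # 2 GB (decimal) of driver memory

# How many file entries fit in the budget if collected to the driver.
max_files = DRIVER_BUDGET // BYTES_PER_FILE
print(max_files)                # 4,000,000 files at 500 bytes each
```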

@RussellSpitzer
Member Author

RussellSpitzer commented Aug 20, 2020 via email

@rdblue
Contributor

rdblue commented Aug 20, 2020

That sounds good to me!

@RussellSpitzer
Member Author

RussellSpitzer commented Aug 21, 2020 via email

rdblue pushed a commit to rdblue/iceberg that referenced this pull request Aug 24, 2020
…lelism (apache#1362)
parthchandra pushed a commit to parthchandra/iceberg that referenced this pull request Oct 22, 2025