Skip to content

Conversation

@coolderli
Copy link
Contributor

When expiring snapshots, the deletefiles should be deleted.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose some code will be cleaned out after #2510 is merged, so please mark this as draft first.

}

public ExpireSnapshotsActionResult(Long dataFilesDeleted, Long manifestFilesDeleted, Long manifestListsDeleted) {
public ExpireSnapshotsActionResult(Long dataFilesDeleted, Long deleteFilesDeleted,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for public class, should keep existing constructor and add new ones.

private String path = null;
private Long length = null;
private Integer partitionSpecId = null;
private Integer content = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use ManifestContent instead of Integer?

JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
return loadAllManifestFileBean(table).filter((FilterFunction<ManifestFileBean>) manifest ->
manifest.content() == ManifestContent.DELETES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should compare using MnaifestContent.DELETES.equals(...)

JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
return loadAllManifestFileBean(table).filter((FilterFunction<ManifestFileBean>) manifest ->
manifest.content() == ManifestContent.DATA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should compare using MnaifestContent.DATA.equals(...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess == is also correct for enum

int totalJobsRun = jobsAfter - jobsBefore;

checkExpirationResults(1L, 1L, 2L, results);
checkExpirationResults(1L, 0L, 1L, 2L, results);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a test that actually tests delete file count > 0, could you add one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will add the unit tests.

@coolderli coolderli marked this pull request as draft April 27, 2021 05:27
protected Dataset<Row> buildValidDeleteFileDF(Table table) {
JavaSparkContext context = new JavaSparkContext(spark.sparkContext());
Broadcast<FileIO> ioBroadcast = context.broadcast(SparkUtil.serializableFileIO(table));
return loadAllManifestFileBean(table).filter((FilterFunction<ManifestFileBean>) manifest ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worthwhile to separate the loadAllManifestFileBean into 2 passes instead of one?

@zinking
Copy link
Contributor

zinking commented Apr 28, 2022

@coolderli are you still on this? pleasure to pick up where you left and push this forward.

@szehon-ho
Copy link
Member

Oh I am sorry, I did not notice this pr (should have checked), I made a similar pr here: #4629, if you want to look at it. Although I do not change the manifest table as is done here, which may still be valuable as a separate pr.

By the way, the delete files are actually deleted, (I added a test in #4141) , just the return counts are wrong.

@zinking
Copy link
Contributor

zinking commented Apr 28, 2022

Oh I am sorry, I did not notice this pr (should have checked), I made a similar pr here: #4629, if you want to look at it. Although I do not change the manifest table as is done here, which may still be valuable as a separate pr.

By the way, the delete files are actually deleted, (I added a test in #4141) , just the return counts are wrong.

thanks for sharing, this one should be closed then.

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 17, 2024
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants