@miz060 miz060 commented Jan 14, 2025

Merge Checklist
Summary
Test Methodology

@miz060 miz060 requested review from a team as code owners January 14, 2025 00:26
}

warn!("Retrying layer image download...");
continue; // Retry fetching the layer image

Member

Not sure, but would it make sense to sleep here for a bit? Presumably we will run up against the deadline that containerd has, so we cannot sleep for too long.

Member Author

Yes, I agree. It would make sense to sleep a bit if it's truly a network issue. I think the specific timeout is managed by the client (k8s) rather than by containerd itself. Given that, I have set the sleep time to 500 ms for now.
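
A minimal sketch of the retry-with-pause idea discussed above, using only the standard library. The helper name `retry_with_delay`, the attempt cap, and the blocking `std::thread::sleep` are assumptions for illustration; the PR's actual loop is not shown in this excerpt, and async code would likely use a non-blocking sleep instead. The 500 ms delay is the value mentioned in the reply.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper (not from the PR): call `attempt` up to `max_attempts`
/// times, pausing `delay` after each failed attempt except the last one.
/// Returns the first success, or the error from the final attempt.
fn retry_with_delay<T, E, F>(max_attempts: u32, delay: Duration, mut attempt: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut n = 1;
    loop {
        match attempt(n) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if n >= max_attempts => return Err(err),
            Err(_) => {
                // Give a transient network problem a moment to clear.
                sleep(delay);
                n += 1;
            }
        }
    }
}
```

With three attempts and `Duration::from_millis(500)`, this spends at most one second sleeping in total, which is the kind of budget the deadline concern above is about.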

file.rewind().context("failed to rewind the file handle")?;
tarindex::append_index(&mut file).context("failed to append tar index")?;
// Process the layer
let process_result = tokio::task::spawn_blocking({

Member

Should the download itself be part of the spawn_blocking block?

Member Author

I moved the download itself into the new function, which is run inside the while loop.

if let Err(e) = std::io::copy(&mut gz_decoder, &mut file) {
let copy_error = format!("failed to copy payload from gz decoder {:?}", e);
error!("{}", copy_error);
return Err(anyhow::anyhow!(copy_error));

Member

This is the error we hit; should we trigger a retry with a new download here as well? Or what are we doing to resolve it?

failed to extract image layer: failed to copy payload from gz decoder Error { kind: UnexpectedEof, message: \"failed to fill whole buffer\" }: unknown

Member Author

Yes, failing here will trigger a new download.
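
To illustrate how a failed copy can lead to a fresh download, here is a hedged, standard-library-only sketch. Everything in it is hypothetical: `SimulatedStream` stands in for a gz decoder over a possibly truncated download, `copy_with_redownload` is an invented helper, and an in-memory `Vec<u8>` replaces the layer file. It is not the PR's implementation.

```rust
use std::io::{self, ErrorKind, Read};

/// Stand-in for a decoder over a download. If `truncated` is set, it fails
/// with `UnexpectedEof` once its data runs out, like the error quoted above.
struct SimulatedStream {
    data: Vec<u8>,
    pos: usize,
    truncated: bool,
}

impl Read for SimulatedStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.data.len() {
            return if self.truncated {
                Err(io::Error::new(ErrorKind::UnexpectedEof, "failed to fill whole buffer"))
            } else {
                Ok(0) // clean end of stream
            };
        }
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Hypothetical helper: open a fresh source for every attempt and copy it
/// into `dest`. Any error, from opening or from the copy, starts a new attempt.
fn copy_with_redownload<R, F>(
    max_attempts: u32,
    dest: &mut Vec<u8>,
    mut open_source: F,
) -> io::Result<u64>
where
    R: Read,
    F: FnMut(u32) -> io::Result<R>,
{
    let mut last_err = io::Error::new(ErrorKind::Other, "no attempts were made");
    for attempt in 1..=max_attempts {
        // Discard partial output from a previous failed attempt. With a real
        // file this would be a rewind plus truncation instead.
        dest.clear();
        let result = match open_source(attempt) {
            Ok(mut src) => io::copy(&mut src, &mut *dest),
            Err(e) => Err(e),
        };
        match result {
            Ok(copied) => return Ok(copied),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

The point of the design is that the retry boundary wraps both the download and the copy, so a stream that ends early is handled the same way as a failed request. Note that after the final failed attempt `dest` may still hold partial data.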

@miz060 miz060 merged commit 66d2248 into jiria/solar Jan 16, 2025
41 of 52 checks passed
@miz060 miz060 deleted the mitchzhu/add_retry branch January 17, 2025 01:28