Skip to content

JENKINS-56740: Add multipart upload support for big files #88

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bozaro
Copy link

@bozaro bozaro commented Mar 25, 2019

Proof-of-concept implementation.
I tried to implement the upload of large files. The code turned out so-so, but it works.

JENKINS-56740

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some test failures and FindBugs errors. Sounds like this is a proof of concept but needs more work to become mergeable.

@NonNull
public abstract String getContainer();

/** A constant to define whether we should delete artifacts or leave them to be managed on the blob service side. */
/**
* A constant to define whether we should delete artifacts or leave them to be managed on the blob service side.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please minimize diffs when filing a PR—do not edit unrelated lines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* @return the URL
* @throws IOException
*/
@NonNull
public abstract URL toExternalURL(@NonNull Blob blob, @NonNull HttpMethod httpMethod) throws IOException;

@NonNull
public abstract MultipartUploader initiateMultipartUpload(@NonNull Blob blob) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not introduce new abstract methods.

Copy link
Author

@bozaro bozaro Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I have not thought of anything better yet.

Uploading artifacts logic clearly does not fit into the existing interface. For multipart upload we need:

  • check file size (worker), because we shoud known parts count and uploading time (multipart allowed only for files >= 5Mb);
  • generate urls for everty part (master);
  • initiate uploading to S3 (master);
  • upload every part to S3 and store etag (worker);
  • complete uploading to S3 (master).

Also we would like to make reduce "master <-> worker" roundtrips as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more beautiful if I could request the URL to upload from worker, but:

  • I don't know how to do it;
  • it looks like as security hole.

BlobStore blobStore = getContext().getBlobStore();
if (artifacts.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions would this clause be hit? Normally the caller should skip and warn.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove this condition.

try {
uploader.close();
} catch (Exception e) {
listener.getLogger().printf("Can't abort multipart upload: %s", e.getMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually would use Functions.printStackTrace or similar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, how to use Functions.printStackTrace.
This error is not fatal and usually can be silently ignored. As I known, uncompleted multipart uploads should be removed by S3 bucket lifecycle policy.

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid rearranging imports, and avoid wildcards. (Yes we should probably set up an import style enforcer or something.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@bozaro bozaro force-pushed the upload-big-files branch from 77ebf73 to 9438ea6 Compare March 27, 2019 20:02
@bozaro
Copy link
Author

bozaro commented Mar 27, 2019

Most of the errors are caused by the lack of implementation in the mock class.
I am ready to fix the mock class, but before that I would like to decide on the rest of the code.

I got the impression that, from the Jenkins point of view, this whole structure is extremely unnatural. Because of this, it turns out some freak.

@nigelarmstrong-bc
Copy link

Would it be possible to revive this PR? This is important to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants