
Max upload size is 5GB #165

Open
jsh2134 opened this issue Apr 27, 2017 · 4 comments

@jsh2134
Contributor

jsh2134 commented Apr 27, 2017

We currently upload files directly to AWS S3 with a single PUT, which caps uploads at 5 GB. Either incorporate multi-part uploads so larger files can still be uploaded directly, or add an easier way to include download URLs in a manifest (e.g. manifest.add_url('https://..')).

From the AWS docs:
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html

"The largest object that can be uploaded in a single PUT is 5 gigabytes."
https://aws.amazon.com/s3/faqs/
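For the multi-part option, a rough sketch of how a large file could be split into parts might look like the following. It only plans byte ranges and does not talk to S3 or to the SolveBio API. The name `plan_parts` and the 64 MiB default part size are made up for illustration; the limits used (5 MiB minimum part size, 5 GiB maximum part size, at most 10,000 parts) are the documented S3 multipart limits.

```python
# Documented S3 multipart limits.
MIN_PART = 5 * 1024 ** 2   # minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024 ** 3   # maximum part size
MAX_PARTS = 10000          # maximum number of parts per upload


def plan_parts(file_size, part_size=64 * 1024 ** 2):
    """Return (part_number, offset, length) tuples covering file_size bytes.

    Hypothetical helper: part_size is only a starting point and is grown
    when the file would otherwise need more than MAX_PARTS parts.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")

    # Smallest part size that keeps the part count within MAX_PARTS.
    needed = (file_size + MAX_PARTS - 1) // MAX_PARTS
    part_size = max(part_size, MIN_PART, needed)
    if part_size > MAX_PART:
        raise ValueError("file is too large for a multipart upload")

    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple could then drive one ranged read of the local file and one part upload; how the part URLs would be obtained is left open here.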

@jsh2134
Contributor Author

jsh2134 commented Apr 27, 2017

Workaround until something like manifest.add_url(url) exists:

import solvebio

# d is an existing solvebio.Dataset to import into
filename = "gnomad.exomes.r2.0.1.sites.vcf.gz"
url = "https://storage.googleapis.com/gnomad-public/release-170228/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz"
manifest = solvebio.Manifest()
manifest.manifest['files'] = [{'name': filename, 'url': url}]
imp = solvebio.DatasetImport.create(dataset_id=d.id, manifest=manifest.manifest, auto_approve=True)
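A possible shape for the proposed add_url, sketched as a standalone function over the manifest dict so it does not assume anything about the Manifest class internals. `add_url` does not exist in solvebio today; the signature and the default of deriving the name from the URL path are assumptions, and the `{'name': ..., 'url': ...}` entry format is taken from the workaround above.

```python
import posixpath
from urllib.parse import urlparse


def add_url(manifest, url, name=None):
    """Append a {'name', 'url'} entry to manifest['files'] (hypothetical helper).

    manifest is a plain dict, e.g. the .manifest attribute used in the
    workaround above. name defaults to the last path segment of the URL.
    """
    parsed = urlparse(url)
    # Catch typos such as a missing colon ('https//...') early.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("not a valid http(s) URL: %r" % (url,))

    if name is None:
        name = posixpath.basename(parsed.path)
    if not name:
        raise ValueError("could not derive a file name from %r" % (url,))

    manifest.setdefault("files", []).append({"name": name, "url": url})
    return manifest
```

With something like this, the workaround would reduce to add_url(manifest.manifest, url) followed by the same DatasetImport.create call.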

@davecap
Member

davecap commented Jun 6, 2017

Maybe use the multi-part upload functions from https://github.com/requests/toolbelt

@davecap davecap modified the milestone: Version 2.0.0 Jun 6, 2017
@jsh2134 jsh2134 removed this from the Version 2.0.0 milestone Jul 28, 2017
@jsh2134
Contributor Author

jsh2134 commented Mar 30, 2018

FYI, this is still the limit.

Uploading a file over 5 GB fails with an error like this:

Traceback (most recent call last):
  File "/usr/local/bin/solvebio", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/main.py", line 309, in main
    return args.func(args)
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/data.py", line 216, in upload
    vault.full_path)
  File "/usr/local/lib/python2.7/site-packages/solvebio/resource/object.py", line 226, in upload_file
    headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 126, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(32, 'Broken pipe'))

Look into ways to intercept this with a useful error message.
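One way to do this could be a size check before the PUT is even attempted, so the user sees the limit instead of a broken pipe. This is only a sketch: `check_upload_size` and `FileTooLargeError` are hypothetical names, not part of solvebio, and the limit is assumed to be 5 GiB (5 * 1024^3 bytes).

```python
import os

# Assumption: the single-PUT limit is 5 GiB.
MAX_SINGLE_PUT_BYTES = 5 * 1024 ** 3


class FileTooLargeError(ValueError):
    """Raised when a local file exceeds the single-PUT upload limit."""


def check_upload_size(path, limit=MAX_SINGLE_PUT_BYTES):
    """Return the file size in bytes, or raise FileTooLargeError.

    Hypothetical pre-flight check, meant to run before the upload starts.
    """
    size = os.path.getsize(path)
    if size > limit:
        raise FileTooLargeError(
            "%s is %d bytes, which exceeds the %d byte upload limit. "
            "Consider importing it through a manifest with a URL instead."
            % (path, size, limit)
        )
    return size
```

The check would sit just before the requests.put call in upload_file; the limit parameter is there mainly so the behaviour can be tested with small files.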

@davecap davecap added P2:normal and removed P1:high labels Feb 19, 2019
@davecap
Member

davecap commented Feb 19, 2019

The current workaround is to send manifests with URLs to the files.
