Skip to content
This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

Stream files to storage backend chunk by chunk #304

Merged
merged 7 commits into from
Aug 23, 2022

Conversation

ddelange
Copy link
Contributor

@ddelange ddelange commented Jun 21, 2022

ref #302 (comment)

builds upon #302, #303

fixes #210

depends on piskvorky/smart_open#702, getmoto/moto#5255

API:

  • added self.get_uri to s3,gcs,azure storages. for the rest no changes

Functionality:

  • does not need to load the whole package into memory to perform request.db.upload (will now do multi-part uploads)
  • does not need to generate signed urls to perform a streamed response (rather re-uses the storage client) when stream_files is true
  • does not need to load a whole package into memory when generate_hashes is true

CI:

  • tests use snakepacker/python:all as base image
  • moto PR was done to support ACL
  • azure mock (vcrpy) replaced with azurite server
  • gcs mock expanded for multipart upload

@ddelange ddelange changed the title Multipart upload Stream files to storage backend chunk by chunk Jun 21, 2022
@ddelange ddelange force-pushed the multipart_upload branch 12 times, most recently from 4a62ff5 to fea86fa Compare June 21, 2022 19:03
pypicloud/storage/gcs.py Outdated Show resolved Hide resolved
@ddelange ddelange force-pushed the multipart_upload branch 2 times, most recently from e436d98 to b2ab1f2 Compare June 21, 2022 19:27
Copy link
Owner

@stevearc stevearc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it's currently in a bad rebase state. Sending back to you

@ddelange
Copy link
Contributor Author

mocking multipart gcs still needs some work https://github.com/stevearc/pypicloud/runs/6999063103?check_suite_focus=true#step:4:2385

as well as mocking s3 ref getmoto/moto#5252

@ddelange ddelange marked this pull request as draft June 22, 2022 09:32
@ddelange ddelange force-pushed the multipart_upload branch 3 times, most recently from 4969dae to dd047ce Compare June 23, 2022 09:57
@coveralls
Copy link

coveralls commented Jul 10, 2022

Coverage Status

Coverage increased (+0.04%) to 82.916% when pulling 6edcc42 on ddelange:multipart_upload into 523e0c5 on stevearc:master.

RUN curl https://raw.githubusercontent.com/fkrull/docker-multi-python/master/setup.sh -o /setup.sh \
&& bash setup.sh \
&& rm /setup.sh
RUN apt-install libldap2-dev libsasl2-dev default-jre
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref https://github.com/snakepacker/python#apt-install

this image contains git, tox, all the python versions etc. shaves off 1+ minute from CI for each env

setup.py Outdated
@@ -13,7 +13,7 @@

REQUIREMENTS_TEST = open(os.path.join(HERE, "requirements_test.txt")).readlines()
REQUIREMENTS = [
"smart_open[http]",
"smart_open[s3,http]@git+https://github.com/ddelange/smart_open@patch-1",
Copy link
Contributor Author

@ddelange ddelange Jul 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevearc I'll update the PR once this merges, until then any PR comments are appreciated :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I start a fresh pod running stevearc/pypicloud:1.3.6, k8s still registers 141Mi. potentially a memprofiling candidate 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memprofiling using filprofiler

a default server.ini generated using local filesystem option, killed once the server booted succesfully:

fil-profile run -m pyramid.scripts.pserve server.ini

https://user-images.githubusercontent.com/14880945/178993376-ef2c9111-33fa-4dda-b27e-956fcbe1ccb0.svg

@ddelange ddelange marked this pull request as ready for review July 10, 2022 10:14
@ddelange ddelange requested a review from stevearc July 10, 2022 10:14
@ddelange ddelange force-pushed the multipart_upload branch 7 times, most recently from d6a3b3f to 7a511df Compare July 10, 2022 12:35
@stevearc
Copy link
Owner

Sorry for the delay, will do a review soon.

Copy link
Owner

@stevearc stevearc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, otherwise looks good. We're just waiting on the smart_open PR to merge?

pypicloud/storage/azure_blob.py Outdated Show resolved Hide resolved
tests/test_storage.py Show resolved Hide resolved
@ddelange
Copy link
Contributor Author

@stevearc I think this is ready to ship :)
bike front flip

@ddelange
Copy link
Contributor Author

if this looks good to you, could you puah the docker image also for arm64 ref stevearc/pypicloud-docker#37?

@stevearc
Copy link
Owner

Sick! I'll release shortly

@stevearc stevearc merged commit 37b7666 into stevearc:master Aug 23, 2022
@stevearc
Copy link
Owner

Release is cut, but the docker build and publish is broken. Will debug when I get time

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching large packages craches server
4 participants