[doc build] use rayci.anyscale.dev to fetch doc build cache #57877
Conversation
Code Review
This pull request refactors the documentation build cache fetching mechanism to use a URL endpoint instead of direct S3 access via boto3. This is a good simplification. My review includes two main points: improving an error message to be more accurate and robust, and fixing a potential performance issue in the file download logic by streaming the response to avoid high memory usage. Overall, the changes are in the right direction.
doc/load_doc_cache.py (comment on an outdated diff)
```diff
-        except botocore.exceptions.ClientError as e:
-            print(f"Failed to download {s3_file_path} from S3: {str(e)}")
-            raise e
+        with requests.get(f"{DOC_BUILD_S3_URL}/{commit}.tgz", allow_redirects=True) as response:
```
When downloading files with requests, it's important to use the stream=True parameter in the get request. Without it, the entire file content is loaded into memory at once. For large cache files, this can lead to high memory consumption and potential performance issues. By using stream=True, the response content is streamed, and iter_content can process it in chunks, which is much more memory-efficient.
```diff
- with requests.get(f"{DOC_BUILD_S3_URL}/{commit}.tgz", allow_redirects=True) as response:
+ with requests.get(f"{DOC_BUILD_S3_URL}/{commit}.tgz", allow_redirects=True, stream=True) as response:
```
stop fetching from s3 directly
Signed-off-by: Lonnie Liu <[email protected]>
…ect#57877) so that we are not tied to using public s3 buckets
Signed-off-by: Lonnie Liu <[email protected]>
Signed-off-by: xgui <[email protected]>