Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase or adjust scope of default HTTP timeout #1921

Closed
zanieb opened this issue Feb 23, 2024 · 12 comments · Fixed by #3144
Closed

Increase or adjust scope of default HTTP timeout #1921

zanieb opened this issue Feb 23, 2024 · 12 comments · Fixed by #3144
Labels
bug Something isn't working upstream An upstream dependency is involved

Comments

@zanieb
Copy link
Member

zanieb commented Feb 23, 2024

Our current HTTP timeout is not sufficient for some large packages e.g. torch

Increasing the timeout has a downside for people that have network problems while downloading smaller packages, e.g. it will wait longer before surfacing a problem.

What does pip use for a default timeout?

@zanieb zanieb added enhancement New feature or improvement to existing functionality needs-decision Undecided if this should be done labels Feb 23, 2024
@MichaReiser
Copy link
Member

I would expect that a server only sends an HTTP Timeout response if the connection is idle for too long, which makes me wonder if increasing the timeout solves the problem or only hides the root cause.

@zanieb
Copy link
Member Author

zanieb commented Feb 23, 2024

Yeah good point, I wonder if we're setting the wrong timeout? We shouldn't be enforcing this timeout when data transfer is actively occurring.

@zanieb zanieb removed the enhancement New feature or improvement to existing functionality label Feb 23, 2024
@zanieb zanieb changed the title Increase default HTTP timeout Increase or adjust scope of default HTTP timeout Feb 23, 2024
@samypr100
Copy link
Collaborator

samypr100 commented Feb 24, 2024

FWIW the default in PIP is 15 seconds https://github.com/pypa/pip/blob/main/src/pip/_internal/cli/cmdoptions.py#L294, so I'm not sure uv's 30? second default is at fault.

@konstin
Copy link
Member

konstin commented Feb 26, 2024

Could it be that we need to reduce the number of parallel requests? It could also be that they timeout because some cpu intensive task is blocking the main thread, esp. with something large like pytorch.

@zanieb
Copy link
Member Author

zanieb commented Feb 26, 2024

It seems like the next steps are to:

  1. Determine what exactly the timeout we are configuring applies to
  2. Create a simple MRE
  3. Determine what is happening with the request when the error is raised

@zanieb
Copy link
Member Author

zanieb commented Apr 5, 2024

See seanmonstar/reqwest#2237

It looks like we're using a deadline for the full request not a read timeout as desired. We need this functionality to be added upstream.

@zanieb
Copy link
Member Author

zanieb commented Apr 12, 2024

See seanmonstar/reqwest#2241

@zanieb zanieb added bug Something isn't working upstream An upstream dependency is involved and removed needs-decision Undecided if this should be done labels Apr 14, 2024
@mariosasko
Copy link

Hi @zanieb! When can we expect this to be fixed?

huggingface/datasets CI is quite consistently failing because of this issue (I'd assume) as of yesterday (e.g., check the "Install dependencies" step of this CI run).

@charliermarsh
Copy link
Member

@mariosasko - How often are you seeing this? We'll ship the HTTP timeout change soon (this week for sure), but I'm still wondering if there's something else going on here and would use huggingface/datasets to investigate if it's at least somewhat frequent.

@mariosasko
Copy link

mariosasko commented Apr 16, 2024

@charliermarsh It's pretty consistent, e.g., see the first two commit's CI runs in huggingface/datasets#6811 (it's more likely to happen than not based on recent CI runs).

PS: Not sure if this info is valuable, but both Windows and Ubuntu GH runners are susceptible to this issue

@charliermarsh
Copy link
Member

OK thanks. I'm gonna do some testing using that CI workflow.

@charliermarsh
Copy link
Member

So the first issue I ran into is that Windows sometimes fails with pytest missing, because the "Install dependencies" step silently fails:

error: Failed to download: scikit-learn==0.23.1
  Caused by: HTTP status server error (503 Service Unavailable) for url (https://files.pythonhosted.org/packages/7e/e5/888491b7e2c16718a68dfd8498325e8927003410b2d19ba255d875[13](https://github.com/charliermarsh/datasets/actions/runs/8708243419/job/23885167918?pr=1#step:7:14)38a5/scikit_learn-0.23.1-cp38-cp38-win_amd64.whl.metadata)
Resolved 4 packages in 41ms
Downloaded 4 packages in 2.54s
Installed 4 packages in 28ms
 + bleurt==0.0.2 (from git+https://github.com/google-research/bleurt.git@cebe7e6f996b40910cfaa520a63db47807e3bf5c)
 + coval==0.0.1 (from git+https://github.com/ns-moosavi/coval.git@87071a6293dc2e786dcfe2ed78e9369c[17](https://github.com/charliermarsh/datasets/actions/runs/8708243419/job/23885167918?pr=1#step:7:18)e41b3b)
 + math-equivalence==0.0.0 (from git+https://github.com/hendrycks/math.git@357963a7f5501a6c1708cf3f3fb0cdf525642761)
 + unbabel-comet==2.2.2

We can make that not silent by changing the shell configuration (so that it exits as soon as one command fails, rather than proceeding). But I'm not sure why it's failing in the first place. 503 hitting that static endpoint?

charliermarsh added a commit that referenced this issue Apr 19, 2024
…3144)

## Summary

This leverages the new `read_timeout` property, which ensures that (like
pip) our timeout is not applied to the _entire_ request, but rather, to
each individual read operation.

Closes: #1921.

See: #1912.
eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 22, 2024
This reverts commit 00f8d47.

uv has fixed astral-sh/uv#1921, which should resolve
timeout issues with pytorch/nvidia packages.
eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 22, 2024
This reverts commit 00f8d47.

uv has fixed astral-sh/uv#1921, which should resolve
timeout issues with pytorch/nvidia packages.
eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 22, 2024
This reverts commit 00f8d47.

uv has fixed astral-sh/uv#1921, which should resolve
timeout issues with pytorch/nvidia packages.
eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Apr 22, 2024
This reverts commit 00f8d47.

uv has fixed astral-sh/uv#1921, which should resolve
timeout issues with pytorch/nvidia packages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working upstream An upstream dependency is involved
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants