Skip to content

retry when transient HTTP errors occur#463

Merged
Jason2866 merged 2 commits into
developfrom
patch_downloader
Apr 12, 2026
Merged

retry when transient HTTP errors occur#463
Jason2866 merged 2 commits into
developfrom
patch_downloader

Conversation

@Jason2866

@Jason2866 Jason2866 commented Apr 12, 2026

Copy link
Copy Markdown

Checklist:

  • The pull request is done against the latest develop branch
  • Only relevant files were touched
  • Only one feature/fix was added per PR, more changes are allowed when changing boards.json
  • I accept the CLA

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced package download resilience with automatic retry logic and exponential backoff when encountering temporary failures.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Copy link
Copy Markdown
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'retry when transient HTTP errors occur' directly addresses the main change: implementing retry logic for transient HTTP errors during package downloads and framework metadata fetches.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch patch_downloader

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Jason2866

Jason2866 commented Apr 12, 2026

Copy link
Copy Markdown
Author

@coderabbitai analyze if the PR can solve Github transient errors 502 and 503 when trying to download

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the PR details and changed files
gh pr view 463 --repo pioarduino/platform-espressif32 --json files,body,additions,deletions

# Get the diff of the PR
gh pr diff 463 --repo pioarduino/platform-espressif32

Length of output: 4628


🌐 Web query:

PlatformIO FileDownloader PackageException HTTP 502 503 error handling source code

💡 Result:

In PlatformIO Core, HTTP 502/503 during package downloads is handled in FileDownloader by immediately raising PackageException (no special-case retry/backoff there).

Where it happens (source)

  • platformio/package/download.pyclass FileDownloader.__init__:

    • Creates an HTTPSession() and does get(url, stream=True)
    • If status_code not in (200, 203) it raises PackageException("Got the unrecognized status code ...") — so 502/503 become a PackageException right here. [1]
  • platformio/package/exception.py defines class PackageException(UserSideException) (no extra logic; it’s just the exception type being thrown). [2]

Related: where PlatformIO does configure retries for 502/503

  • platformio/http.pyHTTPSessionIterator configures urllib3.util.retry.Retry(...) with status_forcelist=[413, 500, 502, 503, 504] and total=5, backoff_factor=1, but that retry setup is used by HTTPClient sessions, not by FileDownloader (which directly uses a plain HTTPSession() without mounting an adapter). [3]

Sources:
[1] platformio/package/download.py (FileDownloader status-code check and PackageException) (raw.githubusercontent.com)
[2] platformio/package/exception.py (PackageException definition) (raw.githubusercontent.com)
[3] platformio/http.py (Retry config incl. 502/503 in status_forcelist) (raw.githubusercontent.com)

Citations:


@Jason2866: Here is a detailed analysis of whether this PR effectively addresses GitHub transient 502/503 errors during downloads.


Analysis: Retry Logic for Transient HTTP 502/503 Errors

Root Cause Context

PlatformIO's FileDownloader.__init__ raises PackageException("Got the unrecognized status code ...") for any HTTP status not in (200, 203) — so 502/503 responses become a PackageException immediately, with no built-in retry.

Interestingly, PlatformIO's http.py does configure urllib3.util.retry.Retry with status_forcelist=[413, 500, 502, 503, 504] and total=5, but this retry adapter is used by HTTPClient sessions, not by FileDownloader — which is the exact gap this PR fills.


What the PR Does

1. patch_file_downloader() — Monkey-patch for package downloads

This wraps FileDownloader.__init__ with a retry loop that:

  • Catches PackageException (which is exactly what 502/503 raises in FileDownloader)
  • Retries up to 5 times with exponential backoff: 2s → 4s → 8s → 16s → 32s (≈62s max wait)
  • Includes a guard (_patched flag) to prevent double-patching
  • Attempts cleanup of _http_response/_http_session with hasattr guards to avoid AttributeError on partially initialized state

2. Retry loop in _configure_arduino_framework()

Adds retry logic for fetching the Arduino framework metadata JSON with the same 5-attempt exponential backoff, catching requests.RequestException (which includes HTTPError raised by raise_for_status() for 502/503).


Will It Solve Transient 502/503 Errors?

Scenario Before PR After PR
FileDownloader hits GitHub 502/503 Fails immediately Retried up to 5× with backoff ✅
Arduino framework JSON fetch hits 502/503 Fails immediately Retried up to 5× with backoff ✅
Connection timeout during download Fails immediately Retried (caught by PackageException) ✅

Minor Caveats

  1. Broad exception catch: PackageException is caught for all errors, not just 502/503. This means non-transient failures (e.g., a 404 Not Found) will also be retried 5 times, wasting ~62 seconds before giving up. It won't cause harm, but filtering specifically for 502/503 status codes would be more precise.

  2. Re-initializing a partial object: original_init(self, ...) is called again on an already partially constructed self object. For this use case it should be safe, but is worth noting.

  3. Missing requests.HTTPError status filtering: In _configure_arduino_framework(), it could check e.response.status_code in (502, 503) before retrying to avoid retrying permanent errors, though again the practical impact is minimal.


Verdict

Yes, this PR should effectively solve GitHub transient 502/503 errors for both package downloads and framework metadata fetches. The monkey-patch approach is a pragmatic and well-structured workaround for the gap in PlatformIO's own retry infrastructure specifically affecting FileDownloader.

@Jason2866

Copy link
Copy Markdown
Author

@coderabbitai fetching 404 errors is good. Github answers sometimes with wrong codes. There are no 404 errors since the packages are verified before to be available.

@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@Jason2866: That's a great clarification — thank you! Given that GitHub is known to occasionally return incorrect status codes (including 404) as transient errors, and that packages are pre-verified to exist before download attempts, the broad PackageException catch covering all HTTP errors is not only acceptable but actually the correct and robust approach here. Filtering by specific status codes (like 502/503 only) could result in missed retries for transient GitHub misbehavior.

This fully addresses the "minor caveat" I raised in the previous analysis. The PR's retry strategy is sound and well-suited to GitHub's real-world behavior. ✅


✏️ Learnings added
Learnt from: Jason2866
URL: https://github.com/pioarduino/platform-espressif32/pull/463

Timestamp: 2026-04-12T20:24:06.280Z
Learning: In the pioarduino/platform-espressif32 project, retrying on all HTTP error status codes (including 404) during package downloads is intentional. GitHub sometimes returns incorrect/misleading status codes (including 404) as transient errors. Packages are verified to be available before download is attempted, so real 404 (not found) errors do not occur in practice. The broad `PackageException` catch in the retry logic (e.g., in `patch_file_downloader()`) should not be narrowed to specific status codes like 502/503 only.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: Jason2866
Repo: pioarduino/platform-espressif32 PR: 268
File: builder/penv_setup.py:87-96
Timestamp: 2025-08-26T14:52:29.044Z
Learning: For the pioarduino/platform-espressif32 project, Jason2866 prefers to trust system executables when falling back to PATH-based command resolution (like "uv"), rather than implementing additional validation checks for security concerns.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@platform.py`:
- Line 145: The docstring for PlatformIO's FileDownloader in platform.py
incorrectly states retries are limited to 502/503, while the implementation
retries all PackageException HTTP errors; update that docstring to describe the
actual behavior (i.e., retries on any PackageException / all HTTP error status
codes, including 404) and note that this broad retry behavior is intentional for
package downloads so future maintainers don't regress it; reference the
FileDownloader implementation and the retry logic that raises/handles
PackageException when composing the new docstring.
- Around line 621-626: The code calls response.json() to parse packjdata but
doesn't handle JSON decode errors, so add ValueError to the exceptions caught in
the retry block so malformed/non-JSON bodies trigger the same retry logic;
update the except clause that currently catches (requests.RequestException,
KeyError, IndexError) to also include ValueError and keep the existing retry
behavior around packjdata = response.json(), dyn_lib_url and the assignment to
self.packages["framework-arduinoespressif32-libs"]["version"], using the same
attempt/max_retries flow.
- Around line 172-173: The silent catch "except Exception: pass" in platform.py
should be replaced with explicit handling: change it to "except Exception as e:"
and log the exception (e.g., logger.debug or logger.exception with
exc_info=True) including contextual info about the retry/cleanup operation so
diagnostics are preserved; if the cleanup can fail fatally, either re-raise the
exception or narrow the catch to specific expected exception types instead of a
blanket Exception to avoid hiding real errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 02c60c58-038a-4de5-addf-0ecb76e2e162

📥 Commits

Reviewing files that changed from the base of the PR and between 3b69e77 and 55e05bc.

📒 Files selected for processing (1)
  • platform.py

Comment thread platform.py Outdated
Comment thread platform.py Outdated
Comment thread platform.py
Updated exception handling in patch_file_downloader to include ValueError. Modified docstring to clarify retry on transient HTTP errors.
@Jason2866 Jason2866 changed the title retry when transient HTTP errors 502/503 occur retry when transient HTTP errors occur Apr 12, 2026
@Jason2866 Jason2866 merged commit 0541226 into develop Apr 12, 2026
1 check passed
@Jason2866 Jason2866 deleted the patch_downloader branch April 12, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant