Crash when a release asset doesn't exist #130


einsteinx2
Contributor

Currently, the script crashes whenever a release asset fails to download (for example, on a 404 response). This change logs the failure instead and lets the script continue. No retry logic is added, but it prevents the crash and allows the backup to complete. Retry logic can be implemented later if wanted.
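
For illustration, here is a minimal sketch of the log-and-continue approach described above. It is not the actual diff: the `opener` parameter, the boolean return value, and the use of `print` are assumptions made so the example can run without network access, and the `Accept` header reflects how the GitHub API serves binary assets.

```python
from urllib.error import HTTPError
from urllib.request import Request, build_opener


def download_file(url, path, opener=None):
    """Download url to path. Return True on success, False if skipped."""
    # `opener` is injectable only so this sketch can be exercised offline
    # (an assumption, not part of the script's real signature).
    opener = opener or build_opener()
    request = Request(url)
    request.add_header("Accept", "application/octet-stream")

    try:
        response = opener.open(request)
    except HTTPError as exc:
        # Log and move on instead of letting the exception end the backup.
        print("Skipping download of asset {0} due to HTTPError: {1}".format(
            url, exc.reason))
        return False

    with open(path, "wb") as f:
        # Stream in fixed-size chunks so large assets are not held in memory.
        for chunk in iter(lambda: response.read(16000), b""):
            f.write(chunk)
    return True
```

A caller looping over release assets can ignore the return value, or use it to count skipped downloads.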

closes #129

Original result when asset fails to download:

Cloning reicast-emulator repository from https://*****:[email protected]/reicast/reicast-emulator.git to /Users/bbaron/tmp/github-backup/starred/reicast/reicast-emulator/repository
Retrieving reicast/reicast-emulator issues
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues?per_page=100&page=1&filter=all&state=open&since=2020-01-03T22%3A21%3A33Z
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues?per_page=100&page=1&filter=all&state=closed&since=2020-01-03T22%3A21%3A33Z
Saving 0 issues to disk
Retrieving reicast/reicast-emulator pull requests
Requesting https://api.github.com/repos/reicast/reicast-emulator/pulls?per_page=100&page=1&filter=all&state=open&sort=updated&direction=desc
Requesting https://api.github.com/repos/reicast/reicast-emulator/pulls?per_page=100&page=1&filter=all&state=closed&sort=updated&direction=desc
Saving 0 pull requests to disk
Retrieving reicast/reicast-emulator milestones
Requesting https://api.github.com/repos/reicast/reicast-emulator/milestones?per_page=100&page=1&state=all
Saving 9 milestones to disk
Retrieving einsteinx2 labels
Requesting https://api.github.com/repos/reicast/reicast-emulator/labels?per_page=100&page=1
Writing 64 labels to disk
Retrieving reicast/reicast-emulator releases
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases?per_page=100&page=1
Saving 7 releases to disk
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18970705/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18607180/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18598619/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/12856493/assets?per_page=100&page=1
Traceback (most recent call last):
  File "/Users/bbaron/.platformio/penv/bin/github-backup", line 1117, in <module>
    main()
  File "/Users/bbaron/.platformio/penv/bin/github-backup", line 1112, in main
    backup_repositories(args, output_directory, repositories)
  File "/Users/bbaron/.platformio/penv/bin/github-backup", line 752, in backup_repositories
    include_assets=args.include_assets or args.include_everything)
  File "/Users/bbaron/.platformio/penv/bin/github-backup", line 962, in backup_releases
    download_file(asset['url'], os.path.join(release_cwd, asset['name']), get_auth(args))
  File "/Users/bbaron/.platformio/penv/bin/github-backup", line 572, in download_file
    response = opener.open(request)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Output with this change implemented:

Updating reicast-emulator in /Users/bbaron/tmp/github-backup/starred/reicast/reicast-emulator/repository
Retrieving reicast/reicast-emulator issues
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues?per_page=100&page=1&filter=all&state=open&since=2020-01-03T22%3A21%3A33Z
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues?per_page=100&page=1&filter=all&state=closed&since=2020-01-03T22%3A21%3A33Z
Saving 2 issues to disk
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues/1749/comments?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues/1749/events?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues/1716/comments?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/issues/1716/events?per_page=100&page=1
Retrieving reicast/reicast-emulator pull requests
Requesting https://api.github.com/repos/reicast/reicast-emulator/pulls?per_page=100&page=1&filter=all&state=open&sort=updated&direction=desc
Requesting https://api.github.com/repos/reicast/reicast-emulator/pulls?per_page=100&page=1&filter=all&state=closed&sort=updated&direction=desc
Saving 0 pull requests to disk
Retrieving reicast/reicast-emulator milestones
Requesting https://api.github.com/repos/reicast/reicast-emulator/milestones?per_page=100&page=1&state=all
Saving 9 milestones to disk
Retrieving einsteinx2 labels
Requesting https://api.github.com/repos/reicast/reicast-emulator/labels?per_page=100&page=1
Writing 64 labels to disk
Retrieving reicast/reicast-emulator releases
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases?per_page=100&page=1
Saving 7 releases to disk
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18970705/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18607180/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/18598619/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/12856493/assets?per_page=100&page=1
Skipping download of asset https://api.github.com/repos/reicast/reicast-emulator/releases/assets/8640991 due to HTTPError: Not Found
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/12254973/assets?per_page=100&page=1
Skipping download of asset https://api.github.com/repos/reicast/reicast-emulator/releases/assets/8128776 due to HTTPError: Not Found
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/773524/assets?per_page=100&page=1
Requesting https://api.github.com/repos/reicast/reicast-emulator/releases/537488/assets?per_page=100&page=1
Skipping download of asset https://api.github.com/repos/reicast/reicast-emulator/releases/assets/229553 due to HTTPError: Not Found

Contributor

@whwright whwright left a comment


Maybe this should use the same pattern as _get_response for handling errors and deciding whether to continue? Personally, I think these should be logged to stderr using log_warning, with a message along the lines of "Failed to download asset... skipping...".

but you should get an opinion from @josegonzalez

@josegonzalez
Owner

Use log_warning. Please ping me when this gets updated, since a git push won't notify me (I use GitHub notifications to track issues and unfortunately, I have a ton of repositories...)
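
As a rough sketch of what a non-fatal warning helper of this kind might look like (the body below is an assumption for illustration, not the script's actual log_warning):

```python
import sys


def log_warning(message):
    """Write one or more warning lines to stderr and keep running."""
    # Accepting either a single string or a list of strings is an
    # assumed convenience, not confirmed behaviour of the real helper.
    messages = [message] if isinstance(message, str) else list(message)
    for line in messages:
        sys.stderr.write("{0}\n".format(line))
```

Because the warning goes to stderr and nothing exits, a failed asset download shows up in the error stream while the rest of the backup carries on.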

@einsteinx2
Contributor Author

@josegonzalez Thanks for being so responsive with these PRs! I've made the requested change.

@einsteinx2
Contributor Author

@whwright I agree. I'm pretty new to Python and also wanted to keep my changes minimal, so I didn't go that route, but that does seem like a better idea. If @josegonzalez wants, I can take a stab at creating that function.

@josegonzalez
Owner

@einsteinx2 go for it, it's OSS, so we can take some time to level up your Python while we're here :)

@einsteinx2
Contributor Author

Sounds good, I'll give it a shot then :)

I really like this script, and as I've been using it and reading the other open issues I keep finding other little things I want to improve, plus it's been fun getting up to speed on Python, so I expect to be sending more PRs your way after I finish this bit ;)

@einsteinx2
Contributor Author

Hmm, so as I'm diving into this, I noticed that _get_response and its associated functions seem to need some refactoring.

For example, it looks like they were meant to be written to collect errors during the retry process, but that's not how they're being used in practice.

For example, _request_http_error takes in an errors list, which it also returns, but it never actually adds any errors to it. Then in _get_response there is also an errors list, which it passes to _request_http_error (presumably to collect any errors) and also returns, but nothing ever gets added to it there either.
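
To make the pattern concrete, here is a hypothetical sketch (function names and status codes are illustrative, not taken from the script) contrasting a handler that returns the errors list untouched with one that actually records into it:

```python
RETRYABLE_STATUSES = {403, 502}  # illustrative values only


def handle_http_error_as_described(status, errors):
    """Takes and returns errors, but never adds to it."""
    should_continue = status in RETRYABLE_STATUSES
    return errors, should_continue


def handle_http_error_collecting(status, errors):
    """Records the failure before deciding whether to retry."""
    errors.append("HTTP error {0}".format(status))
    return errors, status in RETRYABLE_STATUSES
```

With the first version the caller always gets back whatever list it passed in, so any check on collected errors after the retry loop can never fire.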

Then _request_url_error (which is called whenever a URLError or socket.error happens in _get_response) calls log_error, which immediately exits the script before it can return. And even if it could return, _get_response would immediately raise an exception that is not caught, so the script would die before any error handling could happen anyway.
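
A small sketch of why the return is never reached, assuming log_error ends with sys.exit(1) as described (the function bodies here are simplified stand-ins, not the script's code):

```python
import sys


def log_error(message):
    """Write the message to stderr, then terminate the script."""
    sys.stderr.write("{0}\n".format(message))
    sys.exit(1)  # raises SystemExit, so nothing after the call runs


def request_url_error(reason):
    """Stand-in for the handler described above."""
    log_error("Request failed: {0}".format(reason))
    return True  # unreachable: SystemExit has already propagated
```

Since sys.exit raises SystemExit rather than returning, the caller never sees a "should continue" value unless it catches SystemExit itself.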

I started refactoring the asset downloading to share those same error handling functions, but then those needed refactoring, which meant retrieve_data_gen needed refactoring (it seems the real error handling is actually happening there right now in the cases the script doesn't die first).

I'm going to put this aside for the moment so I can look at it with a fresh head later. For the time being I'll work on one of the other, simpler tickets I posted, then come back to this.

@einsteinx2
Contributor Author

@josegonzalez Given the scope of the changes needed to rework the error handling, and the fact that this PR has already been tested to fix the issue, I'm thinking it would be better to merge this in now as-is. I can then open a new issue to refactor the error handling, as that's really a separate task unrelated to this bug fix.

@einsteinx2
Contributor Author

Created a new issue specifically for refactoring the error handling code here: #138

As of right now, I can't use the script as-is for my personal backups until this PR is merged, unless I use a custom branch that includes these changes. That makes it harder to work on the script: I have to keep two copies around (one with these changes, plus the main repo) and constantly cherry-pick all the other new changes into my custom branch. It would be easier if this were part of master, so I could just improve it from there with another PR.

Successfully merging this pull request may close these issues.

Crash when an asset doesn't exist (returns 404 response)