Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replaced httr one-shot functions with httr::RETRY() #148

Closed
wants to merge 1 commit into from
Closed

replaced httr one-shot functions with httr::RETRY() #148

wants to merge 1 commit into from

Conversation

jameslamb
Copy link
Contributor

Thanks for for this awesome project!

In this PR, I'd like to propose swapping out calls to httr::POST(), httr::GET() etc. with httr::RETRY(). This will make the package more resilient to transient problems like brief network outages or periods where the service(s) it hits are overwhelmed. In my experience, using retry logic almost always improves the user experience with HTTP clients.

I'm working on chircollab/chircollab20#1 as part of Chicago R Collab, an R 'unconference' in Chicago.

@codecov-io
Copy link

Codecov Report

Merging #148 into master will decrease coverage by 53.64%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #148       +/-   ##
==========================================
- Coverage   54.93%   1.29%   -53.65%     
==========================================
  Files          24      24               
  Lines        1509    1544       +35     
==========================================
- Hits          829      20      -809     
- Misses        680    1524      +844     
Impacted Files Coverage Δ
R/boxr__internal_get.R 0.00% <0.00%> (-81.64%) ⬇️
R/boxr__internal_misc.R 13.04% <0.00%> (-35.18%) ⬇️
R/boxr__internal_update_upload.R 0.00% <0.00%> (-100.00%) ⬇️
R/boxr_add_description.R 0.00% <0.00%> (ø)
R/boxr_delete_restore.R 0.00% <0.00%> (-55.00%) ⬇️
R/boxr_file_versions.R 0.00% <0.00%> (-91.31%) ⬇️
R/boxr_invite.R 0.00% <0.00%> (ø)
R/boxr_misc.R 0.00% <0.00%> (-43.40%) ⬇️
R/boxr_s3_classes.R 0.00% <0.00%> (-14.12%) ⬇️
R/boxr_search.R 0.00% <0.00%> (ø)
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa495cd...645e6d7. Read the comment docs.

@nathancday
Copy link
Member

Lovely! Thanks for helping this project and adding the perseverance of RETRY to all of the API requests.

Why do you choose to use 403 and 404 for terminate_on?

My thinking would be to add other response codes to that list, based on this list of Common Box API errors, but I want to understand your design decision before I do.

@jameslamb
Copy link
Contributor Author

jameslamb commented Apr 20, 2020

Why do you choose to use 403 and 404 for terminate_on?

My thinking would be to add other response codes to that list, based on this list of Common Box API errors, but I want to understand your design decision before I do.

I don't know anything about the Box API specifically, just using httr in general. I think 403 and 404 are a reasonable starting point if you know nothing about a service, since those typically have an unambiguous meaning.

Just in case you aren't familiar with httr::RETRY(), I want to be sure to explain...terminate_on is a tiny optimization.

RETRY(..., terminate_oon = c(403, 404)) means this:

  • if you get a status code < 300 --> return response immediately
  • if you get a status code of exactly 403 or 404 --> return response immediately
  • if you get a status code >= 300 EXCEPT 403 and 404 --> retry times times (default is 3 retries)

If you'd like me to change the terminate_on list in this PR based on your knowledge of the Box API, let me know and I'd be happy to!

@nathancday
Copy link
Member

No I don't think you need to do anything else, this is already great. I'm going to do some manual test today to see what error messages Box sends get back for the common scenarios (files don't exist, users aren't don't have access) and I can add the other response codes as needed.

Thanks for doing this again, I see you all have been busy helping a lot of packages. Stuff like this makes me so so happy to be an R user.

@ijlyttle
Copy link
Member

Just to echo everything @nathancday said: thanks @jameslamb and @chircollab, this is wonderful!

@nathancday
Copy link
Member

nathancday commented Apr 23, 2020

Hey @ijlyttle I've been exploring the different API status codes and I think we should add some of them toterminate_on = , because thet represent errors that are real and unrelated to connectivity. What do you think the best way to manage this is list would be?

My first idea was an internal functions like this:

box_http_codes <- function() {
  # https://developer.box.com/guides/api-calls/permissions-and-errors/common-errors/ 
  c(
    400, # Bad request
    401, # Unauthorized
    403, # Forbidden
    404, # Not found
    405, # Method not allowed
    410, # Gone
    411, # Length required
    412, # Precondition failed
    413, # Request entity too large
    415  # Unsupported media type
  )

Which would then get used like this:

req <- httr::RETRY(
      "GET",
      paste0(
        "https://api.box.com/2.0/files/",
        file_id, "/content", "?version=", version_id
      ),
      if (download)
        httr::write_disk(local_file, TRUE),
      if (pb)
        httr::progress(),
      get_token(),
      terminate_on = box_http_codes()
    )

As a way to prevent managing multiple repeats.

@ijlyttle
Copy link
Member

I like the idea!

I don't know that box_http_codes() needs to be a function, could it be an internal object? Perhaps called box_terminal_codes (or something else that denotes "bad")?

This might also be an opportunity to refactor using glue. Perhaps as we touch code that uses paste0(), we could update to glue::glue()?

For example:

paste0(
  "https://api.box.com/2.0/files/",
  file_id, "/content", "?version=", version_id
)

could become:

glue::glue("https://api.box.com/2.0/files/{file_id}/content?version={version_id}")

Just a thought...

@jameslamb
Copy link
Contributor Author

@ijlyttle a function is a better choice because functions are lazily evaluated in R. If you make something a vector (for example), the code to create that vector will get run when the package is loaded. Package source files (in R/) are loaded in alphabetical order unless you use Colatte: in DESCRIPTION (like this) but that is a pain to manage.

I like the internal-function-wrapping-a-constant pattern! We do this in LightGBM, for example, so we never have to think about the order that .R files are sourced.

@nathancday
Copy link
Member

Thanks y'all!
@ijlyttle I agree about renaming the function with terminal, let's use box_terminal_http_codes() to be explicit. I know we've made a glue refactor part of the roadmap, but I'd like to keep that separate from this. Happy to start that PR now...

@jameslamb thanks again for doing this! I'm going to close this PR and merge in the one with the additional commit for the internal function strategy.

@ijlyttle
Copy link
Member

ijlyttle commented Apr 27, 2020

@jameslamb I had wondered why I had seen that pattern around, but it never clicked until your explanation, thanks!

@nathancday Sounds good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants