Graceful handling of errors in vectorised inputs #55
Did you try sending in the column of text as is? It is vectorised, so it should cope with it, e.g. results <- gl_nlp(df_filtered$review_text)
Thanks Mark for the quick response. I tried it, but upon getting an error it exits rather than proceeding gracefully. I did manage to get it working by using only rows with 20 words in them and converting all the text to UTF-8, though it was a fiddly process. Let me know if I should close this issue, or if you'd like to know more.
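For reference, a minimal sketch of that kind of preprocessing, assuming a hypothetical data frame df with a review_text column (the names and the 20-word cutoff are assumptions based on the comment above):

#Keep rows with at least 20 words and force the text to UTF-8 before sending it to the API
word_count <- vapply(strsplit(df$review_text, "\\s+"), length, integer(1))
df_filtered <- df[!is.na(df$review_text) & word_count >= 20, ]
df_filtered$review_text <- enc2utf8(df_filtered$review_text)

#One vectorised call over the whole column
results <- gl_nlp(df_filtered$review_text)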
OK, good to know, thanks - I will keep the issue open to make the failures more graceful.
Many thanks, Mark. Let me know if you need me to test anything in the future.
Would just like to add that I am having the same issue and that, unless I have my tryCatch() loop coded incorrectly, I'm also getting the same sort of failure. This code:

#Use just instances with more than 25 words of text (arbitrary cutoff)
filelist <- lapply(filelist, function(x) subset(x, WordCount > 24))

####Push the data up to Google and get the results back####
#Create the storage list
output <- rep(list(NA), length(ids))
names(output) <- as.numeric(ids)

#Run the data through
tryCatch(
  {
    for (i in 1:length(ids)) {
      output[[i]] <- gl_nlp(as.character(filelist[[i]]$Content))
    }
  }
)

ultimately produces this error:
[screenshot] https://user-images.githubusercontent.com/35079605/60536503-64d21480-9cd4-11e9-9732-c63231a911ff.png
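For comparison, moving tryCatch() inside the loop body, rather than wrapping the whole loop, lets the remaining iterations continue when one element fails. A minimal sketch reusing the filelist, ids and output objects from the snippet above:

for (i in seq_along(ids)) {
  output[[i]] <- tryCatch(
    gl_nlp(as.character(filelist[[i]]$Content)),
    error = function(e) {
      #Record the failure and keep going with the next element
      message("Element ", ids[i], " failed: ", conditionMessage(e))
      NA
    }
  )
}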
OK, I'll take a look to make this fail more gracefully.
The above scenarios should be handled better now in version 0.2.0.9000, available to install from GitHub. For example, the calls below will carry on if there are 400 errors in the first responses:

library(googleLanguageR)
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA))
2019-07-02 22:08:00 -- annotateText: 43 characters
2019-07-02 22:08:01> Request Status Code: 400
2019-07-02 22:08:01 -- Error processing string: 'the rain in spain falls mainly on the plain' API returned: Invalid text content: too few tokens (words) to process.
2019-07-02 22:08:01 -- annotateText: 3 characters
2019-07-02 22:08:02> Request Status Code: 400
2019-07-02 22:08:02 -- Error processing string: 'err' API returned: Invalid text content: too few tokens (words) to process.

Which gives a response like the one below:
Note you do not need to loop through indexes etc. to pass multiple texts to the API; send in the vector and it will make one API call per text element. It will skip API calls for empty strings or NA vector elements.
Fixed - many thanks!
One thing I have just realised is that the "too few tokens (words) to process." error only occurs if you include classifyText in the request, e.g. if you use the annotateText default that includes all methods. You can get entity analysis for any number of characters if you specify only that method, e.g.

gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA), nlp_type = "analyzeEntities")

See https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText
I am trying to loop through a dataframe of reviews, some of which seem to be below the detection threshold. I get errors like:
When I try to do something like this:
I get errors like:
I was thinking, why should the whole loop error out due to just one bad call? I am using the safely function from purrr, but is there a best-practice guide for dealing with these situations somewhere?
Thanks.
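As a sketch of the purrr approach mentioned above, wrapping gl_nlp() with safely() records either a result or an error for each element instead of stopping the loop (the reviews data frame and review_text column names are hypothetical):

library(purrr)
library(googleLanguageR)

safe_nlp <- safely(gl_nlp)

#One call per review; each element of the result holds $result and $error
results <- map(reviews$review_text, safe_nlp)

#Which reviews produced an error rather than a result?
failed <- map_lgl(results, ~ !is.null(.x$error))
which(failed)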