Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with aggregate function in getExpansion_2 via application to canary age comp expansion #104

Open
brianlangseth-NOAA opened this issue Mar 29, 2023 · 5 comments
Assignees
Labels
priority: low The lowest level priority, i.e., not urgent. status: in progress Currently working on this issue topic: code Related to R code within this package type: refactor Changes to the codebase or documentation that do NOT alter the values returned, e.g., styling.

Comments

@brianlangseth-NOAA
Copy link
Contributor

brianlangseth-NOAA commented Mar 29, 2023

Describe the bug
Im encountering an issue with the getExpansion_2 function when I try to expand age comps for canary rockfish across the entire coast. The specific line where this issue occurs in when trying to figure out the trips that dont have catch values associated with them, line 162-163. When I expand based on state, I dont have this issue.

The reason why this is occurring is that when I try to expand across the coast, all trips with no catch also have NA for Sum_Sampled_Lbs. Thus aggregate(Sum_Sampled_Lbs ~...) does not work. When I expand across each state, there are more trips with no catch, and most of these have values for Sum_Sampled_Lbs.

This appears to be a very case-specific issue. In this case, when expanding across the coast my catches run from 1981 on but i have age comps for 1980. It just so happens to be that the only records with Sum_Sampled_Lbs==NA occur in 1980. When expanding by state, other years are added because not every state starts with catches in 1981, and thus there are some postivie Sum_Sampled_Lbs.

To Reproduce
I cant really reproduce because I would have to share catches. However, a minimally reproducible example is below showing that the formula formulation of aggregate doesn't work when the y ~ value is all NA:

test = data.frame("A" = NA, "B" = rep(c("T","N"),2))
aggregate(A~B, data=test, length) #does not work and gives me the error I get for canary
aggregate(test$A, by = list(test$B), length) #works`

Additional context
I see two solutions to this. First use the less elegant formulation for aggregate, like I do above. Within lines 162-163 this would look like

NoCatch <- stats::aggregate(tows[is.na(tows[, "catch"]), ]$Sum_Sampled_Lbs, 
                            by = list("fishyr" = tows[is.na(tows[, "catch"]), ]$fishyr, 
                                      "stratification" = tows[is.na(tows[, "catch"]), ]$stratification),
                            FUN= length)

Second, use a dplyr group_by formulation.

NoCatch <- tows[is.na(tows[, "catch"]), ] %>%
      dplyr::group_by(fishyr, stratification, ) %>%
      dplyr::summarize("N" = length(Sum_Sampled_Lbs))

I tried these two options and both ran, overcoming my original issue. I ran though length comps output for one column and the results were the same across all three approaches (the original, and then my two choices).

@kellijohnson-NOAA
Copy link
Contributor

Thanks Brian. Would you mind sharing your catch file with me on google drive so I can download it and go through your script that you have where you work up the commercial composition data? I am trying to determine what the outcome should be to the user if the code enters this if statement.

@kellijohnson-NOAA
Copy link
Contributor

More information "The if statement was being triggered for age records from the pacfin bds in years without corresponding catch years for that fleet. For example, there are 1980 age records but the pacfin catch file does not include 1980."

@kellijohnson-NOAA
Copy link
Contributor

Make an error message when trying to expand composition data without corresponding catch.

@brianlangseth-NOAA
Copy link
Contributor Author

@kellijohnson-NOAA An error message would be different than what has been done in the past, which is to post a warning. I haven't paid much attention to those, and in those cases I think the comps are left as they are.

Id be hesitant to stop the script without also providing knowledge on how to resolve the issue. Otherwise, what has previously "worked" now wont and folks would be frustrated. Perhaps something off cycle to resolve?

@chantelwetzel-noaa
Copy link
Contributor

If there are no catches associated with a fleet for a year that has bds to expand, I think this is exactly the type of situation where we would want the code to stop in order to force the user to understand this mismatch. I think many instances it would likely be due to a processing error either in the catches or how the bds data was filtered which should be fixed. If there are truly zero catches for a year with bds data I think the user should understand why and be forced to think about if those data should be used and how they should be expanded.

As you noted above, most people blow by the warnings printed to the screen, and would see this issue not being addressed.

@kellijohnson-NOAA kellijohnson-NOAA self-assigned this May 9, 2023
@kellijohnson-NOAA kellijohnson-NOAA added topic: code Related to R code within this package status: in progress Currently working on this issue type: refactor Changes to the codebase or documentation that do NOT alter the values returned, e.g., styling. priority: low The lowest level priority, i.e., not urgent. labels May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: low The lowest level priority, i.e., not urgent. status: in progress Currently working on this issue topic: code Related to R code within this package type: refactor Changes to the codebase or documentation that do NOT alter the values returned, e.g., styling.
Projects
None yet
Development

No branches or pull requests

3 participants