input sample size #29
Comments
This has been a long-standing issue but I am thinking about it now because I am currently working on |
I think it should be based on the number of tows performed regardless of sex. This was the approach that I was trying to implement with the "both" or "b" columns in previous functions, but I was focusing only on the case of creating sex = 3 composition data. In hindsight this should have extended to the unsexed fish as well. Doing so should greatly simplify the code and improve the input sample size calculation across sexed and unsexed fish. I think many of these issues arose from how the original code was structured, which output composition data for all Stock Synthesis specifications (sex = 0, sex = 1, sex = 2, sex = 3), and from modifications that tried to preserve the existing functionality. That was a mistake IMO. I think the situation where someone would want only female or only male compositions is so rare that we should only be outputting sex = 0 and sex = 3 composition data. If someone does want single-sex composition data, they can adapt the sex = 3 output themselves. |
Thanks @kellijohnson-NOAA for raising this issue. Ultimately these input sample sizes are all adjusted by a tuning algorithm anyway, which I'm guessing will have a larger impact than this choice, but it's still good to have it logical and keep the code simple. I also like the @chantelwetzel-noaa proposal to remove the sex = 1 or 2 output and just report females and males in vectors with sex = 3 and unsexed as sex = 0 (my ideal would be all in one table that users could filter or modify as they wish). I also recognize that there was lots of debate about this at the black rockfish review. But in this example, wouldn't the sample sizes still add up incorrectly, with 6 tows total? If the user drops unsexed fish from the output the issue goes away. Also, for most of our species the tows with unsexed fish are much fewer than those with sexed fish, so the impact is small.
|
Thanks for the comments. In your example @iantaylor-NOAA I am proposing
|
@kellijohnson-NOAA got her response in quicker than I could. I think sex should be ignored when counting the number of tows. I think if we calculate sexed vs. unsexed separately we end up calculating a higher input sample size for unsexed vs. sexed fish. This is where it really matters: all of our data weighting methods apply a simple multiplier by fleet and composition type, so it is really important that you have not inadvertently over-weighted one of the sex inputs relative to the other. |
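To make the tow-counting argument concrete, here is a minimal sketch using made-up data. The record layout, the sex codes, and the function name `count_tows` are assumptions for illustration and are not taken from the package. It shows how counting tows separately for sexed and unsexed fish can produce counts that sum to more than the number of tows actually performed.

```python
# Hypothetical records: (tow_id, sex), with sex coded as "F", "M", or "U".
fish = [
    (1, "F"), (1, "M"), (1, "U"),
    (2, "F"), (2, "U"),
    (3, "M"),
    (4, "U"),
]


def count_tows(records):
    """Return (tows overall, tows with sexed fish, tows with unsexed fish)."""
    all_tows, sexed_tows, unsexed_tows = set(), set(), set()
    for tow_id, sex in records:
        all_tows.add(tow_id)
        if sex == "U":
            unsexed_tows.add(tow_id)
        else:
            sexed_tows.add(tow_id)
    return len(all_tows), len(sexed_tows), len(unsexed_tows)


n_all, n_sexed, n_unsexed = count_tows(fish)
# Tows that contain both sexed and unsexed fish are counted twice
# when the two groups are tallied separately.
print(n_all, n_sexed, n_unsexed)
```

With these invented records, four tows were performed, but three contain sexed fish and three contain unsexed fish, so separate tallies sum to six.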
I'm inspired to do some model explorations on this one. I'll try to do some of that this morning but for now I think whatever is easiest for @kellijohnson-NOAA to do in I think the impact of the unsexed fish on the model should be similar to what it would have been if the sex were known for those fish. So if the fraction that's unsexed is small it could be problematic to have the same input sample size for those observations as for the sexed fish. Here's my updated thinking on how I would treat these things in a model (from NOAA-only doc: https://docs.google.com/presentation/d/1RnfNzP6Nlyp_b4W2yZjbmBrvm5jUe0L8OJxO633dH1k/edit#slide=id.g3181828d758_0_2) |
I did a simple experiment with the simple_small model in r4ss where I removed the fishery age comps (to avoid conflict between unsexed lengths and sexed ages), then made the small length bins unsexed as a separate vector with 3 options for sample size: same as large fish (N = 50), removed from the model (essentially N = 0), or with reduced sample size proportional to the actual number of small fish (worked out to be N = 1.75 based on the observed ratio associated with my arbitrary selection of the first 6 length bins as unsexed). Keeping Nsamp equal for both vectors created a big change and the recdevs jumped around a lot because the model thought there was now LOTS of information about the small fish (N = 50 but only a few bins with non-zero samples). Removing the small fish entirely changed things a bit, because selectivity shifted as a result of missing information on the presence of small fish in the sample. Using a reduced input sample size produced results very similar to the "all fish sexed" model. This test is limited in lots of ways, including the fact that the original data were random samples from the multinomial distribution and not from the complex fishery sampling process with autocorrelation among fish within each haul. However, I think that the results make sense and could be applied to our input sample size calculations for PacFIN data. That is, the focus on the number of hauls as a basis for the input sample size makes sense when you're comparing among years. But if you're looking at sexed vs. unsexed fish for the same year and fleet, I think the actual number of sampled fish matters more, so the input sample size for the unsexed fish could be based on the number of trips with any fish sampled multiplied by the fraction of sampled fish that were unsexed. If the PacFIN.Utilities code reports Nhauls and Nfish for each vector, the user could do their own calculations along those lines. 
Or the function could just do it automatically. @kellijohnson-NOAA and @chantelwetzel-noaa let me know if any of this makes sense and/or you want further explorations. Code to run the experiment is in testing_impact_of_small_unsexed_fish.R.txt |
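The proportional calculation suggested above can be sketched as follows. This is only an illustration of the arithmetic: the function name `unsexed_input_n`, its arguments, and the numbers in the example are hypothetical and do not come from PacFIN.Utilities or from the experiment.

```python
def unsexed_input_n(n_tows, n_fish_unsexed, n_fish_total):
    """Scale the tow (or trip) count by the fraction of sampled fish that were unsexed.

    n_tows:         number of tows/trips with any fish sampled (all sexes)
    n_fish_unsexed: number of sampled fish with unknown sex
    n_fish_total:   number of sampled fish, sexed and unsexed combined
    """
    if n_fish_total <= 0:
        raise ValueError("n_fish_total must be positive")
    if not 0 <= n_fish_unsexed <= n_fish_total:
        raise ValueError("n_fish_unsexed must be between 0 and n_fish_total")
    return n_tows * n_fish_unsexed / n_fish_total


# Invented example: 40 tows, 600 fish sampled, 30 of them unsexed.
n_unsexed = unsexed_input_n(n_tows=40, n_fish_unsexed=30, n_fish_total=600)
print(n_unsexed)
```

A user could apply the same calculation to whatever haul and fish counts the package reports for a given year and fleet, then compare the result against the equal-sample-size and removed-data alternatives described above.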
As the default and hidden deep within the code, I do not think that I want to substitute one complicated method for another. So, I am resorting to returning the number of tows across all sexed and unsexed fish and then the number of fish per sex (i.e., n_fish_U, n_fish_F, and n_fish_M). The number of fish can then easily be added together to create the number of sexed fish. This is all returned from |
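A rough sketch of what such a return value could look like is given here. The names `n_fish_U`, `n_fish_F`, and `n_fish_M` come from the comment above; the `n_tows` name, the record layout, and the function `summarize_samples` are assumptions for illustration and do not describe the package's actual interface.

```python
from collections import defaultdict


def summarize_samples(records):
    """Per year: tows across all fish, plus the number of fish by sex.

    records: iterable of (year, tow_id, sex) with sex coded "U", "F", or "M".
    Any other sex code raises a KeyError.
    """
    tows = defaultdict(set)
    counts = defaultdict(lambda: {"n_fish_U": 0, "n_fish_F": 0, "n_fish_M": 0})
    for year, tow_id, sex in records:
        tows[year].add(tow_id)  # a tow counts once regardless of sex
        counts[year]["n_fish_" + sex] += 1
    return {
        year: {"n_tows": len(tows[year]), **counts[year]}
        for year in sorted(tows)
    }


# Hypothetical records for two years.
records = [
    (2019, "a", "F"), (2019, "a", "M"), (2019, "b", "U"),
    (2020, "c", "F"), (2020, "c", "F"), (2020, "d", "M"), (2020, "d", "U"),
]
summary = summarize_samples(records)

# The number of sexed fish is just the sum of the female and male counts.
n_sexed_2020 = summary[2020]["n_fish_F"] + summary[2020]["n_fish_M"]
print(summary[2020], n_sexed_2020)
```

Returning the raw counts rather than a derived sample size leaves the choice of weighting approach to the user.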
Thank you for testing this out @iantaylor-NOAA. I agree with your decision @kellijohnson-NOAA to keep the options which then allows users to explore which approach works best for their model. |
Yes, nice work @kellijohnson-NOAA keeping things simple and straightforward. Thank you for your service in getting this stuff done. |
Input sample size needs to be given more thought. 2020-10-21 PEP team meeting talked about best practices going forward. The methods may not be consistent across species because of life-history characteristics.