Develop smarter way to check Oregon number of sex samples by sample number #43
Comments
Is it that EF1_Denominator needs to be changed, or that something else needs to be done when people remove lengths or ages, because removal affects the sample sizes and the weights of those samples that are calculated later on? I would say that the function did its job in that it recognized this. Maybe there needs to be a helper function for users.
If you remove data that are from Oregon, the EF1_Denominator function stops with an error message based on the checks from lines 111-144. The check is correct because the values in the NUMBER_X columns no longer match the internally calculated samples (because of the removed data). There is no easy way around this error: the user must either update the NUMBER_X values by hand for every SAMPLE_NO that fails the internal check, or retain length or age data that do not look plausible.
When I find a length or age that does not seem plausible, I opt to remove the entire record. Typically the number of records removed is a very small percentage of the total good records. I just did a check and found that if I instead replace both the length and the age in the "bad" records with NAs, the first stage expansion works. If this is a better approach, we may want to clarify the guidance. If there are particular concerns with replacing both the length and age with NAs because of how the expansions are calculated, we can revisit this option.
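To make the mismatch concrete, here is a minimal Python sketch of the kind of consistency check described above. It is not the code in EF1_Denominator, which is R and may work differently. The SEX codes "F", "M", and "U", their mapping to the FEMALE_NUM, MALE_NUM, and UNK_NUM columns named in the issue description, the assumption that recorded counts repeat on every row of a sample, and the function name `find_mismatched_samples` are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: SEX is coded "F", "M", or "U", and each code maps to one count column.
COUNT_COLUMN = {"F": "FEMALE_NUM", "M": "MALE_NUM", "U": "UNK_NUM"}


def find_mismatched_samples(records):
    """Map SAMPLE_NO -> {count column: (recorded, tallied)} where the two disagree."""
    # Tally the rows actually present for each (sample, sex) pair.
    tallied = Counter((r["SAMPLE_NO"], r["SEX"]) for r in records)

    # Assumption: recorded counts repeat on every row of a sample, so keep the first seen.
    recorded = {}
    for r in records:
        recorded.setdefault(
            r["SAMPLE_NO"], {col: r[col] for col in COUNT_COLUMN.values()}
        )

    mismatches = {}
    for sample_no, counts in recorded.items():
        for sex, col in COUNT_COLUMN.items():
            n_rows = tallied[(sample_no, sex)]
            if counts[col] != n_rows:
                mismatches.setdefault(sample_no, {})[col] = (counts[col], n_rows)
    return mismatches


# Hypothetical data: sample "OR1" was recorded with 2 females and 1 male,
# but one implausible female record was dropped during QA/QC.
example_records = [
    {"SAMPLE_NO": "OR1", "SEX": "F", "FEMALE_NUM": 2, "MALE_NUM": 1, "UNK_NUM": 0},
    {"SAMPLE_NO": "OR1", "SEX": "M", "FEMALE_NUM": 2, "MALE_NUM": 1, "UNK_NUM": 0},
    {"SAMPLE_NO": "OR2", "SEX": "F", "FEMALE_NUM": 1, "MALE_NUM": 0, "UNK_NUM": 0},
]

print(find_mismatched_samples(example_records))
# {'OR1': {'FEMALE_NUM': (2, 1)}}
```

Only OR1 is flagged, because its recorded female count (2) no longer matches the single female row that remains after the removal.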
Putting in NA would expand the sample using the weight of the fish that has
an NA but it wouldn't actually be providing any information to the
composition. So your ending composition would essentially be "upweighted"
based on the amount of information you are putting in for that sample
relative to other samples.
--
Kelli Faye Johnson, PhD
Hmm. If the number of lengths and ages being replaced with NA is small relative to the total samples by species, the impact of "upweighting" would likely be small (assuming you are not removing a bunch of lengths/ages from a single SAMPLE_NO), but conversely the impact of leaving in these "bad" records would also be minimal. I need to think a bit more about how to treat this type of data. Thanks for all the input @kellijohnson-NOAA
Currently, the EF1_Denominator function called from getExpansion_1 does an internal check for Oregon data to determine whether the number of samples by sex matches the number recorded in FEMALE_NUM, MALE_NUM, or UNK_NUM. However, if you do some external QAQC and remove ages or lengths that do not seem plausible, the EF1_Denominator check of these columns will fail. We need to either make the check for OR in EF1_Denominator smarter, create a function to recalculate and overwrite MALE_NUM, FEMALE_NUM, or UNK_NUM based on the removed samples (not sure how this would then impact EXP_WT values), or recommend that no records be removed from the data set.
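The second option, a function that recalculates and overwrites the count columns, could look roughly like the Python sketch below. This is only an illustration of the idea, not a proposed implementation for the package: the function name `recalculate_sex_counts`, the SEX codes "F", "M", and "U", and the example values (including the EXP_WT number) are hypothetical. The sketch deliberately leaves EXP_WT untouched, so it does not answer the open question about how the change would affect those values.

```python
from collections import Counter

# Assumption: SEX is coded "F", "M", or "U", and each code maps to one count column.
COUNT_COLUMN = {"F": "FEMALE_NUM", "M": "MALE_NUM", "U": "UNK_NUM"}


def recalculate_sex_counts(records):
    """Return copies of the records with the count columns reset to the rows present."""
    # Count the rows that remain for each (sample, sex) pair after QA/QC removals.
    tallied = Counter((r["SAMPLE_NO"], r["SEX"]) for r in records)

    updated = []
    for r in records:
        new_counts = {
            col: tallied[(r["SAMPLE_NO"], sex)] for sex, col in COUNT_COLUMN.items()
        }
        # Copy the row and overwrite only the count columns; EXP_WT and all
        # other columns are carried over unchanged.
        updated.append({**r, **new_counts})
    return updated


# Hypothetical sample "OR1": recorded as 2 females and 1 male, one female row removed.
remaining = [
    {"SAMPLE_NO": "OR1", "SEX": "F", "FEMALE_NUM": 2, "MALE_NUM": 1, "UNK_NUM": 0, "EXP_WT": 10.0},
    {"SAMPLE_NO": "OR1", "SEX": "M", "FEMALE_NUM": 2, "MALE_NUM": 1, "UNK_NUM": 0, "EXP_WT": 10.0},
]

fixed = recalculate_sex_counts(remaining)
print(fixed[0]["FEMALE_NUM"], fixed[0]["MALE_NUM"], fixed[0]["UNK_NUM"])
# 1 1 0
```

Returning new rows rather than editing in place keeps the originally recorded counts available for comparison, which may matter if the recorded values are later needed to work out the effect on the expansion.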