Develop smarter way to check Oregon number of sex samples by sample number #43

chantelwetzel-noaa · 2021-01-15T18:05:47Z

Currently, the EF1_Denominator function, called from getExpansion_1, runs an internal check on Oregon data to determine whether the number of samples by sex matches the number recorded in FEMALE_NUM, MALE_NUM, or UNK_NUM. However, if you do some external QAQC and remove ages or lengths that do not seem plausible, this check will fail. We need to either make the Oregon check in EF1_Denominator smarter, create a function to recalculate and overwrite MALE_NUM, FEMALE_NUM, or UNK_NUM based on the removed samples (not sure how this would then affect EXP_WT values), or recommend that no records be removed from the data set.

kellijohnson-NOAA · 2021-01-15T19:30:36Z

Is it that EF1_Denominator needs to be changed, or that something else needs to be done when people remove lengths or ages, because removal affects the sample sizes and sample weights used in later calculations? I would say that the function did its job in that it recognized this.

Maybe there needs to be a helper function for users, remove_bds, that removes a row and updates the other portions of the data set? I am just trying to brainstorm. Would you ever want to remove a length but not the age from that lengthed fish? This is where it gets tricky because the sample weights are used to expand the comp, and if you want the length but not the age, then the sample weights are wrong. That is why I have _l and _a values when weighting.
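The helper idea above can be sketched as follows. This is a minimal illustration of the logic in Python, not the package's R code: the function name `remove_and_recount`, the `SEX` column, and its `F`/`M`/`U` codes are assumptions made for the example, and only `SAMPLE_NO`, `FEMALE_NUM`, `MALE_NUM`, and `UNK_NUM` come from the thread. It deliberately leaves EXP_WT and any sample weights untouched, since how those should change is still an open question here.

```python
from collections import Counter

# Assumed mapping from a SEX code to the column holding its recorded count.
COUNT_COLUMNS = {"F": "FEMALE_NUM", "M": "MALE_NUM", "U": "UNK_NUM"}


def remove_and_recount(records, is_bad):
    """Drop records flagged by is_bad, then rewrite the per-sample sex counts."""
    # Copy each kept row so the caller's data are left unmodified.
    kept = [dict(r) for r in records if not is_bad(r)]
    # Tally the remaining fish by (sample, sex); missing pairs count as 0.
    tallies = Counter((r["SAMPLE_NO"], r["SEX"]) for r in kept)
    for r in kept:
        for sex, column in COUNT_COLUMNS.items():
            r[column] = tallies[(r["SAMPLE_NO"], sex)]
    return kept
```

A caller would pass a predicate such as `lambda r: r["LENGTH"] > 200` to flag implausible fish; the `LENGTH` column and the threshold are likewise hypothetical.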

chantelwetzel-noaa · 2021-01-15T20:01:36Z

If you remove data that are from Oregon, the EF1_Denominator function stops with an error message based on the checks on lines 111-144. The check is correct because the values in the NUMBER_X columns no longer match the internally calculated sample counts (because of the removed data). There is no easy way around this error: the user must either update the NUMBER_X values by hand for every SAMPLE_NO that fails the internal check, or retain length or age data that do not look plausible.
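To make the mismatch concrete, here is a rough sketch of the kind of comparison described above, written in Python for illustration only. It is not the code on lines 111-144 of EF1_Denominator. The function name `find_count_mismatches`, the `SEX` column, and its `F`/`M`/`U` codes are assumptions; the count columns are the FEMALE_NUM, MALE_NUM, and UNK_NUM columns named at the top of the issue.

```python
from collections import Counter

# Assumed mapping from a SEX code to the column holding its recorded count.
COUNT_COLUMNS = {"F": "FEMALE_NUM", "M": "MALE_NUM", "U": "UNK_NUM"}


def find_count_mismatches(records):
    """Return {SAMPLE_NO: {column: (recorded, observed)}} for samples that disagree."""
    # Count the fish actually present for each (sample, sex) pair.
    observed = Counter((r["SAMPLE_NO"], r["SEX"]) for r in records)
    mismatches = {}
    for r in records:
        for sex, column in COUNT_COLUMNS.items():
            seen = observed[(r["SAMPLE_NO"], sex)]
            if r[column] != seen:
                mismatches.setdefault(r["SAMPLE_NO"], {})[column] = (r[column], seen)
    return mismatches
```

Returning the failing sample numbers with both values, rather than stopping at the first error, would at least show a user which samples need attention after QAQC.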

When I find a length or age that does not seem plausible, I opt to remove the entire record. Typically, the number of records removed is a very small percentage of the total good records. I just ran a check: if I replace both the length and age in the "bad" records with NAs, the first-stage expansion now works. If this is a better approach, we may want to clarify the guidance. If there are particular concerns with replacing both the length and age with NAs because of how the expansions are calculated, we can revisit this option.
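The NA workaround described above keeps every row, so the per-sample row counts stay the same and only the measurements are blanked. A minimal sketch in Python, for illustration only and not the package's R code: the function name `blank_bad_measurements` and the `LENGTH` and `AGE` column names are assumptions, and `None` stands in for R's `NA`.

```python
def blank_bad_measurements(records, is_bad, columns=("LENGTH", "AGE")):
    """Return copies of records with the measurement columns set to None where is_bad."""
    cleaned = []
    for r in records:
        row = dict(r)  # copy so the caller's data are left unmodified
        if is_bad(row):
            for column in columns:
                row[column] = None  # stands in for R's NA
        cleaned.append(row)
    return cleaned
```

Because no rows are dropped, a count-based check still passes, but the blanked fish no longer contribute a length or age to the composition.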

kellijohnson-NOAA · 2021-01-15T20:07:45Z

Putting in NA would expand the sample using the weight of the fish that has an NA but it wouldn't actually be providing any information to the composition. So your ending composition would essentially be "upweighted" based on the amount of information you are putting in for that sample relative to other samples.

chantelwetzel-noaa · 2021-01-15T20:25:03Z

Hmm. If the number of lengths and ages being replaced with NA is small relative to the total samples by species, the impact of "upweighting" would likely be small (assuming you are not removing a bunch of lengths/ages from a single SAMPLE_NO), but conversely the impact of leaving in these "bad" records would also be minimal. I need to think a bit more about how to treat this type of data. Thanks for all the input @kellijohnson-NOAA

kellijohnson-NOAA added the labels type: bug, status: question, topic: code, and priority: high on May 4, 2022

kellijohnson-NOAA added this to the year_2022 milestone May 4, 2022

kellijohnson-NOAA modified the milestones: year_2022, year_2023 May 8, 2023

kellijohnson-NOAA modified the milestones: year_2023, year_2025 May 10, 2024
