Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate boottest()'s seed argument (?) #94

Closed
s3alfisc opened this issue Feb 5, 2023 · 2 comments · Fixed by #95
Closed

Deprecate boottest()'s seed argument (?) #94

s3alfisc opened this issue Feb 5, 2023 · 2 comments · Fixed by #95
Labels
enhancement New feature or request

Comments

@s3alfisc
Copy link
Owner

s3alfisc commented Feb 5, 2023

The seed argument of boottest() allows to fix a random seed for random number generation within boottest.

boottest() then calls either set.seed() (for Mammen weights) or dqrng::dqset.seed() for all other weights. Both functions manipulate the global seed state, and therefore influence what happens outside of boottest().

Here is a quick example:

unintentionally_set_global_seed <- function(seed){
   set.seed(seed)
}

set.seed(123) 
rnorm(1)
#[1] -0.5604756
unintentionally_set_global_seed (123)
rnorm(1)
#[1] -0.5604756

set.seed() is not "scoped" within unintentionally_set_global_seed (), it overrides the global seed state, and in consequence, rnorm(1) generates the same random variable.

One fix is to use R's on.exit() function:

set_local_seed <- function(seed){
   # fetch the current global seed state via .Random.seed
   old <- .Random.seed
   set.seed(seed)
  on.exit( { .Random.seed <<- old } )
}

Now we get

set.seed(123)
rnorm(1)
# [1] -0.5604756
set_local_seed(123)
rnorm(1)
#[1] -0.2301775

set.seed(123)
rnorm(1)
# [1] -0.5604756
rnorm(1)
# [1] -0.2301775

Calling set.seed() is scoped within set_local_seed().

Unfortunately, for the dqrng package, it is currently not possible to fetch the global seed state (as via .Random.seed for the 'standard R' random seed).

In consequence, the option above is not feasible. So should I drop the seed argument?

Here is an example with boottest():

library(fwildclusterboot)

data(voters)

lm_fit <- lm(proposition_vote ~ treatment, data = voters)

dqrng::dqset.seed(123)

boot1 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1")
boot2 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1"
)
boot3 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1"
)
pval(boot1); pval(boot2); pval(boot3)
# [1] 0.00050005
# [1] 0.0010001
# [1] 0.00090009

dqrng::dqset.seed(123)
boot1 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1")
boot2 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1", 
                  seed = 123
)
boot3 <- boottest(lm_fit,
                  B = 9999,
                  param = "treatment",
                  clustid = "group_id1"
)
pval(boot1); pval(boot2); pval(boot3)
# [1] 0.00050005
# [1] 0.00050005 # set seed 123 again -> same results
# [1] 0.00080008 # compare with pvalue 2 from run above 

I actually don't think that there is a big problem here, as all results are still fully reproducible if scripts are run from top-to-bottom in a clean working environment. Nevertheless, I might drop the seed argument in a future release.

Related: daqana/dqrng#41

@s3alfisc
Copy link
Owner Author

s3alfisc commented Feb 5, 2023

I anyways should be much clearer about this in the documentation of the seed argument.

@s3alfisc s3alfisc linked a pull request Feb 11, 2023 that will close this issue
@s3alfisc s3alfisc added the enhancement New feature or request label Feb 12, 2023
@s3alfisc
Copy link
Owner Author

I have decided to deprecated the seed argument. Here's what I have prepared for the news.md:

News v0.13

Prior to the changes introduced in v0.13, boottest() will always call set.seed() or dqrng::dqset.seed() internally, regardless of whether the seed argument is specified or not (in the ladder case, it will create an internal seed by randomly drawing from a large set of integers). I considered this harmless, as setting seeds inside boottest() in this way does not affect the reproducibility of scripts run end-to-end.

However, I have learned that is generally considered bad practice to overwrite global variables without notification - for example, the authors of numpy have deprecated their np.random.seed() function for this reason.

Here is a quick example on what happens if a function "reseeds": it affects the future chain of random draws.

fn_reseed <- function(x){set.seed(x)}

set.seed(123)
rnorm(1)
# [1] -0.5604756
fn_reseed(1)
rnorm(1)
# [1] -0.6264538

set.seed(123)
rnorm(1); rnorm(1)
# [1] -0.5604756
# [1] -0.2301775

The two 'second' calls to rnorm(1) are based on different global seed states.

As a result, I have decided to deprecate the seed' function argument. Random number generation must now **be** set outside of boottest()usingset.seed()anddqrng::dqset.seed()`.

This means that bootstrap results generated via versions < 0.13 will no longer be exactly replicable under the new version, but with a sufficiently large number of bootstrap iterations, this change should not affect any of your conclusions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant