
ENH: Compile the functions needed by SMC before the worker processes are started #7413

Open
EliasRas opened this issue Jul 17, 2024 · 6 comments · May be fixed by #7472

Comments

@EliasRas


Context for the issue:

As explained in #7224, sample_smc can fail on Windows if the model is defined using CustomDist. Everything works on Linux because the worker processes are forked by default whereas they're spawned on Windows. Spawning leads to missing dispatches.

#7241 addressed the issue by manually registering the dispatches in the worker processes. A better way to solve the problem would be to compile the necessary functions before starting the worker processes (as is done in sample).
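The fork/spawn difference can be pictured with a minimal stdlib-only sketch (not PyMC code; the registry dict is a hypothetical stand-in for the dispatch tables CustomDist populates at runtime). A forked worker inherits the parent's memory and keeps runtime registrations; a spawned worker starts a fresh interpreter, re-imports everything, and loses them:

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a dispatch table: an entry registered at runtime
# in the parent process, the way CustomDist registers overloads after import.
REGISTRY = {}
REGISTRY["custom_logp"] = lambda x: x + 1.0

# Simulate a spawn-style worker with a fresh interpreter: module-level state
# is re-created from scratch, so the runtime registration is absent.
child_code = textwrap.dedent(
    """
    REGISTRY = {}  # what a fresh import sees
    print("custom_logp" in REGISTRY)
    """
)
result = subprocess.run(
    [sys.executable, "-c", child_code], capture_output=True, text=True
)

parent_sees = "custom_logp" in REGISTRY        # registered at runtime
child_sees = result.stdout.strip() == "True"   # lost in the "spawned" child
print(parent_sees, child_sees)  # True False
```

This is why the failure only shows up on Windows: with fork (the Linux default), the child never re-imports, so the registrations survive for free.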

@EliasRas
Author

This writeup focuses on the issues that come from using CustomDist, since that's where I encountered the problem. I don't know whether similar issues could arise elsewhere or whether they should even be addressed here.

The functions needed by SMC are compiled in SMC_KERNEL._initialize_kernel, which is currently called in each worker process.

smc._initialize_kernel()

The first function (inside Model.initial_point) is used to generate initial_point, which is used to determine the shapes/sizes of variables and later to create the likelihood and prior functions. It uses support_point by default but can also sample from the prior. If support_point is not defined, the implementation defaults to using the prior. If I understood right, sampling from the prior shouldn't cause any issues, since random should be defined using numpy, which is always available.

initial_point = self.model.initial_point(random_seed=self.rng.integers(2**30))

The second function (in SMC_KERNEL.initialize_population) is very similar to the previous one, but it samples from the prior by default and shouldn't cause issues.

init_rnd = self.initialize_population()

Finally, the functions used to calculate the prior probabilities and likelihoods are defined using the respective model properties. Here the issues come from using logp.

pymc/pymc/smc/kernels.py

Lines 237 to 242 in ab467da

self.prior_logp_func = _logp_forw(
initial_point, [self.model.varlogp], self.variables, shared
)
self.likelihood_logp_func = _logp_forw(
initial_point, [self.model.datalogp], self.variables, shared
)

Is it possible that SMC_KERNEL.setup_kernel would some day contain compilation? Currently it is empty but is called nonetheless.

I think that the necessary fixes are

  1. Compile the function that generates initial_point in the main process. It would be simple to copy the relevant code from Model.initial_point and call it before the worker processes are started but I don't know if that's a very good solution. Feels like unnecessary duplication.
  2. Compile the prior and likelihood functions beforehand. Could be done by storing SMC_KERNEL.model.varlogp and SMC_KERNEL.model.datalogp in SMC_KERNEL.__init__.
  3. Somehow make the pitfall of dispatch registration more obvious. I don't know if this is necessary, since the test added in #7241 (Register the overloads added by CustomDist in worker processes) catches at least some of the issues, so breaking changes wouldn't get through.
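Fix 2 can be sketched in plain Python (names loosely mirror SMC_KERNEL; `expensive_compile` is a hypothetical stand-in for `_logp_forw` / PyTensor compilation, not the real API):

```python
# Hypothetical sketch: do the expensive "compilation" once in __init__, in
# the main process, so worker processes only call already-built functions
# and never need the runtime dispatch registrations.
def expensive_compile(expr):
    # stand-in for pytensor.function(...); here it just closes over expr
    return lambda point: expr * point

class KernelSketch:
    def __init__(self, varlogp=2.0, datalogp=3.0):
        # compiled eagerly, before any worker process is started
        self.prior_logp_func = expensive_compile(varlogp)
        self.likelihood_logp_func = expensive_compile(datalogp)

    def _initialize_kernel(self, initial_point):
        # workers reuse the precompiled callables
        return (
            self.prior_logp_func(initial_point),
            self.likelihood_logp_func(initial_point),
        )

kernel = KernelSketch()
print(kernel._initialize_kernel(1.0))  # (2.0, 3.0)
```

The point of the pattern is only the timing: everything that triggers compilation moves from `_initialize_kernel` (run per worker) into `__init__` (run once, in the main process).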

@EliasRas
Author

As a side note, isn't this

shared = make_shared_replacements(initial_point, self.variables, self.model)

unnecessary? It makes shared variables from set(self.model.value_vars) - set(self.variables) but the set is always empty if I didn't miss any modifications to self.variables.

@ricardoV94
Member

shared = make_shared_replacements(initial_point, self.variables, self.model)

Right, I think this is used in step samplers, where each sampler can be responsible for a subset of the variables. I guess all the SMC kernels must always sample all the variables, so this shouldn't be needed?

@ricardoV94
Member

For the initial point, we could just pass the result for each chain instead of the function that computes the initial point?
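The suggestion amounts to evaluating the initial points before the workers start and shipping plain data. A stdlib-only sketch (illustrative names, not the actual sample_smc signature):

```python
import random

def compute_initial_point(rng):
    # stand-in for Model.initial_point; the real one needs compiled code
    return {"x": rng.gauss(0.0, 1.0)}

n_chains = 4
seeds = [10, 11, 12, 13]

# Main process: evaluate once per chain, before spawning workers...
initial_points = [compute_initial_point(random.Random(s)) for s in seeds]

# ...then hand each worker a picklable dict instead of a function that
# would need the (possibly missing) dispatches to run inside the worker.
print(len(initial_points), all("x" in p for p in initial_points))  # 4 True
```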

@EliasRas
Author

I think this is used in step samplers, where each sampler can be responsible for a subset of the variables. I guess all the SMC kernels must always sample all the variables so this shouldn't be needed?

Oh, seems like a good choice to leave it as it is then if there's even a remote possibility to break something in the future. It doesn't cost much to include the check anyway.

For the initial point, we could just pass the result for each chain instead of the function that computes the initial point?

Sounds good to me.

@ricardoV94
Member

ricardoV94 commented Jul 22, 2024

Oh, seems like a good choice to leave it as it is then if there's even a remote possibility to break something in the future. It doesn't cost much to include the check anyway.

No, I think this used to be how it worked back when SMC was just another step sampler. But since it hasn't been for a while, it doesn't make sense to keep the code complexity around. We can always reintroduce it.
