Commit
change how samplers are specified; clean up how results are passed and written out; update docs; add sampler test script.
  • Loading branch information
bd-j committed Aug 28, 2024
1 parent 7648e67 commit 84c88b2
Showing 8 changed files with 280 additions and 115 deletions.
12 changes: 7 additions & 5 deletions demo/demo_params.py
@@ -148,7 +148,7 @@ def build_obs(objid=0, phottable='demo_photometry.dat',
# import astropy.io.fits as pyfits
# catalog = pyfits.getdata(phottable)

from prospect.data.observation import Photometry, Spectrum
from prospect.observation import Photometry, Spectrum

# Here we will read in an ascii catalog of magnitudes as a numpy structured
# array
@@ -262,10 +262,12 @@ def build_all(**kwargs):
output = fit_model(obs, model, sps, **config)

print("writing to {}".format(hfile))
writer.write_hdf5(hfile, run_params, model, obs,
output["sampling"][0], output["optimization"][0],
tsample=output["sampling"][1],
toptimize=output["optimization"][1],
writer.write_hdf5(hfile,
config,
model,
obs,
output["sampling"],
output["optimization"],
sps=sps
)
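
With this commit the observation classes come from ``prospect.observation`` rather than ``prospect.data.observation``, and the sampling and optimization outputs are handed to ``write_hdf5`` directly, with no separate timing arguments. A minimal sketch of building observations against the new import path (the filter names, flux values, and constructor keywords below are illustrative assumptions, not checked against the release):

    import numpy as np
    from prospect.observation import Photometry, Spectrum

    # illustrative filter names and fluxes (maggies); a real catalog supplies these
    phot = Photometry(filters=["sdss_g0", "sdss_r0", "sdss_i0"],
                      flux=np.array([2.0e-9, 3.1e-9, 3.8e-9]),
                      uncertainty=np.array([2.0e-10, 3.1e-10, 3.8e-10]))
    spec = Spectrum(wavelength=np.linspace(3800., 7000., 1000),
                    flux=np.zeros(1000), uncertainty=np.ones(1000))

    observations = [spec, phot]
    # make sure each observation carries the ancillary data the fitters expect
    [obs.rectify() for obs in observations]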

25 changes: 13 additions & 12 deletions demo/tutorial.rst
@@ -164,25 +164,26 @@ that for now.

Running a fit
----------------------
There are two kinds of fitting packages that can be used with |Codename|. The
first is ``emcee`` which implements ensemble MCMC sampling, and the second is
``dynesty``, which implements dynamic nested sampling. It is also possible to
perform optimization. If ``emcee`` is used, the result of the optimization will
be used to initialize the ensemble of walkers.
There are a few kinds of fitting packages that can be used with |Codename|. The
first is ``emcee`` which implements ensemble MCMC sampling. A number of nested
samplers are also available (``dynesty``, ``nautilus``, and ``ultranest``). It
is also possible to perform optimization. If ``emcee`` is used, the result of
the optimization will be used to initialize the ensemble of walkers.

The choice of which fitting algorithms to use is based on command line flags
(``--optimization``, ``--emcee``, and ``--dynesty``.) If no flags are set the
model and data objects will be generated and stored in the output file, but no
fitting will take place. To run the fit on object number 0 using ``emcee`` after
an initial optimization, we would do the following at the command line
(``--optimization``, ``--emcee``, and ``--nested_sampler <dynesty|nautilus|ultranest>``.)
If no flags are set the model and data objects will be generated and stored in
the output file, but no fitting will take place. To run the fit on object number
0 using ``emcee`` after an initial optimization, we would do the following at
the command line

.. code-block:: shell
python demo_params.py --objid=0 --emcee --optimize \
--outfile=demo_obj0_emcee
If we wanted to change something about the MCMC parameters, or fit a different object,
we could also do that at the command line
If we wanted to change something about the MCMC parameters, or fit a different
object, we could also do that at the command line

.. code-block:: shell
@@ -193,7 +194,7 @@ And if we want to use nested sampling with ``dynesty`` we would do the following

.. code-block:: shell
python demo_params.py --objid=0 --dynesty \
python demo_params.py --objid=0 --nested_sampler dynesty \
--outfile=demo_obj0_dynesty
Finally, it is sometimes useful to run the script from the interpreter to do
36 changes: 20 additions & 16 deletions doc/quickstart.rst
@@ -154,12 +154,12 @@ it should still take of order tens of minutes.
output = fit_model(obs, model, sps, lnprobfn=lnprobfn,
optimize=False, dynesty=True,
**fitting_kwargs)
result, duration = output["sampling"]
result = output["sampling"]
The ``result`` is a dictionary with keys giving the Monte Carlo samples of
parameter values and the corresponding posterior probabilities. Because we are
using dynesty, we also get weights associated with each parameter sample in the
chain.
using nested sampling, we also get weights associated with each parameter sample
in the chain.
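
Because nested-sampling posteriors carry weights, any summary statistic has to fold them in. A small generic helper along these lines (plain numpy, not part of the package itself) computes weighted quantiles from a sample column and its weights:

    import numpy as np

    def weighted_quantile(x, q, weights):
        """Quantiles q of the samples x under the given sample weights."""
        order = np.argsort(x)
        cdf = np.cumsum(weights[order])
        cdf /= cdf[-1]
        return np.interp(q, cdf, x[order])

    # e.g. median and a central 68% interval for one parameter column:
    # weighted_quantile(chain[:, 0], [0.16, 0.5, 0.84], weights)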

Typically we'll want to save the fit information. We can save the output of the
sampling along with other information about the model and the data that was fit
@@ -168,12 +168,16 @@ as follows:
.. code:: python
from prospect.io import write_results as writer
hfile = "./quickstart_dynesty_mcmc.h5"
writer.write_hdf5(hfile, {}, model, observations,
output["sampling"][0], None,
sps=sps,
tsample=output["sampling"][1],
toptimize=0.0)
writer.write_hdf5("./quickstart_dynesty_mcmc.h5",
config=fitting_kwargs,
model=model,
obs=observations,
output["sampling"],
None,
sps=sps)
Note that this doesn't include all the config information that would normally be stored (see :ref:`usage`)
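
A fuller configuration dictionary, of the kind the parameter-file scripts assemble from their command-line arguments, would also record things like the sampler choice and the parameter file itself so the model can be regenerated later; the keys below are illustrative:

    # illustrative sketch; parameter-file scripts normally build this from argparse
    config = dict(fitting_kwargs)
    config["nested_sampler"] = "dynesty"
    config["param_file"] = __file__  # lets the model-building code be stored with the results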


Make plots
----------
@@ -187,14 +191,14 @@ information we can use the built-in reader.
hfile = "./quickstart_dynesty_mcmc.h5"
out, out_obs, out_model = reader.results_from(hfile)
This gives a dictionary of useful information (``out``), as well as the obs
dictionary that we were using and, in some cases, a reconstituted model object.
However, that is only possible if the model generation code is saved to the
results file, in the form of the text for a `build_model()` function. Here we
will just use the model object that we've already generated.
This gives a dictionary of useful information (``out``), as well as the obs data
that we were using and, in some cases, a reconstituted model object. However,
that is *only* possible if the model generation code is saved to the results file,
in the form of the text for a `build_model()` function. Here we will just use
the model object that we've already generated.
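
For example, assuming the results dictionary carries the usual ``chain``, ``lnprobability``, and ``theta_labels`` entries, the highest-probability sample can be pulled out like this:

    import numpy as np

    # index of the highest-probability sample in the (weighted) chain
    imax = np.argmax(out["lnprobability"])
    theta_best = out["chain"][imax, :]
    print(dict(zip(out["theta_labels"], theta_best)))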

Now we will do some plotting. First, let's make a corner plot of the posterior.
We'll mark the highest probability posterior sample as well.
First, let's make a corner plot of the posterior. We'll mark the highest
probability posterior sample as well.

.. code:: python
40 changes: 23 additions & 17 deletions doc/usage.rst
@@ -13,7 +13,7 @@ execution:

.. code-block:: shell
python parameter_file.py --dynesty
python parameter_file.py --nested_sampler dynesty --nested_target_n_effective 512
Additional command line options can be given (see below) e.g.

@@ -50,6 +50,7 @@ writes output.
# --- Configure ---
args = parser.parse_args()
config = vars(args)
# allows parameter file text to be stored for model regeneration
config["param_file"] = __file__
# --- Get fitting ingredients ---
@@ -68,12 +69,14 @@ writes output.
output = fit_model(obs, model, sps, **config)
print("writing to {}".format(hfile))
writer.write_hdf5(hfile, run_params, model, obs,
output["sampling"][0], output["optimization"][0],
tsample=output["sampling"][1],
toptimize=output["optimization"][1],
sps=sps
)
writer.write_hdf5(hfile,
                  config,
                  model,
                  obs,
                  output["sampling"],
                  output["optimization"],
                  sps=sps
                  )
try:
hfile.close()
@@ -147,7 +150,8 @@ might actually increase the total CPU usage).
To use MPI a "pool" of cores must be made available; each core will instantiate
the fitting ingredients separately, and a single core in the pool will then
conduct the fit, distributing likelihood requests to the other cores in the
pool. This requires changes to the final code block that instantiates and runs the fit:
pool. This requires changes to the final code block that instantiates and runs
the fit:

.. code-block:: python
@@ -189,7 +193,7 @@ pool. This requires changes to the final code block that instantiates and runs
# Evaluate SPS over logzsol grid in order to get necessary data in cache/memory
# for each MPI process. Otherwise, you risk creating a lag between the MPI tasks
# caching data depending which can slow down the parallelization
# caching SSPs which can slow down the parallelization
if (withmpi) & ('logzsol' in model.free_params):
dummy_obs = dict(filters=None, wavelength=None)
@@ -225,13 +229,15 @@ pool. This requires changes to the final code block that instantiates and runs
# Set up an output file and write
ts = time.strftime("%y%b%d-%H.%M", time.localtime())
hfile = "{0}_{1}_mcmc.h5".format(args.outfile, ts)
writer.write_hdf5(hfile, run_params, model, obs,
output["sampling"][0], output["optimization"][0],
tsample=output["sampling"][1],
toptimize=output["optimization"][1],
sps=sps)
hfile = f"{args.outfile}_worker{comm.rank()}_{ts}_mcmc.h5"
writer.write_hdf5(hfile,
                  run_params,
                  model,
                  obs,
                  output["sampling"],
                  output["optimization"],
                  sps=sps
                  )
try:
hfile.close()
except(AttributeError):
@@ -243,7 +249,7 @@ Then, to run this file using mpi it can be called from the command line with som
mpirun -np <number of processors> python parameter_file.py --emcee
# or
mpirun -np <number of processors> python parameter_file.py --dynesty
mpirun -np <number of processors> python parameter_file.py --nested_sampler dynesty
Note that only model evaluation is parallelizable with `dynesty`, and many
operations (e.g. new point proposal) are still done in serial. This means that
34 changes: 22 additions & 12 deletions prospect/fitting/fitting.py
@@ -185,13 +185,18 @@ def fit_model(observations, model, sps, lnprobfn=lnprobfn,
# Make sure obs has required keys
[obs.rectify() for obs in observations]

if (emcee) & (bool(nested_sampler)):
msg = ("Cannot run both emcee and nested fits "
"in a single call to fit_model")
raise ValueError(msg)

if (not bool(emcee)) & (not bool(nested_sampler)) & (not optimize):
msg = ("No sampling or optimization routine "
"specified by user; returning empty results")
warnings.warn(msg)

output = {"optimization": (None, 0.),
"sampling": (None, 0.)}
output = {"optimization": None,
"sampling": None}

if optimize:
optres, topt, best = run_minimize(observations, model, sps,
@@ -204,7 +209,8 @@ def fit_model(observations, model, sps, lnprobfn=lnprobfn,
run_sampler = run_ensemble
elif nested_sampler:
run_sampler = run_nested
kwargs["fitter"] = nested_sampler
# put nested_sampler back into kwargs for lower level functions
kwargs["nested_sampler"] = nested_sampler
else:
return output

@@ -412,14 +418,20 @@ def run_ensemble(observations, model, sps, lnprobfn=lnprobfn,
sampler, burn_loc0, burn_prob0 = out
ts = time.time() - go

return sampler, ts
try:
sampler.duration = ts
except:
pass

return sampler


def run_nested(observations, model, sps,
lnprobfn=lnprobfn,
fitter="dynesty",
nested_sampler="dynesty",
nested_nlive=1000,
nested_neff=1000,
verbose=False,
**kwargs):
"""Thin wrapper on :py:class:`prospect.fitting.nested.run_nested_sampler`
@@ -457,18 +469,16 @@ def run_nested(observations, model, sps,
likelihood = wrap_lnp(lnprobfn, observations, model, sps, nested=True)

# which keywords do we have for this fitter?
ns_kwargs, nr_kwargs = parse_nested_kwargs(fitter=fitter,
**kwargs)
ns_kwargs, nr_kwargs = parse_nested_kwargs(nested_sampler=nested_sampler, **kwargs)

go = time.time()
output = run_nested_sampler(model,
likelihood,
fitter=fitter,
verbose=False,
nested_sampler=nested_sampler,
verbose=verbose,
nested_nlive=nested_nlive,
nested_neff=nested_neff,
nested_sampler_kwargs=ns_kwargs,
nested_run_kwargs=nr_kwargs)
ts = time.time() - go
info, result_obj = output

return output, ts
return info
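
Putting the pieces of this diff together, a sketch of the new calling convention (the keyword names follow the code above; the live-point and effective-sample-size numbers are placeholders):

    from prospect.fitting import fit_model

    output = fit_model(observations, model, sps,
                       optimize=False, emcee=False,
                       nested_sampler="dynesty",
                       nested_nlive=1000, nested_neff=1000)

    info = output["sampling"]        # sampler results; no separate timing tuple
    opt = output["optimization"]     # None here, since optimize=False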