
Robustness tests and inference helpers #38

Closed

JelleAalbers wants to merge 37 commits into main from gaussian_inference_helper

Conversation

@JelleAalbers (Collaborator) commented Sep 15, 2022

This adds support for end-to-end robustness tests.

A new script, robustness_test.py, runs the main sequence of tasks end to end: generate images, get predictions from a network, and run a hierarchical inference MCMC. You can run a config as-is, or specify which parts of it to change -- e.g. set the population's sigma_sub or theta_E to a different distribution or a fixed value.

The new paltas.Analysis.gaussian_inference file supports this with:

  • A run_network_on function that runs a neural network over all images in a folder, rotationally averaging predictions as in the paltas paper. The main difference is that this code rotates over a deterministic set of rotations (e.g. 0, 10, 20 degrees, etc.) rather than a random set; for large enough n_rotations this should not matter. Results are saved in an npz in the image folder, or returned.
  • A GaussianInference class to help coordinate hierarchical inference. You can initialize it from a folder you previously ran run_network_on over (with .from_folder), or just from the npz it produces (with .from_npz). GaussianInference instances have a .bayesian_mcmc method that runs hierarchical inference as described in the paper. The only difference is that the hyperpriors on std_[population_parameter] are uniform on [0, inf] rather than uniform over some range in log space, to avoid concentrating the hyperprior at very small values. You can revert to the log prior by setting log_sigma=True when building GaussianInference. A usage sketch follows below this list.

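A minimal usage sketch of this API -- essentially what robustness_test.py chains together. The folder and npz filenames are hypothetical and the exact run_network_on arguments are assumptions; the class and method names match the description above:

from paltas.Analysis import gaussian_inference as gi

# Run the network over every image in a folder; predictions are
# rotation-averaged and saved as an npz inside the folder.
gi.run_network_on('test_images/')  # exact arguments assumed

# Coordinate hierarchical inference from those predictions...
inf = gi.GaussianInference.from_folder('test_images/')
# ...or directly from the saved npz (filename here is hypothetical):
# inf = gi.GaussianInference.from_npz('test_images/predictions.npz')

summary, chain = inf.bayesian_mcmc()

# To revert to the paper's log-uniform hyperprior on the population stds
# (kwarg pass-through via from_folder is assumed):
inf_log = gi.GaussianInference.from_folder('test_images/', log_sigma=True)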
Feel free to tell me some or all of this doesn't belong in paltas! This is probably an alternative implementation of the stuff you already have in jupyter notebooks; would be useful to compare them at some point.

Other changes

  • generate.py and robustness_test.py can now be called both as functions and as shell commands. A small command line interface generator (in Utils.cli_maker) takes care of this; a sketch of the pattern follows after this list. This also simplifies some test code.
  • Include .csv and .fits files already in the repo in the python package. On the main branch, setuptools omits bad_galaxies.csv etc. from the wheel, since it is not listed as package data (see the illustrative setup.py fragment after this list). As long as you do a developer/editable install this doesn't matter, but if you do a regular pip install this will bite you (at least when using the paper configs). I had to move the HST PSFs to somewhere under paltas (and add an __init__.py to the folder); I don't think it's easy to control where data files that live outside the package get shipped. A directory symlink ensures that any old code relying on the original location still works.
  • Sampler changes:
    • 9213557: Allow specifying fixed values for parameters that expect multiple values
    • 2b3950d: Temporary workaround for scipy/scipy#16998 ("Unpickled and deepcopied distributions do not use global random state"). This ensures the scipy.stats distributions use the common random state before we draw from them; a minimal demonstration of the underlying issue follows after this list. Without this, setting the random seed only has full effect on the training config, since other configs deepcopy the training config's config_dict. This is a hack, and for now it only supports plain scipy.stats distributions and the Duplicate distribution, not the other custom distributions paltas implements.
  • f267267: Feel free to disagree with this, happy to revert. This demotes some global variables in Analysis/hierarchical_inference.py to class attributes. The globals existed to allow the use of multiprocess.Pool in the MCMC inference. I think:
    • Most inferences will run in single-core batch jobs, for which multiprocessing isn't available anyway.
    • Global variables are 👿... or at least these particular ones caused confusion for me when I started working on this PR, and I can imagine they might hurt future users' brains too.
    • For running locally, you generally want to run multiple configurations. This can still be parallelized, e.g.:

from concurrent.futures import ProcessPoolExecutor, as_completed

from paltas.Analysis import gaussian_inference as gi

def do_inference(folder):
    inf = gi.GaussianInference.from_folder(folder)
    summary, chain = inf.bayesian_mcmc()
    # save results somewhere

# folders: list of image folders to run inference on
with ProcessPoolExecutor(max_workers=6) as exc:
    futures = [exc.submit(do_inference, folder) for folder in folders]
    for future in as_completed(futures):
        future.result()  # surface any exception raised in a worker
  • f267267: unnormalize can now also unnormalize the precision matrix directly, and it can be run independently for unnormalizing means and covariances; the algebra behind this is sketched after this list.
  • d28e29a, a4774f1: add options to artificially make source galaxy images larger or brighter (or smaller/fainter).
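To illustrate the Utils.cli_maker point above: a hypothetical sketch of the function-to-CLI pattern, not the actual cli_maker code. The idea is to build an argparse parser from a function's signature, so one function backs both the Python API and the shell command:

import argparse
import inspect

def make_cli(func):
    """Turn a plain Python function into a shell entry point."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            # Required parameters become positional arguments
            parser.add_argument(name)
        else:
            # Optional parameters become --flags with the same default
            kwargs = {'default': param.default}
            if param.default is not None:
                kwargs['type'] = type(param.default)
            parser.add_argument('--' + name, **kwargs)
    def main():
        func(**vars(parser.parse_args()))
    return main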
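For the package-data point: an illustrative setup.py fragment (the patterns below are placeholders, not the actual paths in this PR). Files that are not declared as package data are silently dropped from wheels, which is why regular pip installs were missing them:

from setuptools import setup, find_packages

setup(
    name='paltas',
    packages=find_packages(),
    # Placeholder globs; the real .csv/.fits locations go here
    package_data={'paltas': ['Sources/*.csv', 'Utils/hst_psf/*.fits']},
)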
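A minimal demonstration of the scipy/scipy#16998 issue worked around in 2b3950d, assuming a scipy version affected at the time of this PR. Deepcopying a frozen distribution snapshots the global numpy random state into a private copy, so reseeding np.random no longer affects the copy:

import copy
import numpy as np
from scipy import stats

dist = stats.norm(0, 1)
dist_copy = copy.deepcopy(dist)

np.random.seed(0)
x = dist.rvs()       # draws from the global numpy random state
np.random.seed(0)
y = dist_copy.rvs()  # draws from the detached state copy, so y != x

# One possible fix (the PR's actual workaround in the Sampler may differ):
# point the copy back at the global state before drawing.
dist_copy.random_state = None  # None means "use the global state"
np.random.seed(0)
assert dist_copy.rvs() == x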
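Finally, the algebra behind unnormalizing a precision matrix directly, assuming the usual per-parameter normalization x' = (x - mean) / scale (a sketch of the math, not the code in this PR):

import numpy as np

scales = np.array([0.1, 2.0])        # per-parameter normalization scales
prec_norm = np.array([[2.0, 0.3],
                      [0.3, 1.5]])   # precision matrix in normalized space

# Covariances transform as S @ cov_norm @ S with S = diag(scales), so
# precisions transform with the inverse scales: S^-1 @ prec_norm @ S^-1.
S_inv = np.diag(1 / scales)
prec = S_inv @ prec_norm @ S_inv

# Consistency check: unnormalize the covariance instead, then invert.
cov = np.diag(scales) @ np.linalg.inv(prec_norm) @ np.diag(scales)
assert np.allclose(prec, np.linalg.inv(cov))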

JelleAalbers and others added 28 commits September 16, 2022

  • This will reduce emcee pooling performance (should look at this later), but restore sanity
  • ... so we can run multiple inferences in parallel
  • To match what is written in the paltas paper

@JelleAalbers force-pushed the gaussian_inference_helper branch 2 times, most recently from b33426c to 8079dfb on September 16, 2022

  • (Relative imports work differently depending whether you are already in a package or not... I should know this)
@coveralls commented Sep 17, 2022

Coverage decreased (-1.09%) to 93.983% when pulling 9db46e1 on gaussian_inference_helper into 112a4a5 on main.

@JelleAalbers (Collaborator, Author)

I moved most of this to a separate repository, https://github.com/JelleAalbers/bendorbreak, so we can close this PR. I might make a new PR for some of the smaller fixes in here.
