
This wiki page provides an overview of the database fitting used to initialize BrainTrak fits for the corticothalamic model.

Overview

To improve convergence of the Markov chain to its stationary distribution (a consideration shared by many numerical optimization methods), it is best to initialize the fit from parameters that are already close to the optimal fit. For example, if a strong alpha peak is present in the data, it is better to choose initial parameters that give rise to a strong alpha peak than parameters that do not produce an alpha peak at all. For BrainTrak, there are two things to keep in mind:

  • The MCMC algorithm does quite a thorough job of traversing the parameter space. Some numerical methods (such as Levenberg-Marquardt) are prone to getting stuck in local minima, which makes choosing the initial parameters correctly critical to obtaining an acceptable fit. With BrainTrak, a poor choice of initial conditions generally does not affect the final output, but it does affect the convergence. That is, long runs may be only slightly affected, but short runs could be significantly affected.
  • As described on the chain page, the initial conditions are used to start the random walk with a fixed proposal distribution. A series of steps is then taken to allow the Markov chain to reach its stationary distribution and to perform initial adaptation of the proposal distribution. Finally, the steps actually used to identify the best fit and the posterior distributions are taken. The upshot is that in a fit with 100,000 points, there may be around 10,000 steps separating the initial conditions from the first point actually used for fitting. BrainTrak is therefore relatively insensitive to the choice of initial parameters, although, as described above, convergence will be faster (and therefore the quality for a fixed number of steps will be better) if the initial parameters are already a good fit.

The question is then how to select the initial parameters in a quick, efficient way. From the work done classifying regions in parameter space, we already have a large set of model parameters that are physiologically realistic and whose spectra have already been computed. These are the parameters contained within the blobs in Abeysuriya et al. (2015). Therefore, when fitting the same corticothalamic model, a simple procedure is:

  1. Take the experimental spectrum and compute chisq for every set of parameters in the data set
  2. Find the one with the smallest value of chisq
  3. Use the corresponding parameters as the initial parameters

This is essentially how the database fitting in braintrak/+db_fit works.
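
In MATLAB, that procedure amounts to a single vectorized residual computation followed by a minimum search. The sketch below uses illustrative variable names (P_db, P_exp, params), not the actual BrainTrak code:

```matlab
% Minimal sketch of the database search above. P_db (n x nf) holds the
% precomputed model spectra, P_exp (1 x nf) the experimental spectrum, and
% params (n x np) the corresponding parameters; all of these names are
% illustrative rather than the actual BrainTrak variables.
chisq = sum(bsxfun(@minus, P_db, P_exp).^2, 2); % residual for every stored spectrum
[~, best] = min(chisq);                         % index of the closest spectrum
initial_params = params(best, :);               % starting point for the MCMC fit
```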

Implementation

The first issue is that the data sets in Abeysuriya et al. (2015) contain around 1.1 million points. That's a lot of spectra to search through - and moreover, it's a lot of data that has to be loaded into memory before the chisq measure can even be calculated. So the question is whether this data set can be reduced.

The predicted spectra principally depend on XYZ. Therefore, what is important is that the set of parameters being tested covers a decently large portion of XYZ. Many of the points in the data set lie close to one another. Since we optimize the initial conditions using BrainTrak, it doesn't matter if the initial conditions are slightly better or worse - just that they are in the ballpark. Therefore, we can construct a reduced data set by sampling from the full set of over 1 million points such that the new data set uniformly samples the XYZ space. This is achieved by db_fit.nd_cutdown, which bins the points in XYZ and then selects a fixed number of points from each bin, as sketched below.
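
A minimal sketch of that binning idea follows (the real db_fit.nd_cutdown interface may differ; nbins and n_keep are hypothetical parameters):

```matlab
% Sketch of the binning idea behind db_fit.nd_cutdown (the real function's
% interface may differ). Points are assigned to a coarse grid of XYZ bins,
% and a fixed number of points is drawn at random from each occupied bin.
function keep = cutdown_sketch(xyz, nbins, n_keep)
    b = zeros(size(xyz));
    for k = 1:3
        lo = min(xyz(:,k)); hi = max(xyz(:,k));
        b(:,k) = min(floor((xyz(:,k) - lo) ./ (hi - lo) * nbins) + 1, nbins);
    end
    lin = sub2ind([nbins nbins nbins], b(:,1), b(:,2), b(:,3)); % one index per bin
    keep = [];
    for u = unique(lin).'
        members = find(lin == u);
        members = members(randperm(numel(members)));            % shuffle within bin
        keep = [keep; members(1:min(n_keep, numel(members)))];  %#ok<AGROW>
    end
end
```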

This yields a much smaller data set consisting of just over 4000 points. This seems to be all that is required to get an initial set of parameters good enough for further optimization. Because the data set is so small, performing the database fit has negligible computational cost.

Having determined a set of around 4000 parameter combinations to test, the next question is how to actually store the parameters and perform the fitting. The precomputed parameters are stored in the file pp_allstates, which is generated by db_fit_make_all_states_pp. This function reads in the complete data sets, uses nd_cutdown to generate the representative subset, uses model.dispersion_calc (from corticothalamic-model) to compute the spectra, and then saves a struct db_data. This struct contains the following information:

db_data = 
         P: [4367x98 double]
         f: [1x98 double]
       xyz: [4367x3 double]
       nus: [4367x8 double]
       gab: [4367x8 double]
    iswake: [4367x1 double]
      type: 'spec'
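
Condensed into code, the generation pipeline looks something like the following sketch. The input file name, field layout, and the spectrum_for() stand-in are all assumptions; the real spectra are computed by model.dispersion_calc:

```matlab
% Condensed sketch of the pipeline in db_fit_make_all_states_pp. The input
% file name, field layout, and the spectrum_for() stand-in are assumptions;
% the real spectra are computed by model.dispersion_calc.
raw  = load('all_states.mat');           % full ~1.1M-point data set (assumed name)
keep = cutdown_sketch(raw.xyz, 10, 2);   % representative subset, uniform in XYZ
db_data.f      = raw.f;
db_data.xyz    = raw.xyz(keep,:);
db_data.nus    = raw.nus(keep,:);
db_data.gab    = raw.gab(keep,:);
db_data.iswake = raw.iswake(keep);
db_data.type   = 'spec';
spectrum_for = @(nus, f) ones(size(f));  % placeholder for model.dispersion_calc
db_data.P = zeros(numel(keep), numel(db_data.f));
for n = 1:numel(keep)
    db_data.P(n,:) = spectrum_for(db_data.nus(n,:), db_data.f);
end
save('pp_allstates.mat', 'db_data');
```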

A typical database fit is performed using db_fit.quick_fit(). This function takes in an experimental spectrum and a BrainTrak model object, and then loads pp_allstates. The total power in the experimental spectrum is calculated using the frequency weighting of the particular model. This allows the database fit to reflect any custom weightings used subsequently by the rest of BrainTrak, if those weightings ignore certain frequencies. This can make a big difference at low frequencies, where the rolloff in the data may differ from the model. For example, there may be a significant discrepancy in power at frequencies below 1 Hz, which does not affect the fitting but will affect the normalization.

For each spectrum in db_data, the model spectrum is normalized to the same power, and then chisq is computed. The index of the fit with the smallest value of chisq is returned, together with db_data. The parameters (xyz, nus, gab, and iswake) can be obtained using this index, as seen in bt.model.full.initialize_fit().
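
A sketch of that comparison step, assuming a weighting vector w and illustrative variable names (the real quick_fit implementation may differ):

```matlab
% Sketch of the comparison step in db_fit.quick_fit (illustrative names).
% w is the model's frequency weighting (e.g. zero below 1 Hz), f and P come
% from db_data, and P_exp is the experimental spectrum on the same grid.
mask = w > 0;
target_power = trapz(f(mask), P_exp(mask));           % weighted power of the data
chisq = zeros(size(P,1), 1);
for n = 1:size(P,1)
    scale = target_power / trapz(f(mask), P(n,mask)); % normalize to the same power
    chisq(n) = sum(w .* (scale*P(n,:) - P_exp).^2);   % weighted squared residuals
end
[~, best] = min(chisq);                               % index returned with db_data
```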

It is possible to fit the spectrum using this database fitting system alone - however, the quality of the fit then depends on how many points are in the database. For that purpose, the limited subset with only 4000 points is probably insufficient, and a larger data set should be used.
