D-optimal design is the idea of finding the subset of points whose feature matrix X maximizes the determinant of the overlap (information) matrix X^T X. It was first used in QSAR in 1993. As noted in [1], it gives a good minimal, diverse set but does not sample the inner region of the space well. This motivated the authors to use an onion design, where the dataset is split into groups (layers) and the D-optimal selection is repeated on each group.
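A minimal sketch of the D-optimal criterion and of the onion idea, in Python with NumPy. The function names, the greedy one-point-at-a-time search, the small ridge term (which keeps det(X^T X) finite before p points have been chosen), and the rule of forming layers as distance-quantile shells around the centroid are all assumptions for illustration, not the procedure used in [1].

```python
import numpy as np


def log_det_info(X_sub, ridge=1e-6):
    """log det(X_S^T X_S + ridge * I) for the selected rows X_S.

    The small ridge is an assumption of this sketch: it keeps the criterion
    finite while fewer than p rows have been selected."""
    p = X_sub.shape[1]
    _, logdet = np.linalg.slogdet(X_sub.T @ X_sub + ridge * np.eye(p))
    return logdet


def greedy_d_optimal(X, k, ridge=1e-6):
    """Greedily add the row of X that most increases the D-criterion."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(X.shape[0]) if i not in selected]
        best = max(candidates, key=lambda i: log_det_info(X[selected + [i]], ridge))
        selected.append(best)
    return selected


def onion_d_optimal(X, n_layers, k_per_layer, ridge=1e-6):
    """Split points into shells by distance from the centroid (an assumed
    layering rule) and run the D-optimal selection inside each shell."""
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    inner_edges = np.quantile(dist, np.linspace(0, 1, n_layers + 1))[1:-1]
    layer = np.digitize(dist, inner_edges, right=True)  # 0 = innermost shell
    chosen = []
    for shell in range(n_layers):
        idx = np.flatnonzero(layer == shell)
        local = greedy_d_optimal(X[idx], min(k_per_layer, len(idx)), ridge)
        chosen.extend(int(j) for j in idx[local])
    return chosen


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
print(greedy_d_optimal(X, 6))                          # mostly extreme points
print(onion_d_optimal(X, n_layers=3, k_per_layer=4))   # points from every shell
```

Plain D-optimal selection tends to pick points near the boundary; running it shell by shell forces some picks from the interior as well.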
A determinantal point process (DPP) can only sample up to rank(X^T X) points, so I would favor implementing D-optimal design instead. A list of algorithms for D-optimal design is included in [2]. A naive algorithm would be to draw a new sample using a DPP (better than random sampling), check whether the determinant of the submatrix including that sample increased, and, if so, add the sample to the list of selected points and repeat.
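The naive loop can be sketched as follows. Adding a row x never decreases det(X^T X), since det(M + x x^T) = det(M)(1 + x^T M^{-1} x) for nonsingular M, so this hypothetical sketch keeps the subset size fixed at k and reads the "did the determinant increase?" check as a swap test, in the spirit of the exchange algorithms compared in [2]. Proposals are drawn uniformly at random as a stand-in for a DPP sampler; `naive_exchange` and its parameters are illustrative names.

```python
import numpy as np


def log_det(X_sub):
    """log det(X_S^T X_S); -inf when the information matrix is singular."""
    sign, val = np.linalg.slogdet(X_sub.T @ X_sub)
    return val if sign > 0 else -np.inf


def naive_exchange(X, k, n_proposals=500, seed=0):
    """Propose one candidate at a time; keep it only if swapping it into the
    current k-point subset increases det(X_S^T X_S).

    Assumptions: proposals are uniform random (a DPP sampler could be
    substituted) and k >= number of columns, so the determinant can be > 0."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(i) for i in rng.choice(n, size=k, replace=False)]
    current = log_det(X[selected])
    for _ in range(n_proposals):
        cand = int(rng.integers(n))
        if cand in selected:
            continue
        best_val, best_pos = current, None
        for pos in range(k):  # try the candidate in place of each selected point
            trial = selected.copy()
            trial[pos] = cand
            val = log_det(X[trial])
            if val > best_val:
                best_val, best_pos = val, pos
        if best_pos is not None:  # determinant increased: accept the swap
            selected[best_pos] = cand
            current = best_val
    return selected, current


X = np.random.default_rng(1).normal(size=(100, 3))
subset, score = naive_exchange(X, k=5)
print(subset, score)
```

Because a swap is accepted only when it raises the determinant, the score is non-decreasing over the run; a real DPP proposal would only change which candidates get tried.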
[1] "D-optimal onion designs in statistical molecular design"
[2] R. Dennis Cook & Christopher J. Nachtsheim (1980), "A Comparison of Algorithms for Constructing Exact D-Optimal Designs", Technometrics, 22:3, 315-324, DOI: 10.1080/00401706.1980.10486162
Could we also report the determinant of the Gramian as a measure of diversity? Obviously it is zero whenever the vectors are linearly dependent...
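A small sketch of that diversity score, assuming the rows of V are the selected feature vectors. det(V V^T) is the squared volume of the parallelotope the rows span, so it is 1 for orthonormal vectors and 0 (up to rounding) under linear dependence. Normalizing rows to unit length is an assumption made here so that the score tracks angles rather than magnitudes; `gram_det` is a hypothetical helper, and zero-length rows are not handled.

```python
import numpy as np


def gram_det(vectors, normalize=True):
    """det(V V^T) for the rows of V: the squared volume they span.

    With normalize=True (an assumption of this sketch) each row is scaled to
    unit length first, so only the angles between rows matter."""
    V = np.asarray(vectors, dtype=float)
    if normalize:
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return float(np.linalg.det(V @ V.T))


print(gram_det([[1, 0], [0, 1]]))    # orthogonal: maximal diversity
print(gram_det([[1, 0], [1, 0.1]]))  # nearly parallel: close to 0
print(gram_det([[1, 0], [2, 0]]))    # linearly dependent: 0
```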
We have (or will have) the capacity to convert distance matrices to Gramians too, and I think(?) that any symmetric positive-definite matrix can be used to initiate D-optimal sampling, yes?
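One way to do the conversion is the double centring used in classical MDS, G = -1/2 J D^2 J with J = I - 11^T/n, which recovers the Gram matrix of the centred points when D holds Euclidean distances; for non-Euclidean D, G can have negative eigenvalues. The sketch then greedily maximizes det(G[S, S]) with an incremental Cholesky update. The function names and the stopping tolerance are assumptions. As the example shows, when G is only positive semi-definite the subset determinant vanishes once |S| exceeds rank(G), the same limit noted for DPPs.

```python
import numpy as np


def distances_to_gram(D):
    """Double centring from classical MDS: G = -1/2 * J D^2 J, J = I - 11^T/n.

    For Euclidean distances this is the Gram matrix of the centred points
    (positive semi-definite); for other distances G may be indefinite."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J


def greedy_max_det(G, k, tol=1e-10):
    """Greedily grow S to maximise det(G[S, S]) via incremental Cholesky.

    det(G[S+i, S+i]) = det(G[S, S]) * d_i, where d_i is the residual
    (conditional) variance of i given S, so each step picks the largest d_i."""
    d = np.diag(G).astype(float).copy()
    scale = d.max()
    rows = np.zeros((k, G.shape[0]))  # rows of the partial Cholesky factor
    selected = []
    for t in range(k):
        i = int(np.argmax(d))
        if d[i] <= tol * scale:
            break  # every remaining extension is (numerically) singular
        row = (G[i] - rows[:t, i] @ rows[:t]) / np.sqrt(d[i])
        rows[t] = row
        d = d - row ** 2
        selected.append(i)
        d[selected] = -np.inf  # never pick a point twice
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
G = distances_to_gram(D)
print(greedy_max_det(G, 10))  # stops after rank(G) = 3 points
```

The Cholesky update avoids recomputing a determinant for every candidate at every step, which matters once n grows.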
Information related to determinantal point processes can be found at #4.
fast greedy algorithm
review
code
two algorithms here