[Selector module] Implementing D-optimal designs #71

Open
FanwangM opened this issue May 13, 2022 · 3 comments
Labels: feature, help wanted, low priority

@FanwangM
Collaborator

FanwangM commented May 13, 2022

Information related to determinantal point processes can be found at #4.

fast greedy algorithm
review
code
two algorithms here

@FarnazH FarnazH added this to the release milestone May 19, 2022
@Ali-Tehrani
Collaborator

Ali-Tehrani commented Jun 17, 2022

A D-optimal design is the subset that maximizes the determinant of the overlap matrix X^T X, where X is the feature matrix of the selected points. It was first used in QSAR around 1993. As noted in [1], it gives a good minimal, diverse set but does not sample the inner region well; this motivated the authors to use an onion design, where the dataset is split into groups and the process is repeated on each group.
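To make the criterion concrete, here is a minimal sketch of it in NumPy. The function names `d_criterion` and `brute_force_d_optimal` are hypothetical (they are not part of this package), and the exhaustive search is only meant for tiny examples, not as a proposed implementation.

```python
from itertools import combinations

import numpy as np


def d_criterion(X, idx):
    """Return log det(X_S^T X_S) for the rows `idx` of X (-inf if singular)."""
    Xs = X[list(idx)]
    sign, logdet = np.linalg.slogdet(Xs.T @ Xs)
    return logdet if sign > 0 else -np.inf


def brute_force_d_optimal(X, k):
    """Exhaustively find the k-row subset with the largest D-criterion.

    The cost grows combinatorially, so this is for illustration only.
    """
    n = X.shape[0]
    return max(combinations(range(n), k), key=lambda s: d_criterion(X, s))


# Five points in two dimensions; pick the most "spread out" pair.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [2.0, 0.0], [0.0, 2.0]])
best = brute_force_d_optimal(X, 2)
```

Working with the log-determinant (via `np.linalg.slogdet`) rather than the raw determinant avoids overflow and underflow when the number of features grows.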

A determinantal point process (DPP) can only sample up to as many points as the rank of X^T X, so I would favor implementing D-optimal designs instead. A list of algorithms for D-optimal designs is given in [2]. A naive algorithm would be to propose a sample using a DPP (better than random sampling), check whether the determinant of the submatrix that includes the new sample has increased and, if so, add the sample to the list of selected points, then repeat.
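A rough sketch of that accept/reject loop is below. Two assumptions are mine and not settled in this thread: the "submatrix" is taken to be the principal submatrix of the kernel L = X X^T indexed by the selected points, and candidates are proposed uniformly at random as a stand-in for DPP sampling. The names `subset_logdet` and `naive_accept_reject` are hypothetical.

```python
import numpy as np


def subset_logdet(L, idx):
    """log det of the principal submatrix L[idx, idx] (-inf if singular)."""
    if len(idx) == 0:
        return 0.0  # the determinant of the empty matrix is 1
    sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
    return logdet if sign > 0 else -np.inf


def naive_accept_reject(X, max_size, n_trials=1000, seed=0):
    """Grow a subset by keeping candidates that increase det(L_S).

    Assumption: candidates are drawn uniformly at random here; the
    comment above suggests proposing them with a DPP instead.
    """
    rng = np.random.default_rng(seed)
    L = X @ X.T  # kernel (Gram) matrix over all points
    selected = []
    current = subset_logdet(L, selected)
    for _ in range(n_trials):
        if len(selected) >= max_size:
            break
        cand = int(rng.integers(X.shape[0]))
        if cand in selected:
            continue
        trial = subset_logdet(L, selected + [cand])
        if trial > current:  # accept only if the determinant grew
            selected.append(cand)
            current = trial
    return selected, current


X = np.random.default_rng(42).normal(size=(30, 5))
selected, logdet = naive_accept_reject(X, max_size=5)
```

Note that the acceptance test is scale-dependent (a candidate is accepted only when its Schur complement exceeds 1), so the features would probably need to be scaled sensibly before using something like this.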

[1] "D-optimal onion designs in statistical molecular design"
[2] R. Dennis Cook & Christopher J. Nachtsheim (1980) A Comparison of Algorithms for Constructing Exact D-Optimal Designs, Technometrics, 22:3, 315-324, DOI: 10.1080/00401706.1980.10486162

@PaulWAyers
Member

OK, let's try the d-optimal set.

I think we can also report the determinant of the Gramian as a measure of diversity? Obviously it is zero when linear dependence arises...

We have (will have) the capacity to convert distance matrices to Gramians too, and I think(?) that any symmetric positive-definite matrix can be used to initiate D-optimal sampling, yes?
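For reference, a minimal sketch of both ideas, assuming the conversion is the classical double-centering used in multidimensional scaling (this may not be how the package ends up doing it). The names `gram_from_distances` and `subset_diversity` are hypothetical.

```python
import numpy as np


def gram_from_distances(D):
    """Double-centre squared distances: G = -1/2 * J D^2 J, J = I - 11^T/n.

    For Euclidean distances, G is the Gramian of the mean-centred points.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J


def subset_diversity(G, idx):
    """log det of the Gramian restricted to `idx` (-inf if singular)."""
    idx = list(idx)
    sign, logdet = np.linalg.slogdet(G[np.ix_(idx, idx)])
    return logdet if sign > 0 else -np.inf


# Four corners of the unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
G = gram_from_distances(D)
score = subset_diversity(G, [0, 1])
```

One caveat with this particular conversion: the centred Gramian of all n points is always singular (the vector of ones is in its null space), so the determinant is only informative for proper subsets.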

@PaulWAyers
Member

It seems this is implemented at
https://basf.github.io/doe/

@FanwangM FanwangM changed the title from "Implementing determinantal point processes" to "Implementing D-optimal designs" May 29, 2023
@FanwangM FanwangM changed the title from "Implementing D-optimal designs" to "[Selector module] Implementing D-optimal designs" May 29, 2023
@FanwangM FanwangM added the help wanted label May 29, 2023
4 participants