Over-weight of crosslinking data #11

Open
Yan-Yan-2020 opened this issue May 16, 2023 · 4 comments
Comments

@Yan-Yan-2020

Hi,

How can we address the over-weighting problem for crosslinking data? I noticed that if there are many crosslinking restraints for one sequence, the final models look over-constrained and some well-folded domains look unstructured.

Thanks.
Yan

@grandrea
Collaborator

You can increase -Neff to downweight the crosslinks (see its influence in the supplementary figures of the AlphaLink paper) or remove MSA subsampling altogether. Alternatively, you can change the FDR value on the crosslinking restraints or flatten the shape of the distribution. Finally, you can run the prediction multiple times with subsets of the restraints. I also encourage you to look carefully at the crosslinking MS data to ensure that error thresholding is done properly.

@Yan-Yan-2020
Author

Great, thank you so much! I'm trying these approaches to see how the models look.

Thanks.
Yan

@Yan-Yan-2020
Author

Hi,
Running multiple times with subsets of restraints would be the better solution in my case. Do you have a detailed workflow for it? Would you use the restrained model as the input for the next subset of restraints? How do you filter the restraints into subsets?

Thanks.
Yan

@lhatsk
Owner

lhatsk commented Jun 2, 2023

Hi,
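This workflow is not implemented at the moment. What you could do is shuffle your links once they are loaded and pick a subset, e.g. like this (untested!):

np.random.shuffle(links)  # shuffles in place; assumes numpy is imported as np
subset = 0.8  # fraction of links to keep
links = links[:int(len(links) * subset)]  # keep the first 80% of the shuffled links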

This should be inserted here: https://github.com/lhatsk/AlphaLink/blob/main/predict_with_crosslinks.py#L292
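
For repeated runs, it can help to make each random subset reproducible. Below is a minimal, standalone sketch of that idea. The helper name `sample_links`, the `seed` parameter, and the toy `(residue_from, residue_to, fdr)` tuples are assumptions made for illustration; they are not part of AlphaLink, and the restraint values are made up.

```python
import numpy as np

def sample_links(links, fraction=0.8, seed=0):
    """Return a random subset of `links` without modifying the input."""
    rng = np.random.default_rng(seed)
    n_keep = int(len(links) * fraction)
    # Draw n_keep distinct indices, then sort them to preserve the original order.
    idx = np.sort(rng.choice(len(links), size=n_keep, replace=False))
    return [links[i] for i in idx]

# Toy restraints (hypothetical values, for illustration only).
toy_links = [(10, 45, 0.05), (12, 80, 0.05), (33, 90, 0.1), (50, 120, 0.05), (61, 75, 0.2)]

# One subset per run; a different seed gives a different, but repeatable, draw.
subsets = [sample_links(toy_links, fraction=0.8, seed=s) for s in range(3)]
```

Using a different seed per run means a model that looks over-constrained can be traced back to the exact subset of restraints that produced it.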

To have more control over the subsets, it might make sense to partition the restraints beforehand and just use the newly created CSV files, e.g. if you want to filter or iteratively add restraints.
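
Partitioning beforehand could look roughly like the sketch below. It assumes the restraint file holds one restraint per line with no header row, and it treats each line as opaque text, so it does not depend on the column layout. The function name `partition_restraints`, the output file naming, and the toy file contents are hypothetical.

```python
import random
import tempfile
from pathlib import Path

def partition_restraints(in_path, out_dir, n_parts=3, seed=0):
    """Split a restraint file into n_parts disjoint files and return their paths."""
    in_path = Path(in_path)
    # Assumption: one restraint per non-empty line, no header row.
    lines = [ln for ln in in_path.read_text().splitlines() if ln.strip()]
    random.Random(seed).shuffle(lines)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_parts):
        part = lines[i::n_parts]  # every n_parts-th line, starting at offset i
        path = out_dir / f"{in_path.stem}_part{i}.csv"
        path.write_text("\n".join(part) + "\n")
        paths.append(path)
    return paths

# Demo on a throwaway file with made-up restraint lines.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "crosslinks.csv"
    src.write_text("10 45 0.05\n12 80 0.05\n33 90 0.1\n50 120 0.05\n61 75 0.2\n")
    parts = partition_restraints(src, Path(tmp) / "parts", n_parts=2)
    sizes = [len(p.read_text().splitlines()) for p in parts]
    merged = sorted(ln for p in parts for ln in p.read_text().splitlines())
```

Each output file can then be passed to a separate prediction run, or the files can be concatenated one at a time to add restraints iteratively. If the real file has a header row, it would need to be copied into every part.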
