Regarding input and distance matrix padding. #27

Closed
n4ndoz opened this issue May 9, 2020 · 3 comments

Comments

@n4ndoz

n4ndoz commented May 9, 2020

Hi!
Wonderful work here, and wonderful code as well.
I have a few questions regarding your model and some of your input preparation steps.
1- Why did you implement padding as a new class and not as a mask, multiplying every add layer by this binary mask to avoid backprop through these regions?
2- Why did you create a different embedding for the distances, and not only the threshold function?

@hypnopump
Owner

Interesting comments:

  1. I don't know if you mean adding the Padding as a Keras Layer at the beginning of the Net? I wasn't sure how to do that so I just did the padding in NumPy before.
  2. Not sure what you're referring to. In the data preparation functions here: https://github.com/EricAlcaide/MiniFold/blob/master/models/distance_pipeline/distance_generator_data.py I use the same function for padding both the distance matrix and the PSSM.
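
To make the "padding in NumPy before" approach concrete, here is a minimal sketch of zero-padding a distance matrix and a PSSM to a fixed length. The helper names `pad_square` and `pad_rows` and the shapes used are assumptions for illustration, not the functions in the linked file.

```python
import numpy as np


def pad_square(matrix, max_len, value=0.0):
    """Pad an (L, L) matrix to (max_len, max_len). Hypothetical helper."""
    matrix = np.asarray(matrix, dtype=float)
    length = matrix.shape[0]
    if length > max_len:
        raise ValueError("sequence is longer than max_len")
    pad = max_len - length
    # Pad only at the end of each axis so the real data stays in the top-left corner.
    return np.pad(matrix, ((0, pad), (0, pad)), mode="constant", constant_values=value)


def pad_rows(features, max_len, value=0.0):
    """Pad an (L, F) per-residue feature array (e.g. a PSSM) to (max_len, F)."""
    features = np.asarray(features, dtype=float)
    length = features.shape[0]
    if length > max_len:
        raise ValueError("sequence is longer than max_len")
    pad = max_len - length
    # Only the sequence axis is padded; the feature axis is left untouched.
    return np.pad(features, ((0, pad), (0, 0)), mode="constant", constant_values=value)
```

Both helpers keep the original values in the leading `L` positions, so the unpadded region can be recovered by slicing with the true sequence length.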

The codebase is from a year and a half ago, so I don't remember every detail. If you could clarify what you're referring to, I think I would be able to explain more.

Thanks for the interest in the project!

@n4ndoz
Author

n4ndoz commented May 9, 2020

Hi!! Thanks for the quick reply!

  1. I am applying some parts of your model and modifying mainly the res blocks. The main trick is to make a binary mask matrix (MaxL x MaxL; I'm using 256 so I can cover most of the protein length distribution in ProteinNet) where the L x L subset for each sequence is 1 and the rest is 0. This way, when you backprop, the grads will be 0 where there is no protein info and the error is not propagated. Does it work? Well, questionable, hahaha. But it is what RaptorX-Contact implemented.

  2. I just took a look at the embbeding_matrix function and understood. It pads the dist matrix, right?
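
As a rough illustration of the masking idea in point 1, the sketch below builds the MaxL x MaxL binary mask and applies it to a mean squared error, so the gradient is exactly zero in the padded region. It is a plain NumPy sketch under assumed names (`make_pair_mask`, `masked_mse`, `masked_mse_grad`), not the RaptorX-Contact or MiniFold code, and it masks the loss rather than every intermediate layer.

```python
import numpy as np


def make_pair_mask(length, max_len=256):
    """Binary (max_len, max_len) mask: 1 inside the real L x L block, 0 in the padding."""
    mask = np.zeros((max_len, max_len), dtype=np.float32)
    mask[:length, :length] = 1.0
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over unmasked positions only."""
    n_valid = max(float(mask.sum()), 1.0)  # guard against an all-zero mask
    return float((mask * (pred - target) ** 2).sum() / n_valid)


def masked_mse_grad(pred, target, mask):
    """Gradient of masked_mse with respect to pred; zero wherever mask is zero."""
    n_valid = max(float(mask.sum()), 1.0)
    return 2.0 * mask * (pred - target) / n_valid
```

Because the mask multiplies the error term, padded positions contribute neither to the loss value nor to the gradient, which is the behaviour described above.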

Another question: you used alpha carbons as distance targets, right? You wrote that you applied the model to ProteinNet, but it doesn't store beta carbon coordinates, only N, Ca, C (CBeta being the "root" of the side chain). I'm asking because I've been trying to fetch the beta carbon coordinates from ProteinNet IDs and have been running into several issues with sequence/structure matching between PDB and ProteinNet.

Thanks a lot again for the answer. Have you been doing any other work in protein structure prediction? And nice paper on E-Swish.

@hypnopump
Owner

Cool!

  1. Good luck! I would like to see the results!
  2. Yup. I took distances between C-alpha atoms for predictions. I don't know if there are differences with respect to PDB, I'm sorry.
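
For reference, a C-alpha distance target of this kind can be sketched in a few lines of NumPy. The function name `ca_distance_matrix` and the `(L, 3)` input layout are assumptions for illustration, not the repository's actual code.

```python
import numpy as np


def ca_distance_matrix(ca_coords):
    """Pairwise Euclidean distances between C-alpha atoms.

    ca_coords: array-like of shape (L, 3), one xyz coordinate per residue.
    Returns an (L, L) symmetric matrix with zeros on the diagonal.
    """
    ca = np.asarray(ca_coords, dtype=float)
    # Broadcast to (L, L, 3) coordinate differences, then reduce over xyz.
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The same function works for any single atom type, so swapping in C-beta coordinates (where available) would only change the input array.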

Thanks for the E-swish comment, I did it during my last year of high school! Also, what do you think about my comment in the other thread?
