Regarding input and distance matrix padding. #27

n4ndoz · 2020-05-09T01:58:35Z

Hi!
Wonderful work here, and wonderful code as well.
I have a few questions regarding your model and some of your input preparation steps.
1- Why did you implement padding as a new class rather than as a mask, multiplying the output of every Add layer by this binary mask to avoid backpropagating through the padded regions?
2- Why did you create a different embedding for the distances, and not only the threshold function?

hypnopump · 2020-05-09T11:17:25Z

Interesting comments:

I don't know if you mean adding the padding as a Keras layer at the beginning of the net? I wasn't sure how to do that, so I just did the padding in NumPy beforehand.
Not sure what you're referring to. In the data preparation functions here: https://github.com/EricAlcaide/MiniFold/blob/master/models/distance_pipeline/distance_generator_data.py I use the same function for padding both the distance matrix and the PSSM.
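
For readers following along, padding in NumPy before the network could look roughly like the sketch below. The helper names `pad_square` and `pad_features` and the `max_len` parameter are assumptions made for illustration; this is not the code in distance_generator_data.py.

```python
import numpy as np

def pad_square(dist, max_len, value=0.0):
    """Pad an (L, L) distance matrix to (max_len, max_len) with a constant."""
    length = dist.shape[0]
    if length > max_len:
        raise ValueError("sequence is longer than max_len")
    pad = max_len - length
    # Pad only at the end of each axis so the real data stays in the top-left block.
    return np.pad(dist, ((0, pad), (0, pad)), mode="constant", constant_values=value)

def pad_features(feats, max_len, value=0.0):
    """Pad (L, C) per-residue features (e.g. a PSSM) to (max_len, C)."""
    length = feats.shape[0]
    if length > max_len:
        raise ValueError("sequence is longer than max_len")
    pad = max_len - length
    # The channel axis is left untouched.
    return np.pad(feats, ((0, pad), (0, 0)), mode="constant", constant_values=value)
```

Both helpers keep the real values in the leading rows and columns, so the same padding convention applies to the distance matrix and to the per-residue features.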

The codebase is from a year and a half ago, so I don't remember every detail right now. If you could clarify what you're referring to, I think I would be able to explain more.

Thanks for the interest in the project!

n4ndoz · 2020-05-09T20:32:47Z

Hi!! Thanks for the quick reply!

I am applying some parts of your model and modifying mainly the res blocks. The main trick is to make a binary mask matrix (MaxL x MaxL; I'm using 256, so I can cover most of the protein length distribution in ProteinNet) where the LxL sub-block for each sequence is 1 and the rest is 0. This way, when you backprop, the grads will be 0 where there is no protein info and the error is not propagated. Does it work? Well, questionable, hahaha. But it is what RaptorX-Contact implemented.
I just took a look at the embbeding_matrix function and understood. It pads the dist matrix, right?
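
A minimal sketch of the masking idea described above, applied at the loss rather than inside the network. The function names `make_mask` and `masked_mse` are hypothetical, and the mean-squared-error loss is an assumption chosen for illustration; it is not taken from MiniFold or RaptorX-Contact.

```python
import numpy as np

def make_mask(length, max_len=256):
    """Binary (max_len, max_len) mask: 1 in the top-left L x L block, 0 elsewhere."""
    if length > max_len:
        raise ValueError("sequence is longer than max_len")
    mask = np.zeros((max_len, max_len), dtype=np.float32)
    mask[:length, :length] = 1.0
    return mask

def masked_mse(pred, target, mask):
    """Mean squared error over the unmasked (real) positions only."""
    sq_err = (pred - target) ** 2 * mask  # padded positions contribute exactly 0
    return sq_err.sum() / mask.sum()      # normalize by the number of real entries
```

Because the padded entries are multiplied by 0 before the sum, the loss does not depend on the prediction at those positions, so its gradient with respect to them is 0 as well.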

Another question: you used alpha carbons as distance targets, right? You wrote that you applied the model to ProteinNet, but it doesn't store beta-carbon coordinates, only N, Ca, C (C-beta being the "root" of the side chain). I'm asking because I've been trying to fetch the beta-carbon coordinates from ProteinNet IDs and have been running into several issues with sequence/structure matching between PDB and ProteinNet.

Thanks a lot again for the answer. Have you been doing any other work in protein structure prediction? And nice paper on E-Swish.

hypnopump · 2020-05-10T12:57:15Z

Cool!

Good luck! I would like to see the results!
Yup, I used distances between C-alpha atoms as the prediction targets. I don't know whether there are differences with respect to PDB, sorry.
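
As a rough illustration of what C-alpha distance targets look like, the sketch below computes a pairwise distance matrix from an (L, 3) array of C-alpha coordinates. The function name `ca_distance_matrix` and the input layout are assumptions for illustration, not the project's own preprocessing code.

```python
import numpy as np

def ca_distance_matrix(ca_coords):
    """Pairwise Euclidean distances between C-alpha atoms.

    ca_coords: array of shape (L, 3), one xyz row per residue.
    Returns a symmetric (L, L) matrix with zeros on the diagonal.
    """
    ca_coords = np.asarray(ca_coords, dtype=np.float64)
    # Broadcasting yields an (L, L, 3) array of coordinate differences.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The same function would work for C-beta coordinates if they were available; only the input array changes.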

Thanks for the E-Swish comment, I did it during my last year of high school! Also, what do you think about my comment in the other thread?

hypnopump closed this as completed May 31, 2020
