add antibody developability from TDC #99

Open
wants to merge 33 commits into
base: main

phalem
Contributor

@phalem phalem commented Mar 11, 2023

add Antibody Developability from:
https://tdcommons.ai/single_pred_tasks/develop/
for both:
TAP
SAbDab, Chen et al.

I need someone to check the two lists that I converted into two columns. Thanks.

phalem and others added 2 commits March 11, 2023 17:26
@phalem phalem mentioned this pull request Mar 11, 2023
Comment on lines 18 to 19
- https://rb.gy/idkdqp
- https://rb.gy/b8cx8i
Collaborator

By URIs we mean links to ontology terms, such as the ones you can find here: https://bioportal.bioontology.org/ontologies/BAO?p=classes&conceptid=http://purl.obolibrary.org/obo/NCIT_C20604

Contributor

I removed those because they did not fit our setup.

Comment on lines 3 to 6
description: "Antibody data from Chen et al, where they process from the SAbDab. \n From an initial dataset of 3816 antibodies, they retained 2426\
\ antibodies\n that satisfy the following criteria: 1. \n have both sequence (FASTA) and Protein Data Bank (PDB) structure files,\n \
\ 2. contain both a heavy chain and a light chain, and 3. \n have crystal structures with resolution < 3 Å. \n The DI label is derived\
\ from BIOVIA's pipelines."
Collaborator

The line breaks seem a bit awkward. Do you have an idea where they come from?

Contributor

I guess that was caused by the Ångström symbol (Å)!

Contributor

I converted it to nm.
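
For reference, a minimal sketch of that unit change, assuming the only value affected is the 3 Å resolution cutoff quoted in the description above. The function and constant names are hypothetical and are not taken from the PR.

```python
# Hypothetical helper, not part of the PR: convert between Å and nm.
# 1 nm = 10 Å, so the "resolution < 3 Å" criterion becomes "< 0.3 nm".

ANGSTROM_PER_NM = 10.0


def angstrom_to_nm(value_in_angstrom: float) -> float:
    """Convert a length from ångström to nanometres."""
    return value_in_angstrom / ANGSTROM_PER_NM


def nm_to_angstrom(value_in_nm: float) -> float:
    """Convert a length from nanometres back to ångström."""
    return value_in_nm * ANGSTROM_PER_NM


if __name__ == "__main__":
    cutoff_angstrom = 3.0  # resolution cutoff quoted in the dataset description
    print(f"{cutoff_angstrom} Å = {angstrom_to_nm(cutoff_angstrom)} nm")
```

Using nm keeps the description ASCII-only, at the cost of departing from the unit that crystallographic resolution is usually reported in.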

Comment on lines 21 to 22
- id: antibody_pdb_ID
type: Other
Collaborator

Are those IDs chemically meaningful, or are they just identifier numbers?

Contributor Author

Yeah, they are the PDB IDs.

Contributor

So should we keep them or remove them?

Comment on lines 42 to 50
- "@article{Chen2020,\n doi = {10.1101/2020.06.18.159798},\n url = {https://doi.org/10.1101/2020.06.18.159798},\n year =\
\ {2020},\n month = jun,\n publisher = {Cold Spring Harbor Laboratory},\n author = {Xingyao Chen and Thomas Dougherty and\
\ \n Chan Hong and Rachel Schibler and Yi Cong Zhao and \n Reza Sadeghi and Naim Matasci and Yi-Chieh Wu and Ian Kerman},\n \
\ title = {Predicting Antibody Developability from Sequence \n using Machine Learning}}"
- "@article{Dunbar2013,\n doi = {10.1093/nar/gkt1043},\n url = {https://doi.org/10.1093/nar/gkt1043},\n year = {2013},\n\
\ month = nov,\n publisher = {Oxford University Press ({OUP})},\n volume = {42},\n number = {D1},\n pages\
\ = {D1140--D1146},\n author = {James Dunbar and Konrad Krawczyk and Jinwoo Leem \n and Terry Baker and Angelika Fuchs and Guy Georges\
\ and Jiye Shi and\n Charlotte M. Deane},\n title = {{SAbDab}: the structural antibody database},\n journal = {Nucleic\
\ Acids Research}}"
Collaborator

I'm also surprised by the line breaks here.

Contributor

I guess this is also due to the Å? Anyway, fixed!

Collaborator

@kjappelbaum kjappelbaum left a comment

Thanks a lot - again 💯 Amazing contributions 👍🏽
I made some comments on one of the files. I think we wanted to discuss this anyway; let me know when you have time.

Contributor Author

@phalem phalem left a comment

Add benchmark field

@MicPie
Contributor

MicPie commented Apr 18, 2023

TAP cleanup is incoming; I will finish it up later.

@MicPie MicPie requested a review from kjappelbaum April 27, 2023 12:23
@kjappelbaum
Collaborator

I need to better understand whether the identifier columns alone are enough. I do not think so.

@MicPie MicPie added the yaml label Apr 27, 2023
@MicPie
Contributor

MicPie commented Jun 12, 2023

To discuss: it would be better to use Å instead of nm!

3 participants