Add tcr epitope binding dataset #67

Open
wants to merge 21 commits into
base: main

Conversation

@csjackson0 csjackson0 commented Mar 4, 2023

I added a meta.yaml, a transform.py, and an example_processing_and_templates.ipynb for the TCR-epitope binding data from the Therapeutics Data Commons (TDC). The dataset contains pairs of an epitope (SMILES and amino acid sequence) and a TCR (amino acid sequence). Each pair has a binary label for binding. The data is used in the Weber et al. paper.

Comment on lines 13 to 15
- tcr binding affinity
- binding affinity
- binding
Collaborator

I think it would be better to also include the binding site in all the "synonyms", e.g., "epitope binding affinity".

Author

@kjappelbaum I included "epitope binding affinity" and also "epitope binding" as synonyms.

Comment on lines 17 to 25
- id: epitope_smiles
  type: SMILES
  description: 'epitope smiles '
- id: epitope_aa
  type: amino acid
  description: epitope amino acid sequence
- id: tcr_aa
  type: amino acid
  description: tcr amino acid sequence
Collaborator

Do I understand the dataset correctly: the binding label only makes sense if we specify both the TCR and the epitope?

Author

That is right. Given the epitope and TCR, predict if the pair binds.
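
To illustrate why the label belongs to the pair rather than to either sequence alone, here is a minimal sketch. The field names `epitope_aa` and `tcr_aa` follow the identifiers in the meta.yaml excerpt above; the `binds` field name and all record values are placeholders made up for illustration, not rows from the dataset.

```python
from typing import NamedTuple


class BindingPair(NamedTuple):
    epitope_aa: str  # epitope amino acid sequence
    tcr_aa: str      # TCR amino acid sequence
    binds: int       # hypothetical label name: 1 if the pair binds, 0 otherwise


# Placeholder records for illustration only, not taken from the dataset.
records = [
    BindingPair("EPITOPE_A", "TCR_X", 1),
    BindingPair("EPITOPE_A", "TCR_Y", 0),
    BindingPair("EPITOPE_B", "TCR_X", 0),
]

# The label is well defined when keyed on the (epitope, TCR) pair.
label_by_pair = {(r.epitope_aa, r.tcr_aa): r.binds for r in records}

# Keyed on the epitope alone, one epitope can map to several labels.
labels_by_epitope = {}
for r in records:
    labels_by_epitope.setdefault(r.epitope_aa, set()).add(r.binds)
```

In this toy data, `EPITOPE_A` carries both labels depending on the TCR, which is why any text sampled from the data would need to mention both sequences.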

Collaborator

In this case, we will need to add templates to sample this data correctly. There are examples of templates in the Contribution Guide. Let me know if you would like a hand with this.
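
As a rough sketch of what such a template has to achieve, the snippet below fills both sequences into one sentence. It uses plain Python `str.format`, not the template syntax from the Contribution Guide; the template wording, the `label` key, the `render` helper, and the row values are all assumptions made up for illustration.

```python
# Hypothetical template: both the TCR and the epitope appear in the text,
# because the binding label is only meaningful for the pair.
TEMPLATE = (
    "Does the TCR with amino acid sequence {tcr_aa} bind the epitope "
    "with amino acid sequence {epitope_aa}? {answer}"
)


def render(row: dict) -> str:
    """Fill the template from one row; `label` is an assumed column name."""
    answer = "Yes" if row["label"] == 1 else "No"
    return TEMPLATE.format(
        tcr_aa=row["tcr_aa"],
        epitope_aa=row["epitope_aa"],
        answer=answer,
    )


# Placeholder row for illustration only, not taken from the dataset.
row = {"epitope_aa": "EPITOPE_A", "tcr_aa": "TCR_X", "label": 1}
text = render(row)
print(text)
```

The same idea would apply to a variant that uses the epitope SMILES instead of its amino acid sequence.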

Author

@kjappelbaum Thanks for the feedback. I attempted to add a template, but I am not sure I fully understand what to do here. Could you please have a look and help with this?

Collaborator

@kjappelbaum kjappelbaum left a comment

Thanks for your contribution 💯
I do not think I fully understand the dataset yet. Perhaps you can help me?

Collaborator

@kjappelbaum kjappelbaum left a comment

Thanks again for your contribution. Before we merge, we should add the templates for sampling, as mentioned in one of my comments.

3 participants