Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Epitope data from TDC #96

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Conversation

phalem
Copy link
Contributor

@phalem phalem commented Mar 11, 2023

I add Epitope data from tdc here :
https://tdcommons.ai/single_pred_tasks/epitope/

I will need help in validation my approach, need to ensure the indices start with 0 or 1 in epitope active binding one

phalem and others added 2 commits March 11, 2023 12:34
I add Epitope data from tdc here :
https://tdcommons.ai/single_pred_tasks/epitope/

I will need help in validation my approach, need to ensure the indices start with 0 or 1 in epitope active binding one
@phalem phalem mentioned this pull request Mar 11, 2023
@kjappelbaum
Copy link
Collaborator

@phalem Thanks a lot for your efforts; it is really great to have you take the lead on the datasets; we could only have that much progress thanks to your contributions.
One thing with your PRs is that they currently fail in our automatic checks. This is mostly the case because some lines need to be shorter. Could you break long (>120 character) lines to make the content more readable?

Do you think that makes sense? If you want help with that or to discuss this, I'm happy to do so.

@phalem
Copy link
Contributor Author

phalem commented Mar 11, 2023

@phalem Thanks a lot for your efforts; it is really great to have you take the lead on the datasets; we could only have that much progress thanks to your contributions. One thing with your PRs is that they currently fail in our automatic checks. This is mostly the case because some lines need to be shorter. Could you break long (>120 character) lines to make the content more readable?

Thanks kevin for your kind words. I think that problem raise from description that TDC provide. This can be solved by break target description and URI links, other things follow the same workflow. Please, tell me if you have time now to do that during review or I should do by myself. If this doesn't work, we will need to see if description need to be summarized

Copy link
Contributor Author

@phalem phalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add benchmark field

units: ''
type: categorical
names:
- amino acids sequence active in binding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the target is a sequence and the input is the full sequence?

Copy link
Contributor Author

@phalem phalem Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the target is a sequence and the input is the full sequence?

Yeah, I forget to comment on that. Input was sequence and target is sequence as well we could make it categorical, but with a lot of category. I don't know what is your opinion of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is how it is currently implemented. I set it up to only give the corresponding epitope sequence without the _ for the other parts of the sequence.

@kjappelbaum kjappelbaum self-requested a review March 28, 2023 10:14
Copy link
Collaborator

@kjappelbaum kjappelbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once more: Thanks 💯
(And similar comments)

As data was complex and get in an indirect way. I didn't implement split
@phalem
Copy link
Contributor Author

phalem commented Mar 29, 2023

Before I forget @kjappelbaum we need someone to ensure if index of start with 0 or 1. As it will be different sequences and situation

@MicPie MicPie requested a review from kjappelbaum April 27, 2023 12:14
@kjappelbaum
Copy link
Collaborator

@MicPie we need to carefully look into this. I don't think this is a simple classification/regression task.

@kjappelbaum
Copy link
Collaborator

Also need to check how this overlaps with #67

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants