Add uspto data from drfp #95

Open
wants to merge 15 commits into
base: main


phalem
Contributor

@phalem phalem commented Mar 10, 2023

Add the raw USPTO data from drfp until I finish the USPTO data from TDC.

@phalem phalem mentioned this pull request Mar 10, 2023
Comment on lines 12 to 13
- https://bioportal.bioontology.org/ontologies/AFO?p=classes&conceptid=http%3A%2F%2Fpurl.allotrope.org%2Fontologies%2Fquality%23AFQ_0000227
- https://en.wikipedia.org/wiki/Yield_(chemistry)
Collaborator

I would use the "id" from the ontology table, but I can show you with an example when we discuss.

Contributor

@kjappelbaum I'm not sure what you mean here?

Contributor Author

@phalem phalem left a comment

Add benchmark field

link: https://tdcommons.ai/
split_column: split
identifiers:
- id: reaction_SMILES
Collaborator

There is a new entry for that.

Contributor Author

Yeah, great, I see it. I will edit the file and open the PR again.

Collaborator

@kjappelbaum kjappelbaum left a comment

Similar comments as for the other PRs :)

phalem and others added 6 commits March 28, 2023 20:43
Co-authored-by: Kevin M Jablonka <[email protected]>
I will add benchmark field on TDC version UPSTO
@MicPie
Contributor

MicPie commented Apr 14, 2023

I split the reaction into separate columns: "catalyst", "reactant", and "product".
This should give us more freedom when setting up the prompt templates.
@kjappelbaum Is this a good idea? Should we also recreate the reaction SMILES column or not?

@kjappelbaum
Collaborator

@MicPie, yes, I'd add the reaction SMILES, as this is our best hope for removing duplicates.
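
A minimal sketch of that deduplication, assuming the split columns are named "reactant", "catalyst", and "product" as in the comment above and that each row is a plain dict. The helper names and the sample rows are hypothetical and not taken from this PR. Exact string matching only catches identical SMILES strings; equivalent reactions written differently would need canonicalization first (for example with a cheminformatics toolkit).

```python
def build_reaction_smiles(reactant: str, catalyst: str, product: str) -> str:
    """Join the split columns back into 'reactants>agents>products'."""
    return f"{reactant}>{catalyst}>{product}"


def drop_duplicate_reactions(rows: list[dict]) -> list[dict]:
    """Keep the first row for each distinct reaction SMILES string.

    Assumption: duplicates are detected by exact string equality only.
    """
    seen = set()
    unique = []
    for row in rows:
        key = build_reaction_smiles(row["reactant"], row["catalyst"], row["product"])
        if key in seen:
            continue
        seen.add(key)
        # Store the rebuilt string under the identifier used in the metadata file.
        unique.append({**row, "reaction_SMILES": key})
    return unique


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"reactant": "CC(=O)O.OCC", "catalyst": "[H+]", "product": "CC(=O)OCC.O"},
    {"reactant": "CC(=O)O.OCC", "catalyst": "[H+]", "product": "CC(=O)OCC.O"},
    {"reactant": "CCO", "catalyst": "", "product": "CC=O"},
]
deduplicated = drop_duplicate_reactions(sample_rows)
print(len(deduplicated), "unique reactions")
```

Keeping the split columns alongside the rebuilt string preserves the flexibility for prompt templates while still giving one column to deduplicate on.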

@MicPie
Contributor

MicPie commented May 3, 2023

As I'm coming from the bio side, wouldn't we need more info for a reaction SMILES, or is it always:
1 reactant + 1 catalyst = 1 product?

@pschwllr
Contributor

pschwllr commented May 7, 2023

I'm not sure what data the TDC yields dataset contains. Their documentation is not very specific: https://tdcommons.ai/single_pred_tasks/yields/#uspto

On the other hand, I know the data from drfp well.
https://github.com/reymond-group/drfp/tree/main/data

The above/below yield datasets come from this paper: https://iopscience.iop.org/article/10.1088/2632-2153/abc81d.

Currently, things seem to be a bit mixed up in this pull request.

@pschwllr
Contributor

pschwllr commented May 7, 2023

> As I'm coming from the bio side, wouldn't we need more info for a reaction SMILES, or is it always:
> 1 reactant + 1 catalyst = 1 product?

You are right. There can be plenty of reactants, reagents, solvents, and catalysts leading to one or more products in a reaction SMILES.
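
To make this concrete, here is a minimal sketch that splits a reaction SMILES of the form `reactants>agents>products` into its components, assuming plain reaction SMILES without extensions. The `Reaction` type, the `parse_reaction_smiles` helper, and the esterification example are hypothetical illustrations, not code or data from this PR. Splitting on `.` is a simplification: the dot also separates the ions of a single salt, so the counts can overstate the number of distinct compounds.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    reactants: list[str]
    agents: list[str]  # reagents, solvents, and catalysts share this slot
    products: list[str]


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators, got: {rxn!r}")

    def molecules(side: str) -> list[str]:
        # Molecules on one side are dot-separated; an empty side yields [].
        return [m for m in side.split(".") if m]

    return Reaction(*(molecules(p) for p in parts))


# Illustrative example: two reactants, one agent, two products.
example = parse_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
print(len(example.reactants), len(example.agents), len(example.products))
```

The agents slot may also be empty, as in `CCO>>CC=O`, which is why a fixed "1 reactant + 1 catalyst = 1 product" layout cannot be assumed.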

4 participants