Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Buchwald Hartwig dataset #81

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

pschwllr
Copy link
Contributor

@pschwllr pschwllr commented Mar 8, 2023

Adding meta.yaml , transform.py and example_processing_and_templates.ipynb for Buchwald-Hartwig dataset from https://www.science.org/doi/10.1126/science.aar5169.

Related to #80

@kjappelbaum kjappelbaum linked an issue Mar 9, 2023 that may be closed by this pull request
@phalem phalem mentioned this pull request Mar 9, 2023
name: buchwald_hartwig_doyle
description: High-throughput experimentation palladium-catalyzed Buchwald Hardwig
data set with yields.
targets:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have now also RXN-SMILES identifiers that might be useful to add

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can also look into that

- id: reaction_SMILES
type: SMILES
description: SMILES
license: MIT license
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
license: MIT license
license: MIT

Copy link
Collaborator

@kjappelbaum kjappelbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this addition, we should add the rxn-smiles. Should I take care of that?

@pschwllr
Copy link
Contributor Author

pschwllr commented May 6, 2023

How would you add the rxn-smiles?

Currently, the data_clean.csv looks like:

ligand,additive,base,aryl_halide,reaction_SMILES,yield
CC(C)C(C=C(C(C)C)C=C1C(C)C)=C1C2=C(P([C@@]3(C[C@@H]4C5)C[C@H](C4)C[C@H]5C3)[C@]6(C7)C[C@@H](C[C@@H]7C8)C[C@@H]8C6)C(OC)=CC=C2OC,CC1=CC(C)=NO1,CN(C)P(N(C)C)(N(C)C)=NP(N(C)C)(N(C)C)=NCC,ClC1=NC=CC=C1,CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1cc(C)on1.Cc1ccc(N)cc1.Clc1ccccn1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1,70.41045785
CC(C)C(C=C(C(C)C)C=C1C(C)C)=C1C2=C(P([C@@]3(C[C@@H]4C5)C[C@H](C4)C[C@H]5C3)[C@]6(C7)C[C@@H](C[C@@H]7C8)C[C@@H]8C6)C(OC)=CC=C2OC,O=C(OC)C1=CC=NO1,CN(C)P(N(C)C)(N(C)C)=NP(N(C)C)(N(C)C)=NCC,BrC1=NC=CC=C1,Brc1ccccn1.CCN=P(N=P(N(C)C)(N(C)C)N(C)C)(N(C)C)N(C)C.COC(=O)c1ccno1.COc1ccc(OC)c(P([C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)[C@]23C[C@H]4C[C@H](C[C@H](C4)C2)C3)c1-c1c(C(C)C)cc(C(C)C)cc1C(C)C.Cc1ccc(N)cc1.O=S(=O)(O[Pd]1c2ccccc2-c2ccccc2[NH2]1)C(F)(F)F>>Cc1ccc(Nc2ccccn2)cc1,11.06445724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Task: Buchwald-Hartwig yield prediction data
2 participants