Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New fixes #522

Merged
merged 33 commits into from
Feb 8, 2024
Merged

New fixes #522

merged 33 commits into from
Feb 8, 2024

Conversation

MicPie
Copy link
Contributor

@MicPie MicPie commented Jan 29, 2024

No description provided.

This was referenced Jan 29, 2024
@MicPie MicPie marked this pull request as ready for review January 30, 2024 08:09
kjappelbaum and others added 2 commits January 31, 2024 12:43
* fix: use SMILES__description instead of SMILES__names__noun for mol. repr. sampling

* fix: mol_repr_transl/transform.py w/o SMILES with H and with split col only

* fix: exclude uniprot binding ..

* fix: missing ! and wrong variable name in template

* fix: missing ! in templates

* fix: and rheadb and sort exclude_from_standard_tabular_text_templates

* fix: RedDB templates and fully export to meta.yaml

* fix: aminoacids templates var name

* fix: polymer_data templates var name

* fix: var dtypes

* fix: name and id

* fix: add rdkit_features to /exclude_from_standard_tabular_text_templates

* fix: # missing in template var

* fix: # missing in template var 2

* fix: fix dialogue template in compound_protein_compound_*

* fix: QC templates 1

* update reddb

* update zhu

* add description of representation

* orexin1_receptor_butkiewicz

* smiles__description

* molecule with SMILES

* fix: QC templates 2

* fix: QC templates 3

* fix: QC templates 3

* fix: QC templates

* fix: QC templates

* fix: QC templates

* fix: QC templates

* add organism

* make explicit

* add representation name

* add representation name

* update standard templates

* add representation name

* add representation names

* representation name use

* smiles usage

* must to

* add representation name

* Update data/kg/compound_protein_protein/meta.yaml

Co-authored-by: Michael Pieler <[email protected]>

* Update data/tabular/freesolv/meta.yaml

Co-authored-by: Michael Pieler <[email protected]>

* Update data/tabular/freesolv/transform.py

Co-authored-by: Michael Pieler <[email protected]>

* Update data/tabular/freesolv/transform.py

Co-authored-by: Michael Pieler <[email protected]>

* Update data/tabular/sr_p53_tox21/meta.yaml

Co-authored-by: Michael Pieler <[email protected]>

* Update data/kg/compound_protein_protein/meta.yaml

* Update data/kg/compound_protein_protein/meta.yaml

* Update data/kg/compound_protein_protein/meta.yaml

* Update data/tabular/freesolv/meta.yaml

* Update data/tabular/sr_p53_tox21/meta.yaml

* Update data/kg/compound_protein_compound_1/meta.yaml

* Update data/kg/compound_protein_compound_1/meta.yaml

* Update data/kg/compound_protein_compound_1/meta.yaml

* Update data/kg/compound_protein_compound_1/meta.yaml

* Update data/kg/compound_protein_compound_1/meta.yaml

* Update data/kg/compound_protein_compound_3/meta.yaml

* Update data/kg/compound_protein_compound_3/meta.yaml

* Update data/kg/compound_protein_compound_3/meta.yaml

* Update data/kg/compound_protein_compound_3/meta.yaml

* Update data/kg/compound_protein_compound_3/meta.yaml

* Update data/kg/compound_protein_go_term_1/meta.yaml

* Update data/kg/compound_protein_go_term_1/meta.yaml

* Update data/kg/compound_protein_go_term_2/meta.yaml

* Update data/kg/compound_protein_go_term_2/meta.yaml

* Update data/kg/compound_protein_go_term_3/meta.yaml

* Update data/kg/compound_protein_go_term_3/meta.yaml

* Update data/kg/compound_protein_go_term_4/meta.yaml

* Update data/kg/compound_protein_go_term_4/meta.yaml

* Update data/kg/compound_protein_pathway_disease_2/meta.yaml

* Update data/kg/drug_protein_hpo_disease/meta.yaml

* Update data/tabular/chemcaption_rdkit/meta.yaml

* Update data/tabular/mona/meta.yaml

* Update data/tabular/mona/transform.py

---------

Co-authored-by: Michael Pieler <[email protected]>
Co-authored-by: Kevin Maik Jablonka <[email protected]>
Co-authored-by: Michael Pieler <[email protected]>
Copy link
Collaborator

@kjappelbaum kjappelbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, you put a lot of effort into it!

data/check_pandas.py Show resolved Hide resolved
data/check_pandas.py Show resolved Hide resolved
data/check_pandas.py Outdated Show resolved Hide resolved
data/check_pandas.py Outdated Show resolved Hide resolved
data/check_smiles_split.py Outdated Show resolved Hide resolved
data/natural/preprocess_msds.py Show resolved Hide resolved
data/natural/preprocess_msds.py Show resolved Hide resolved
data/natural/preprocess_nougat.py Show resolved Hide resolved
data/postprocess_split.py Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does not work and this meta.yaml file should be removed from your PR.

I made a PR at a prior point (https://github.com/OpenBioML/chemnlp/pull/517/files#diff-bcd01efb49cb5331fcde8a5a8bd43bb5addcc228c8a5b947875f19addf7e28f4), where this was fixed because some of the data does not exist for all the polymers. Hence, the sampling would throw up. With the changes in this PR it will break again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll then use the state from the linked file, or is this suboptimal?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you remove it from this PR it should be fine.

If you do not want to drop it from the PR, you can copy/paste the current version on main

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can resolve this conversation?

MicPie and others added 7 commits February 1, 2024 09:32
@MicPie
Copy link
Contributor Author

MicPie commented Feb 1, 2024

In data/tabular/bicerano_dataset/meta.yaml is missing the target glass_CTE_calc.

@MicPie
Copy link
Contributor Author

MicPie commented Feb 1, 2024

Fixed
BACE template 4 (python index) is also broken because it misses the target in the meta.yaml file:

PromptTemplate: Task: Please {#give me|create|generate!} a {#molecule |!}{SMILES__description} based on the {#text |!}description{# below|!}.
Description: A molecule that is {BACE_inhibition__names__adjective}.
Result: {SMILES#}

@MicPie
Copy link
Contributor Author

MicPie commented Feb 1, 2024

Fixed
Same for BBBP, template 4 (python index) is also broken because it misses the target in the template:

PromptTemplate: Task: Please {#give me|create|generate!} a {#molecule |!}{SMILES__description} based on the {#text |!}description{# below|!}.
Description: A molecule that is {p_np__names__adjective}.
Result: {SMILES#}

@MicPie
Copy link
Contributor Author

MicPie commented Feb 1, 2024

Fixed
Same again for ames_mutagenicity template 6:

Task: Please {#give me|create|generate!} a {#molecule |!}{SMILES__description} based on the {#text |!}description{# below|!}.
Description: A molecule that is {mutagenic__names__adjective}.
Result: {SMILES#}

@kjappelbaum
Copy link
Collaborator

kjappelbaum commented Feb 1, 2024

In data/tabular/bicerano_dataset/meta.yaml is missing the target glass_CTE_calc.

yes, some of those no-Tg values (e.g. CTE, experimental density) are missing for some materials.

@MicPie MicPie requested a review from AdrianM0 February 5, 2024 17:09
those changes are incorrect, the CTE and density are not there for all polymers
Copy link
Collaborator

@kjappelbaum kjappelbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only a few comments (I do not know how to be verify the boolean tasks without digging into the original data, which I didn't do yet)

Copy link
Contributor Author

@MicPie MicPie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor Author

@MicPie MicPie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor changes

Copy link
Collaborator

@kjappelbaum kjappelbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@kjappelbaum
Copy link
Collaborator

@MicPie do you want to merge or shall I?

@MicPie MicPie merged commit 768f131 into main Feb 8, 2024
3 checks passed
@MicPie MicPie deleted the new_fixes branch February 8, 2024 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants