MolSkillScorer.score returns nan for "unusual" atom types #3

Open
PatWalters opened this issue Feb 28, 2023 · 2 comments

@PatWalters

This is a follow-up to the previous issue I reported. It turns out that the problem isn't with molecules that have multiple fragments. The problem is that a few of the RDKit descriptors return nan when they encounter atom types that are not parameterized. The impacted descriptors are listed below. I'm willing to bet you could remove these from the descriptors you're currently using without impacting performance. Then again, these molecules are probably outside your applicability domain, and MolSkillScorer.score should return nan.

from rdkit import Chem
from rdkit.Chem.Descriptors import BCUT2D_MWHI, MaxPartialCharge

# Dipropyl selenide; Se is the "unusual" atom type here
mol = Chem.MolFromSmiles("CCC[Se]CCC")
a = BCUT2D_MWHI(mol)
b = MaxPartialCharge(mol)
a, b
(nan, nan)

Here are the problematic descriptors:

BCUT2D_MWHI
BCUT2D_MWLOW
BCUT2D_CHGHI
BCUT2D_CHGLO
BCUT2D_LOGPHI
BCUT2D_LOGPLOW
BCUT2D_MRHI
BCUT2D_MRLOW
MaxPartialCharge
MinPartialCharge
MaxAbsPartialCharge
MinAbsPartialCharge
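
One way to reproduce a list like this is to run every descriptor on the offending molecule and collect the names that come back as nan. The sketch below is a minimal, library-agnostic version: `find_nan_descriptors` is a hypothetical helper (not part of RDKit or MolSkill), and it assumes the descriptors are supplied as (name, function) pairs, which is the shape of RDKit's `Descriptors._descList`.

```python
import math
from typing import Any, Callable, Iterable, List, Tuple


def find_nan_descriptors(
    descriptors: Iterable[Tuple[str, Callable[[Any], float]]],
    mol: Any,
) -> List[str]:
    """Return the names of descriptors whose value for `mol` is nan."""
    nan_names = []
    for name, fn in descriptors:
        value = fn(mol)
        # math.isnan accepts ints and floats; only floats can be nan.
        if math.isnan(value):
            nan_names.append(name)
    return nan_names
```

With RDKit installed, this could be called as `find_nan_descriptors(Descriptors._descList, Chem.MolFromSmiles("CCC[Se]CCC"))`, under the assumption that no descriptor raises an exception for that molecule.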

josejimenezluna self-assigned this Feb 28, 2023
josejimenezluna added the bug (Something isn't working) label Feb 28, 2023
@josejimenezluna
Contributor

Hi @PatWalters. Many thanks for identifying the problematic descriptors!

While I believe these molecules are certainly outside the applicability domain, I feel it is up to the user to be aware of this, rather than for us to return nan values.

I'll check whether removing these descriptors impacts performance in any significant way, and remove them from the default featurizer/model if it does not.
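
As a rough illustration of what dropping these descriptors could look like, here is a minimal sketch. It assumes the featurizer is driven by a list of (name, function) pairs such as RDKit's `Descriptors._descList`; the names `NAN_PRONE` and `drop_descriptors` are hypothetical and not part of MolSkill, whose actual featurizer may be organized differently.

```python
from typing import Any, Callable, FrozenSet, Iterable, List, Tuple

Descriptor = Tuple[str, Callable[[Any], float]]

# Descriptors reported in this issue to return nan for unparameterized atoms.
NAN_PRONE: FrozenSet[str] = frozenset({
    "BCUT2D_MWHI", "BCUT2D_MWLOW", "BCUT2D_CHGHI", "BCUT2D_CHGLO",
    "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW", "BCUT2D_MRHI", "BCUT2D_MRLOW",
    "MaxPartialCharge", "MinPartialCharge",
    "MaxAbsPartialCharge", "MinAbsPartialCharge",
})


def drop_descriptors(
    descriptors: Iterable[Descriptor],
    excluded: FrozenSet[str] = NAN_PRONE,
) -> List[Descriptor]:
    """Keep only descriptors whose name is not in `excluded`, preserving order."""
    return [(name, fn) for name, fn in descriptors if name not in excluded]
```

Since the feature vector becomes shorter, a model trained on the full descriptor set would presumably need to be retrained on the reduced set before any performance comparison is meaningful.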

@SejeongPark8354

Hi @josejimenezluna.
Could you please provide an update on the effects of removing these specific descriptors from the default featurizer/model?
I am interested in knowing if there have been any recent findings or observations regarding the impact this has on the performance of the analysis or model.
