This is a follow-up to the previous issue I reported. It turns out that the problem isn't with molecules that have multiple fragments. The problem is that a few of the RDKit descriptors return nan when they encounter atom types that are not parameterized. The impacted descriptors are listed below. I'm willing to bet you could remove these from the descriptors you're currently using without impacting performance. Then again, these molecules are probably outside your applicability domain, and MolSkillScorer.score should return nan.
from rdkit import Chem
from rdkit.Chem.Descriptors import BCUT2D_MWHI, MaxPartialCharge

# Example molecule containing selenium; parse it once and reuse it
mol = Chem.MolFromSmiles("CCC[Se]CCC")
a = BCUT2D_MWHI(mol)
b = MaxPartialCharge(mol)
a, b
(nan, nan)
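To check which descriptors return nan for a given molecule, something along these lines might help. This is only a sketch: `find_nan_descriptors` is a hypothetical helper (not part of RDKit or MolSkill), and it assumes a list of `(name, function)` pairs shaped like `rdkit.Chem.Descriptors.descList`.

```python
import math


def find_nan_descriptors(mol, desc_list):
    """Return the names of descriptors whose value for `mol` is nan.

    `desc_list` is assumed to be an iterable of (name, function) pairs,
    where each function takes a molecule and returns a number.
    """
    nan_names = []
    for name, fn in desc_list:
        value = fn(mol)
        # Only floats can be nan; integer-valued descriptors are skipped
        if isinstance(value, float) and math.isnan(value):
            nan_names.append(name)
    return nan_names
```

With RDKit installed, calling `find_nan_descriptors(Chem.MolFromSmiles("CCC[Se]CCC"), Descriptors.descList)` should flag the affected descriptors for this molecule.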
Hi @PatWalters. Many thanks for identifying the problematic descriptors!
While I agree these molecules are almost certainly outside the applicability domain, I feel it is up to the user to be aware of this, rather than up to us to return nan values.
I'll check whether removing these descriptors affects performance in any significant way and, if it does not, remove them from the default featurizer/model.
Hi @josejimenezluna.
Could you please provide an update on the effect of removing these specific descriptors from the default featurizer/model?
I would like to know whether there are any findings on how their removal affects model performance.
Here are the problematic descriptors:
BCUT2D_MWHI
BCUT2D_MWLOW
BCUT2D_CHGHI
BCUT2D_CHGLO
BCUT2D_LOGPHI
BCUT2D_LOGPLOW
BCUT2D_MRHI
BCUT2D_MRLOW
MaxPartialCharge
MinPartialCharge
MaxAbsPartialCharge
MinAbsPartialCharge
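As a rough sketch of how these could be dropped from a descriptor set, the snippet below filters a list of `(name, function)` pairs by name. The names `NAN_PRONE` and `drop_descriptors` are hypothetical, and how MolSkill's featurizer actually selects its descriptors is not shown in this thread, so treat the input format (pairs shaped like `rdkit.Chem.Descriptors.descList`) as an assumption.

```python
# Descriptor names reported above as returning nan for unparameterized atom types
NAN_PRONE = frozenset({
    "BCUT2D_MWHI", "BCUT2D_MWLOW",
    "BCUT2D_CHGHI", "BCUT2D_CHGLO",
    "BCUT2D_LOGPHI", "BCUT2D_LOGPLOW",
    "BCUT2D_MRHI", "BCUT2D_MRLOW",
    "MaxPartialCharge", "MinPartialCharge",
    "MaxAbsPartialCharge", "MinAbsPartialCharge",
})


def drop_descriptors(desc_list, excluded=NAN_PRONE):
    """Return the (name, function) pairs whose name is not in `excluded`.

    The original order of `desc_list` is preserved.
    """
    return [(name, fn) for name, fn in desc_list if name not in excluded]
```

Whether a model trained on the reduced set performs comparably would still need to be checked.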