
dowhy replication issue and output discrepancy with statsmodels #1227

Closed
dododobetter opened this issue Jul 16, 2024 · 2 comments
Labels
question (Further information is requested), stale

Comments

@dododobetter

Ask your question

Hi all, I'm new to the GitHub community and Python. I have two questions regarding the dowhy package:

  1. Output Replicability Issue:
    The estimate from dowhy appears to vary slightly each time I run it. I've tried setting random seeds without success. How can I ensure that the output is consistent and replicable?

  2. Discrepancy with Statsmodels:
    I've noticed significant differences between the treatment effect estimates (ATE, ATT, ATC) obtained from dowhy and those generated by statsmodels. Both methods use the same identification approach (propensity score matching & probit model). Can anyone provide guidance on resolving this discrepancy?
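For context on question 1: the sketch below (plain numpy/scikit-learn, not DoWhy internals, with illustrative synthetic data) shows that 1-nearest-neighbor propensity-score matching is itself deterministic given fixed inputs, which suggests any run-to-run variation comes from something upstream of the matching step (e.g., data reshuffling or simulation-based refuters) rather than the matcher:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_att(X, t, y):
    """ATT via 1-nearest-neighbor matching on an estimated propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    # For each treated unit, find the control unit with the closest score
    nn = NearestNeighbors(n_neighbors=1).fit(ps[t == 0].reshape(-1, 1))
    match = nn.kneighbors(ps[t == 1].reshape(-1, 1))[1].ravel()
    return float(np.mean(y[t == 1] - y[t == 0][match]))

rng = np.random.default_rng(0)                        # one seeded generator for all draws
X = rng.normal(size=(500, 3))
t = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # treatment depends on X[:, 0]
y = 2.0 * t + X[:, 0] + rng.normal(size=500)          # true effect 2.0; X[:, 0] confounds

a1 = psm_att(X, t, y)
a2 = psm_att(X, t, y)   # identical inputs, no hidden random state
print(a1 == a2)         # True: the matching step itself is deterministic
```

If two calls on the same frozen data disagree, the randomness is entering before or after the matching, not inside it.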

Expected behavior

  1. DoWhy output should be replicable (i.e., exactly the same value on every run)
  2. The output should closely match statsmodels/Stata/R results.

Version information:

  • DoWhy version: 0.11.1

Additional context

My code is below for reference:

# Imports (not shown in the original snippet). The `res_st` module path is an
# assumption: cataneo2.csv ships with statsmodels' treatment-effects test results.
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from statsmodels.discrete.discrete_model import Probit
from statsmodels.regression.linear_model import OLS
from statsmodels.treatment.treatment_effects import TreatmentEffect
from statsmodels.treatment.tests.results import results_teffects as res_st
from dowhy import CausalModel

# Dataset loading

cur_dir = os.path.abspath(os.path.dirname(res_st.__file__))
file_name = 'cataneo2.csv'
file_path = os.path.join(cur_dir, file_name)
dta_cat = pd.read_csv(file_path)
methods = ['ra', 'ipw', 'aipw', 'aipw_wls', 'ipw_ra']
methods_st = [
    ("ra", res_st.results_ra),
    ("ipw", res_st.results_ipw),
    ("aipw", res_st.results_aipw),
    ("aipw_wls", res_st.results_aipw_wls),
    ("ipw_ra", res_st.results_ipwra),
]
pd.set_option('display.width', 500)
dta_cat.head()

# Statsmodels approach

# Treatment selection model: probit model
formula = 'mbsmoke_ ~ mmarried_ + mage + mage2 + fbaby_ + medu'
res_probit = Probit.from_formula(formula, dta_cat).fit()  # Estimate the probability of smoking

# Outcome model: OLS model
formula_outcome = 'bweight ~ prenatal1_ + mmarried_ + mage + fbaby_'
mod = OLS.from_formula(formula_outcome, dta_cat)

# Treatment indicator variable
tind = np.asarray(dta_cat['mbsmoke_'])  # Converts the treatment indicator variable (mbsmoke_) from the DataFrame to a NumPy array.
teff = TreatmentEffect(mod, tind, results_select=res_probit)

res = teff.ipw()  # Compute POM and ATE using inverse probability weighting
print("Results from Statsmodels (ATE):", res)

print("Results from Statsmodels (ATT):", teff.ipw(effect_group=1))  # Average Treatment effect on the Treated
print("Results from Statsmodels (ATC):", teff.ipw(effect_group=0))  # Average Treatment effect on the Controls

# DoWhy approach

np.random.seed(42)

model = CausalModel(
    data=dta_cat,
    treatment='mbsmoke_',
    outcome='bweight',
    common_causes=['mmarried_', 'mage', 'mage2', 'fbaby_', 'medu', 'prenatal1_']
)

identified_estimand = model.identify_effect()
print("Identified Estimand from DoWhy:", identified_estimand)

ATE = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),  # note: logistic regression, not the probit used in the statsmodels run
        'matching_algorithm': 'nearest_neighbor',  # intended 1-to-1 nearest-neighbor matching; DoWhy's
        'n_neighbors': 1  # matching estimator may not recognize these two keys and could silently ignore them
    }
)
print("ATE from DoWhy:", ATE.value)

ATT = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='att'  # Focus on treated units
)
print("ATT from DoWhy:", ATT.value)

ATC = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='atc'  # Focus on untreated units
)
print("ATC from DoWhy:", ATC.value)

refutation = model.refute_estimate(
    identified_estimand,
    ATE,
    method_name='placebo_treatment_refuter'
)
print("Refutation result from DoWhy:", refutation)
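For question 2, one way to arbitrate between the two libraries is to compute an inverse-probability-weighted ATE by hand on data with a known true effect, then point both libraries at the same data. A minimal Horvitz-Thompson IPW sketch (synthetic data; all names here are illustrative, not from the issue):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data with a known treatment effect of 1.5
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 2))
t = (0.7 * X[:, 0] + rng.normal(size=2000) > 0).astype(int)  # confounded assignment
y = 1.5 * t + X[:, 0] + rng.normal(size=2000)

# Inverse-probability-weighted ATE (Horvitz-Thompson form)
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
ate_ipw = float(np.mean(t * y / ps - (1 - t) * y / (1 - ps)))

naive = float(y[t == 1].mean() - y[t == 0].mean())  # biased upward by the confounder
print(f"naive: {naive:.2f}  ipw: {ate_ipw:.2f}")
```

Note also that matching (what DoWhy's `backdoor.propensity_score_matching` does) and weighting (what `teff.ipw()` does) are different estimators with different finite-sample behavior, so some gap between their point estimates is expected even with identical propensity scores.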
@dododobetter added the question label on Jul 16, 2024

This issue is stale because it has been open for 14 days with no activity.

@github-actions bot added the stale label on Jul 31, 2024

github-actions bot commented Aug 7, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions bot closed this as not planned (stale) on Aug 7, 2024