Draft for JOSS submission #17
TomeHirata wants to merge 2 commits into main from feat/joss-paper
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
@misc{byambadalai2024estimatingdistributionaltreatmenteffects, | ||
title={Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction}, | ||
author={Undral Byambadalai and Tatsushi Oka and Shota Yasui}, | ||
year={2024}, | ||
eprint={2407.16037}, | ||
archivePrefix={arXiv}, | ||
primaryClass={econ.EM}, | ||
url={https://arxiv.org/abs/2407.16037}, | ||
} | ||
|
||
@book{fisher1935design, | ||
title={The Design of Experiments}, | ||
author={Fisher, Ronald A.}, | ||
year={1935}, | ||
publisher={Oliver and Boyd} | ||
} | ||
|
||
@ARTICLE{2020NumPy-Array, | ||
author = {Harris, Charles R. and Millman, K. Jarrod and | ||
van der Walt, Stéfan J and Gommers, Ralf and | ||
Virtanen, Pauli and Cournapeau, David and | ||
Wieser, Eric and Taylor, Julian and Berg, Sebastian and | ||
Smith, Nathaniel J. and Kern, Robert and Picus, Matti and | ||
Hoyer, Stephan and van Kerkwijk, Marten H. and | ||
Brett, Matthew and Haldane, Allan and | ||
Fernández del Río, Jaime and Wiebe, Mark and | ||
Peterson, Pearu and Gérard-Marchant, Pierre and | ||
Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and | ||
Abbasi, Hameer and Gohlke, Christoph and | ||
Oliphant, Travis E.}, | ||
title = {Array programming with {NumPy}}, | ||
journal = {Nature}, | ||
year = {2020}, | ||
volume = {585}, | ||
pages = {357--362}, | |
doi = {10.1038/s41586-020-2649-2} | ||
} | ||
|
||
@article{scikit-learn, | ||
title={Scikit-learn: Machine Learning in {P}ython}, | ||
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. | ||
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. | ||
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and | ||
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.}, | ||
journal={Journal of Machine Learning Research}, | ||
volume={12}, | ||
pages={2825--2830}, | ||
year={2011} | ||
} | ||
|
||
@misc{byambadalai2025efficientestimationdistributionaltreatment, | ||
title={On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization}, | ||
author={Undral Byambadalai and Tatsushi Oka and Shota Yasui}, | ||
year={2025}, | ||
eprint={2506.05945}, | ||
archivePrefix={arXiv}, | ||
primaryClass={econ.EM}, | ||
url={https://arxiv.org/abs/2506.05945} | ||
} | ||
|
||
@misc{hirata2025efficientscalableestimationdistributional, | ||
title={Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks}, | ||
author={Tomu Hirata and Undral Byambadalai and Tatsushi Oka and Shota Yasui}, | ||
year={2025}, | ||
eprint={2507.07738}, | ||
archivePrefix={arXiv}, | ||
primaryClass={econ.EM}, | ||
url={https://arxiv.org/abs/2507.07738} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
--- | ||
title: 'dte_adj: A Python Package for Estimating Distributional Treatment Effects in Randomized Experiments' | ||
tags: | ||
- Python | ||
- randomized experiments | ||
- causal inference | ||
- distributional treatment effects | ||
- machine learning | ||
- variance reduction | ||
authors: | ||
- name: Tomu Hirata | ||
orcid: 0009-0006-3140-291X | ||
equal-contrib: true | ||
affiliation: "1, 3" | ||
- name: Undral Byambadalai | ||
corresponding: true | ||
affiliation: 1 | ||
- name: Tatsushi Oka | ||
corresponding: true | ||
affiliation: "1, 2" | ||
- name: Shota Yasui | ||
corresponding: true | ||
affiliation: 1 | ||
affiliations: | ||
- name: CyberAgent, Inc., Japan | ||
index: 1 | ||
- name: Keio University, Japan | ||
index: 2 | ||
- name: Databricks Japan, Japan | ||
index: 3 | ||
date: 24 August 2025 | ||
bibliography: paper.bib | ||
--- | ||
|
||
# Summary | ||
|
||
`dte_adj` is a Python package designed for estimating distributional treatment effects (DTEs) in randomized experiments. Unlike traditional approaches that focus on average treatment effects, `dte_adj` enables researchers to analyze the full distributional impact of interventions across different outcome levels. The package implements machine learning-enhanced regression adjustment methods to achieve variance reduction, making distributional effect estimation more precise and computationally efficient. It supports multiple experimental designs including simple randomization, covariate-adaptive randomization (CAR), and local distributional treatment effect (LDTE) estimation. The package provides a scikit-learn compatible API and comprehensive functionality for computing distribution functions, probability treatment effects, and quantile treatment effects with confidence intervals. | ||
|
||
# Statement of Need | ||
|
||
Randomized experiments have been fundamental to scientific inquiry since the pioneering work of @fisher1935design, providing the gold standard for causal inference. While most experimental analyses focus on average treatment effects (ATEs), many research questions require understanding how treatments affect the entire distribution of outcomes, not just the mean. Distributional treatment effects (DTEs) capture these richer patterns, revealing heterogeneous impacts across different outcome levels that averages can mask. | |
|
||
Despite the growing importance of distributional analysis in fields ranging from economics to medicine, the Python ecosystem lacks comprehensive tools for DTE estimation. While SciPy provides basic empirical cumulative distribution functions, it offers no specialized functionality for treatment effect estimation, variance reduction, or confidence interval construction in experimental settings. Existing R packages like `RDDtools` focus on regression discontinuity rather than randomized experiments, and lack modern machine learning integration. | ||
Review comment: I think it would be good to add DoWhy and EconML as libraries to compare against. EconML incorporates machine learning, but you could say that it does not address distributional effects. |
||
|
||
`dte_adj` addresses this gap by providing a comprehensive Python framework for distributional treatment effect analysis. The package implements state-of-the-art variance reduction techniques using machine learning models for regression adjustment [@byambadalai2024estimatingdistributionaltreatmenteffects], enabling more precise DTE estimates with smaller sample sizes. It supports multiple experimental designs including covariate-adaptive randomization [@byambadalai2025efficientestimationdistributionaltreatment] and local treatment effects, with a scikit-learn [@scikit-learn] compatible API that integrates seamlessly into existing machine learning workflows. This makes advanced distributional analysis accessible to the broader Python research community, supporting more nuanced causal inference in experimental studies. | ||
|
||
# Features | ||
|
||
`dte_adj` provides a comprehensive suite of tools for distributional treatment effect analysis: | ||
|
||
## Estimator Classes | ||
|
||
The package implements multiple estimator classes following a hierarchical design pattern: | ||
|
||
**Simple Randomization Estimators:** | ||
- `SimpleDistributionEstimator`: Basic empirical distribution function estimator for simple randomized experiments | ||
- `AdjustedDistributionEstimator`: Machine learning-enhanced estimator with regression adjustment for variance reduction | ||
|
||
**Stratified Estimators (for Covariate-Adaptive Randomization):** | ||
- `SimpleStratifiedDistributionEstimator`: Handles stratified block randomization designs | ||
- `AdjustedStratifiedDistributionEstimator`: Combines stratification with ML-based variance reduction | ||
|
||
**Local Distribution Estimators:** | ||
- `SimpleLocalDistributionEstimator`: Estimates local distributional treatment effects (LDTE) | ||
- `AdjustedLocalDistributionEstimator`: LDTE estimation with ML adjustment for improved precision | ||
|
||
## Core Methods | ||
|
||
All estimators implement a consistent API with three primary methods: | ||
|
||
- `predict_dte()`: Computes Distributional Treatment Effects $DTE_{w, w'}(y) := F_{Y(w)}(y) - F_{Y(w')}(y)$, where $F_{Y(w)}(y)$ represents the cumulative distribution function for treatment $w$ at outcome level $y$. | ||
|
||
- `predict_pte()`: Computes Probability Treatment Effects over specified intervals, measuring differences in probability mass between treatment groups. | ||
|
||
- `predict_qte()`: Computes Quantile Treatment Effects $QTE_{w, w'}(\tau) := F_{Y(w)}^{-1}(\tau) - F_{Y(w')}^{-1}(\tau)$, comparing quantiles across treatments. | ||
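The three estimands above can be illustrated with plain NumPy, without any covariate adjustment. This is a minimal sketch of the unadjusted empirical versions of the formulas, not the package's implementation; the function names `ecdf`, `empirical_dte`, `empirical_pte`, and `empirical_qte` are hypothetical and do not come from `dte_adj`.

```python
import numpy as np


def ecdf(sample, locations):
    """Empirical CDF of `sample` evaluated at each value in `locations`."""
    sample = np.asarray(sample, dtype=float)
    locations = np.asarray(locations, dtype=float)
    # Fraction of observations at or below each location.
    return (sample[:, None] <= locations[None, :]).mean(axis=0)


def empirical_dte(y_treat, y_ctrl, locations):
    """DTE(y) = F_treat(y) - F_ctrl(y) at each location y."""
    return ecdf(y_treat, locations) - ecdf(y_ctrl, locations)


def empirical_pte(y_treat, y_ctrl, edges):
    """Difference in probability mass on each interval (edges[j], edges[j+1]]."""
    return np.diff(ecdf(y_treat, edges)) - np.diff(ecdf(y_ctrl, edges))


def empirical_qte(y_treat, y_ctrl, taus):
    """QTE(tau) = F_treat^{-1}(tau) - F_ctrl^{-1}(tau).

    method="inverted_cdf" (NumPy >= 1.22) uses the generalized inverse
    of the empirical CDF rather than linear interpolation.
    """
    q_treat = np.quantile(y_treat, taus, method="inverted_cdf")
    q_ctrl = np.quantile(y_ctrl, taus, method="inverted_cdf")
    return q_treat - q_ctrl


# Toy data: the treatment shifts every outcome up by one unit.
y_ctrl = np.array([1.0, 2.0, 3.0, 4.0])
y_treat = y_ctrl + 1.0
print(empirical_dte(y_treat, y_ctrl, [2.0, 4.0]))
print(empirical_pte(y_treat, y_ctrl, [2.0, 4.0]))
print(empirical_qte(y_treat, y_ctrl, [0.25, 0.5, 0.75]))
```

A negative DTE at a location means the treated arm has less probability mass at or below that outcome level, which is what a pure upward shift produces.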
|
||
## Advanced Features | ||
|
||
**Multi-task Learning:** The package supports multi-task neural networks (`is_multi_task=True`) for computational efficiency when analyzing many outcome locations simultaneously [@hirata2025efficientscalableestimationdistributional]. | ||
|
||
**Cross-fitting:** Adjusted estimators use K-fold cross-fitting to prevent overfitting in machine learning models, ensuring robust treatment effect estimates. | ||
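To make the cross-fitting idea concrete, the following sketch shows one generic way to combine K-fold cross-fitting with regression adjustment for a single treatment arm. It is an illustration under stated assumptions, not the package's code: the function name `adjusted_cdf`, the choice of logistic regression, and the default of two folds are all assumptions made here. For each outcome location, a classifier for the indicator $1\{Y \le y\}$ is trained on the in-arm units of the training folds and used to predict on the held-out fold, so no unit is scored by a model that saw its own outcome.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def adjusted_cdf(X, D, Y, arm, locations, n_folds=2, seed=0):
    """Cross-fitted, regression-adjusted CDF estimate for one arm (sketch).

    Assumes every training fold contains at least one unit from `arm`.
    """
    X, D, Y = np.asarray(X), np.asarray(D), np.asarray(Y)
    locations = np.asarray(locations, dtype=float)
    in_arm = D == arm
    estimates = np.empty(len(locations))
    for j, y in enumerate(locations):
        label = (Y <= y).astype(int)
        mu_hat = np.empty(len(Y))
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
        for train_idx, test_idx in folds.split(X):
            # Fit only on training-fold units that received this arm.
            fit_idx = train_idx[in_arm[train_idx]]
            if np.unique(label[fit_idx]).size < 2:
                # Degenerate labels: fall back to the constant prediction.
                mu_hat[test_idx] = label[fit_idx].mean()
            else:
                model = LogisticRegression().fit(X[fit_idx], label[fit_idx])
                mu_hat[test_idx] = model.predict_proba(X[test_idx])[:, 1]
        # In-arm residual mean plus the full-sample mean of the predictions.
        estimates[j] = (label[in_arm] - mu_hat[in_arm]).mean() + mu_hat.mean()
    return estimates


# Simulated experiment: the outcome depends on a covariate and the treatment.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
D = rng.integers(0, 2, size=n)
Y = X[:, 0] + D + rng.normal(size=n)
locations = np.array([0.0, 1.0])
dte_adjusted = adjusted_cdf(X, D, Y, 1, locations) - adjusted_cdf(X, D, Y, 0, locations)
print(dte_adjusted)
```

The adjustment term uses covariates from all units, including those in the other arm, which is the source of the variance reduction when the covariates predict the outcome.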
|
||
**Confidence Intervals:** Built-in bootstrap methods provide confidence intervals with multiple variance estimation approaches (`moment`, `simple`, `uniform`). | ||
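As a rough illustration of bootstrap-based intervals, the sketch below computes pointwise percentile-bootstrap confidence intervals for the unadjusted DTE by resampling each arm with replacement. It does not reproduce the package's `moment`, `simple`, or `uniform` options, whose exact definitions are not given here; the function name `bootstrap_dte_ci` and its parameters are hypothetical.

```python
import numpy as np


def ecdf(sample, locations):
    """Empirical CDF of `sample` evaluated at each value in `locations`."""
    sample = np.asarray(sample, dtype=float)
    locations = np.asarray(locations, dtype=float)
    return (sample[:, None] <= locations[None, :]).mean(axis=0)


def bootstrap_dte_ci(y_treat, y_ctrl, locations, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and pointwise percentile-bootstrap interval for the DTE."""
    rng = np.random.default_rng(seed)
    y_treat = np.asarray(y_treat, dtype=float)
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    locations = np.asarray(locations, dtype=float)
    point = ecdf(y_treat, locations) - ecdf(y_ctrl, locations)
    draws = np.empty((n_boot, len(locations)))
    for b in range(n_boot):
        # Resample each arm independently, keeping the arm sizes fixed.
        t = rng.choice(y_treat, size=len(y_treat), replace=True)
        c = rng.choice(y_ctrl, size=len(y_ctrl), replace=True)
        draws[b] = ecdf(t, locations) - ecdf(c, locations)
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return point, lower, upper


# Simulated outcomes with a two-unit upward shift under treatment.
rng = np.random.default_rng(0)
y_ctrl = rng.normal(0.0, 1.0, size=300)
y_treat = rng.normal(2.0, 1.0, size=300)
point, lower, upper = bootstrap_dte_ci(y_treat, y_ctrl, [0.0, 1.0, 2.0])
print(point, lower, upper)
```

These intervals are pointwise: each location is covered at the nominal level separately, which is weaker than a band that covers all locations simultaneously.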
|
||
**Visualization:** The `dte_adj.plot` module enables easy plotting of treatment effects and confidence bands. | ||
|
||
 | ||
 | ||
 | ||
|
||
# Acknowledgements | ||
|
||
We thank CyberAgent, Inc. for supporting this research and the open-source community for valuable feedback during development. | ||
|
||
# References |
Review comment: Given the readership, some readers may only be familiar with the term "A/B test".
How about something like "randomized experiments (RCTs, also known as A/B tests)"?