stochastictrees.qmd

---
title: "Stochastic Trees"
share:
  permalink: "https://book.martinez.fyi/stochastictrees.html"
  description: "Business Data Science: What Does it Mean to Be Data-Driven?"
  linkedin: true
  email: true
  mastodon: true
---

This section will delve into a specific family of non-parametric models that has
gained considerable traction in the causal inference world: Bayesian Additive
Regression Trees (BART) and some of its notable derivatives. These models offer
a unique blend of flexibility and interpretability, making them particularly
well-suited for causal inference tasks.

Before we dive into the specifics of these models, it's crucial to understand
the fundamental assumptions that underpin their use in causal inference. Like
many causal inference methods, BART and its derivatives rely on two key
assumptions: the Stable Unit Treatment Value Assumption (SUTVA) and strong
ignorability.

1.  **Stable Unit Treatment Value Assumption (SUTVA):** SUTVA comprises two
    parts:

    a. **No interference:** The treatment applied to one unit doesn't influence 
    the outcomes of other units. In a business context, a marketing campaign 
    targeted at one customer wouldn't sway the purchasing decisions of others.
  
    b. **No hidden variations of treatments:** There's only one version of
    each treatment level. If we're studying the effect of a new
    training program, all employees receive the same version of it.

2.  **Strong Ignorability:** This assumption consists of two components:

    a. **Unconfoundedness:** Given the observed covariates, treatment 
    assignment is independent of potential outcomes. In essence, if we account
    for all relevant variables, whether a unit receives treatment or not 
    is unrelated to their outcomes under either condition.
  
    b. **Positivity (or overlap):** Every unit has a non-zero probability of
    receiving each treatment level. In a business setting, this means 
    every customer has some chance of being exposed to a new marketing
    campaign, regardless of their characteristics.
    
These assumptions are the bedrock upon which we build causal interpretations.
When these conditions hold, we can attribute the observed
differences in outcomes between treated and untreated units to the treatment
itself, rather than to confounding factors.  

We'll explore three key models in this family: BART, Bayesian Causal Forests
(BCF), and LongBet. Each of these models builds upon its predecessors, offering
improvements in terms of causal effect estimation, handling of confounding, and
applicability to different data structures.

In our exploration, we'll be leveraging the stochtree R package
[@stochastictree], which implements these models using a technique called
"warm-start" as introduced by @krantsevich23a. The warm-start approach is a
computational innovation that significantly improves the efficiency and
effectiveness of these models, particularly for large datasets.

The warm-start technique works by using a fast approximation method (XBART) to
generate initial tree structures, which are then used as starting points for the
full Bayesian MCMC algorithm. This approach combines the speed of approximate
methods with the statistical rigor of full Bayesian inference, resulting in
models that are both computationally efficient and statistically robust.

By using warm-start, we can fit these sophisticated models to larger datasets
and explore more complex causal relationships than was previously feasible. This
makes these models particularly valuable for business data science applications,
where we often deal with large, complex datasets and need to uncover nuanced
causal relationships.

Let's begin our journey into the world of stochastic trees and their
applications in causal inference, keeping in mind the critical assumptions that
allow us to draw causal conclusions from these powerful tools.

::: {.callout-tip}
## Learn more
  - @krantsevich23a Stochastic Tree Ensembles for Estimating Heterogeneous
    Effects.
  - @stochastictree Stochastic tree ensembles (XBART and BART) for supervised
    learning and causal inference.
:::