Create developer-focused RESEARCH.md for CausalPy #562

@drbenvincent

Summary

Add a developer-oriented RESEARCH.md file at the repository root that documents the internal structure of CausalPy: its main modules, core abstractions, public API surface, typical workflows, and key extension points.

This file is intended as a living "map" of the codebase for developers and contributors (and, secondarily, for code-assist tools), not as user-facing documentation.

Motivation / Why

  • New contributors currently need to reverse-engineer the architecture by reading code, tests, and scattered docs.
  • There is no single place that explains:
    • How the major modules fit together.
    • What the core model classes are and how they relate.
    • Which parts are considered public API and which are internal.
  • A central RESEARCH.md will:
    • Shorten onboarding time for new contributors.
    • Reduce accidental breaking changes to the public API.
    • Make it clearer where to add new designs / estimators / backends.
    • Provide a stable reference for both humans and AI agents working on the repo.

What should be in RESEARCH.md (high level)

The exact structure can be adjusted, but it should cover at least:

  1. Purpose and scope

    • What CausalPy is for and the types of causal inference workflows it supports.
    • Any explicit non-goals or out-of-scope areas.
  2. High-level architecture

    • Overview of top-level packages/modules and their responsibilities (with paths, e.g. causalpy/models/...).
    • How these modules interact (e.g. models ↔ plotting ↔ datasets).
    • Any important cross-cutting concerns (configuration, randomness, backends).
  3. Core abstractions and class structure

    • Main user-facing classes and where they live.
    • A one- or two-sentence description per key class, plus its key methods (e.g. fit, plot, summary).
    • Relationships between base classes, concrete models, and result/plotting objects.
  4. Public API surface

    • What is intended to be imported by users (e.g. what’s re-exported from causalpy/__init__.py).
    • "Semi-public" APIs used in examples/notebooks.
    • Clear note on what is considered internal and may change without notice.
  5. Workflows and usage patterns

    • Main workflows (e.g. regression discontinuity, difference-in-differences (DiD), synthetic control).
    • For each workflow:
      • Key classes/functions involved.
      • Expected data shape and assumptions.
      • Pointers to example scripts/docs in this repo.
  6. Backends and dependencies

    • Which modeling/statistical backends are supported.
    • How they are plugged in or selected.
    • Any important external dependencies and their roles.
  7. Extension points

    • How to add:
      • A new design/estimator.
      • A new backend for an existing design.
    • Conventions that contributors should follow (naming, required methods, file layout, tests/docs expectations).
  8. Testing and quality

    • Where tests live and how they are organized.
    • Types of tests (unit/integration/etc.).
    • Any important fixtures or test helpers.
  9. Known limitations and open questions

    • Architectural limitations or technical debt that matter for contributors.
    • Areas currently in flux or needing refactor, with brief context.
  10. Update policy

    • When developers should update RESEARCH.md (e.g. after architectural changes, new designs, or new backends).
    • Which sections are expected to change frequently vs. rarely.
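
For the public API surface (item 4), a first draft of the "what is re-exported" list can be generated rather than typed by hand. The sketch below is a minimal helper, not part of CausalPy: `public_names` is a hypothetical name, and it assumes only standard Python conventions (`__all__` if the package defines it, otherwise every name without a leading underscore). The standard-library `json` package is used as a stand-in so the snippet runs anywhere.

```python
import importlib
from types import ModuleType


def public_names(module: ModuleType) -> list[str]:
    """Return the names a module advertises as public.

    Uses ``__all__`` when the module defines it; otherwise falls back to
    every attribute that does not start with an underscore.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in dir(module) if not name.startswith("_"))


if __name__ == "__main__":
    # "json" is only a stand-in; pass "causalpy" when it is installed.
    for name in public_names(importlib.import_module("json")):
        print(name)
```

The output is only a starting point: the fallback branch also picks up imported submodules and helpers, so the list still needs a human decision about what is public, semi-public, or internal.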

Non-goals

  • This is not a full user guide or tutorial (that belongs in regular docs).
  • It should avoid duplicating detailed API docs; instead it should link/point to them where appropriate.
  • It should describe the current reality of the codebase, not a future wish-list design.

Implementation notes

  • The work will require:
    • Skimming the package structure (causalpy/), __init__.py, tests, and example notebooks/docs.
    • Summarizing the findings clearly and concisely.
  • It is acceptable to use an AI/code assistant to draft RESEARCH.md, but the final content must be:
    • Verified against the actual code.
    • Corrected where the assistant guesses or hallucinates details.
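
The "skimming the package structure" step can be partly automated. The following is a rough sketch using only the standard library; `map_package` is a hypothetical helper name, and the approach (parsing each file with `ast` and collecting top-level class names) is one possible way to get an overview, not an existing project script.

```python
import ast
from pathlib import Path


def map_package(root: Path) -> dict[str, list[str]]:
    """Map each .py file under ``root`` to its top-level class names."""
    layout: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        # Parse without importing, so no package code is executed.
        tree = ast.parse(path.read_text(encoding="utf-8"))
        classes = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
        layout[path.relative_to(root).as_posix()] = classes
    return layout
```

Run from the repository root, something like `map_package(Path("causalpy"))` would give a file-to-classes table that can seed the architecture and class-structure sections before they are checked by hand.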

Acceptance criteria

  • RESEARCH.md exists at the repo root.
  • It contains at least the sections listed above (names can vary, but content should be covered).
  • Each described module/class/workflow corresponds to real code in the repository.
  • Public vs. internal API boundaries are explicitly documented.
  • The document is understandable to a new contributor who knows Python and causal inference but is unfamiliar with this codebase.
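
The criterion that every described module or class corresponds to real code could be spot-checked mechanically. The sketch below is an assumption about how such a check might look, not an agreed part of this issue: the helper names (`resolves`, `unresolved_references`) are hypothetical, and it only considers dotted names written as inline code, such as `` `pkg.module.Name` ``.

```python
import importlib
import re

# Matches inline-code spans that look like dotted Python names, e.g. `pkg.mod.Name`.
DOTTED_NAME = re.compile(r"`([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)`")


def resolves(dotted: str) -> bool:
    """Return True if ``dotted`` can be imported as a module or module attribute."""
    parts = dotted.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    for cut in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:cut]))
        except ImportError:
            continue
        try:
            for attr in parts[cut:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return False
        return True
    return False


def unresolved_references(markdown: str) -> list[str]:
    """List dotted names mentioned in inline code that do not resolve."""
    return [name for name in DOTTED_NAME.findall(markdown) if not resolves(name)]
```

Two caveats: the check imports the modules it looks up, so it should run in an environment where the package is installed, and file names in backticks (for example `` `RESEARCH.md` ``) match the same pattern and would be reported as unresolved unless filtered out.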

Labels: devops (DevOps related), documentation (Improvements or additions to documentation)
