Open
Labels
devops (DevOps related), documentation (Improvements or additions to documentation)
Description
Summary
Add a developer-oriented RESEARCH.md file at the repository root that documents the internal structure of CausalPy: its main modules, core abstractions, public API surface, typical workflows, and key extension points.
This file is intended as a living "map" of the codebase for developers and contributors (and, secondarily, for code-assist tools), not as user-facing documentation.
Motivation / Why
- New contributors currently need to reverse-engineer the architecture by reading code, tests, and scattered docs.
- There is no single place that explains:
  - How the major modules fit together.
  - What the core model classes are and how they relate.
  - Which parts are considered public API and which are internal.
- A central `RESEARCH.md` will:
  - Shorten onboarding time for new contributors.
  - Reduce accidental breaking changes to the public API.
  - Make it clearer where to add new designs / estimators / backends.
  - Provide a stable reference for both humans and AI agents working on the repo.
What should be in RESEARCH.md (high level)
The exact structure can be adjusted, but it should cover at least:
- Purpose and scope
  - What CausalPy is for and the types of causal inference workflows it supports.
  - Any explicit non-goals or out-of-scope areas.
- High-level architecture
  - Overview of top-level packages/modules and their responsibilities (with paths, e.g. `causalpy/models/...`).
  - How these modules interact (e.g. models ↔ plotting ↔ datasets, etc.).
  - Any important cross-cutting concerns (configuration, randomness, backends).
- Core abstractions and class structure
  - Main user-facing classes and where they live.
  - A one- to two-sentence description per key class, plus key methods (e.g. `fit`, `plot`, `summary`).
  - Relationships between base classes, concrete models, and result/plotting objects.
- Public API surface
  - What is intended to be imported by users (e.g. what's re-exported from `causalpy/__init__.py`).
  - "Semi-public" APIs used in examples/notebooks.
  - Clear note on what is considered internal and may change without notice.
- Workflows and usage patterns
  - Main workflows (e.g. regression discontinuity, DiD, synthetic control, etc.).
  - For each workflow:
    - Key classes/functions involved.
    - Expected data shape and assumptions.
    - Pointers to example scripts/docs in this repo.
- Backends and dependencies
  - Which modeling/statistical backends are supported.
  - How they are plugged in or selected.
  - Any important external dependencies and their roles.
- Extension points
  - How to add:
    - A new design/estimator.
    - A new backend for an existing design.
  - Conventions that contributors should follow (naming, required methods, file layout, tests/docs expectations).
- Testing and quality
  - Where tests live and how they are organized.
  - Types of tests (unit/integration/etc.).
  - Any important fixtures or test helpers.
- Known limitations and open questions
  - Architectural limitations or technical debt that matter for contributors.
  - Areas currently in flux or in need of refactoring, with brief context.
- Update policy
  - When developers should update `RESEARCH.md` (e.g. after architectural changes, new designs, or new backends).
  - Which sections are expected to change frequently vs. rarely.
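As a starting point for the "Public API surface" section, the names a package exposes can be listed mechanically. The following is a minimal sketch, not part of CausalPy: `public_names` and `list_public_api` are hypothetical helpers, and whether `causalpy/__init__.py` defines `__all__` is an assumption that would need checking against the actual code. The demo runs against the standard-library `json` package so it works without CausalPy installed.

```python
import importlib
from types import ModuleType


def public_names(module: ModuleType) -> list[str]:
    """Return the names a module exposes as its public API.

    Prefers an explicit ``__all__``; otherwise falls back to every
    attribute whose name does not start with an underscore.
    """
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    return sorted(name for name in vars(module) if not name.startswith("_"))


def list_public_api(package_name: str) -> list[str]:
    """Import a package by name and list its public names."""
    module = importlib.import_module(package_name)
    return public_names(module)


if __name__ == "__main__":
    # Swap "json" for "causalpy" when running inside the repository environment.
    for name in list_public_api("json"):
        print(name)
```

The fallback branch also picks up imported submodules and helpers, so its output is only a candidate list; deciding what is actually public versus internal remains a manual step.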
Non-goals
- This is not a full user guide or tutorial (that belongs in regular docs).
- It should avoid duplicating detailed API docs; instead it should link/point to them where appropriate.
- It should describe the current reality of the codebase, not a future wish-list design.
Implementation notes
- The work will require:
  - Skimming the package structure (`causalpy/`), `__init__.py`, tests, and example notebooks/docs.
  - Summarizing the findings clearly and concisely.
- It is acceptable to use an AI/code assistant to draft `RESEARCH.md`, but the final content must be:
  - Verified against the actual code.
  - Corrected where the assistant guesses or hallucinates details.
Acceptance criteria
- `RESEARCH.md` exists at the repo root.
- It contains at least the sections listed above (names can vary, but content should be covered).
- Each described module/class/workflow corresponds to real code in the repository.
- Public vs. internal API boundaries are explicitly documented.
- The document is understandable to a new contributor who knows Python and causal inference but is unfamiliar with this codebase.
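The criterion that each described module corresponds to real code could be checked in part mechanically. The sketch below is one possible approach under stated assumptions: it assumes `RESEARCH.md` writes file and directory paths in backticks, and `referenced_paths` and `missing_paths` are hypothetical helper names, not existing project tooling. Elided paths such as `causalpy/models/...` are skipped because they cannot be resolved.

```python
import re
from pathlib import Path

# Inline code spans without whitespace, e.g. `causalpy/__init__.py`.
CODE_SPAN = re.compile(r"`([^`\s]+)`")
# File suffixes treated as paths even when the span has no slash (an assumption).
PATH_SUFFIXES = (".py", ".md", ".ipynb")


def referenced_paths(markdown_text: str) -> list[str]:
    """Collect backtick-quoted spans that look like repo-relative paths."""
    found = set()
    for match in CODE_SPAN.finditer(markdown_text):
        span = match.group(1)
        if "..." in span or "://" in span:
            continue  # elided paths and URLs cannot be checked on disk
        if "/" in span or span.endswith(PATH_SUFFIXES):
            found.add(span.rstrip("/"))
    return sorted(found)


def missing_paths(markdown_text: str, repo_root: str) -> list[str]:
    """Return referenced paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in referenced_paths(markdown_text) if not (root / p).exists()]


if __name__ == "__main__":
    research = Path("RESEARCH.md")
    if research.exists():
        for path in missing_paths(research.read_text(encoding="utf-8"), "."):
            print(f"missing: {path}")
```

A check like this only catches stale paths. Class names, method names, and workflow descriptions still need to be verified by reading the code.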