Merged
8 changes: 6 additions & 2 deletions .gitignore
@@ -49,8 +49,12 @@ StrykerOutput/

# Upstream mirror state is regeneratable from
# `references/reference-sources.json` via the sync script.
-# Do not commit it.
-references/upstreams/
+# Do not commit it. Sentinel pair (`.gitignore` + `README.md`)
+# is tracked so the directory exists on clone and contributors
+# see what it's for, parallel to `drop/` and `roms/`.
+references/upstreams/*
+!references/upstreams/.gitignore
+!references/upstreams/README.md

# Lean 4 + Mathlib build artifacts. Generated by `lake build`.
# Mathlib alone is ~6.8 GB of .olean; never commit.
3 changes: 3 additions & 0 deletions references/upstreams/.gitignore
@@ -0,0 +1,3 @@
*
!.gitignore
!README.md
78 changes: 78 additions & 0 deletions references/upstreams/README.md
@@ -0,0 +1,78 @@
# `references/upstreams/` — gitignored upstream-source mirror

This directory is the local checkout of every upstream source listed in
[`references/reference-sources.json`](../reference-sources.json). It is
**gitignored except for this README and `.gitignore`** — the contents
are regenerated by the upstream-sync script and never committed.

## Why nothing here is committed

Upstream mirrors are bulky (multiple gigabytes of source trees from
projects like Feldera, Arrow, Bond, Bonsai-Rx, BookKeeper, Cap'n Proto,
and dozens of others). Committing them would:

- bloat the repo by orders of magnitude,
- pin Zeta to a specific upstream snapshot (we want to track current
upstream main, not freeze it),
- pollute `git log` with content the project doesn't author,
- force every clone to download upstream history we already have via
the upstream's own remote.

The git-ignored mirror lets contributors work locally against the
upstream tree (read code, run benchmarks, copy patches) while keeping
the repo itself lean.

## How the mirror is regenerated

`references/reference-sources.json` is the canonical list.
[`tools/setup/common/sync-upstreams.sh`](../../tools/setup/common/sync-upstreams.sh)
reads it and clones (or pulls) each entry under
`references/upstreams/<project-name>/`. The sync script is invoked
by `tools/setup/install.sh` and can also be run standalone. See
[`references/README.md`](../README.md) for the broader references
layout.
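
The clone-or-pull decision described above can be sketched as follows. This is an illustration only, not the contents of `sync-upstreams.sh`: the function name `plan_sync`, the assumed JSON shape (a top-level list of objects with `name` and `url` keys), and the `--ff-only` flag are all assumptions, and the real schema and script may differ. The sketch only plans the git commands; it does not run them.

```python
import json
from pathlib import Path


def plan_sync(sources_file: Path, mirror_root: Path) -> list[list[str]]:
    """Return the git commands a sync pass would run, without running them.

    Assumption: the JSON file is a list of objects with "name" and "url"
    keys. The real reference-sources.json schema may differ.
    """
    entries = json.loads(sources_file.read_text(encoding="utf-8"))
    commands = []
    for entry in entries:
        target = mirror_root / entry["name"]
        if (target / ".git").exists():
            # Already cloned on this machine: update in place.
            commands.append(["git", "-C", str(target), "pull", "--ff-only"])
        else:
            # First sync: clone into <mirror_root>/<project-name>/.
            commands.append(["git", "clone", entry["url"], str(target)])
    return commands
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` to perform the sync.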

## Why the sentinel pair

This `.gitignore` plus `README.md` follow the same pattern as `drop/`
(per-user staging area for incoming content) and `roms/` (gitignored
emulator-test corpus): the sentinel preserves the directory in version
control so contributors see it on clone, but the bulky contents stay
local and regeneratable. Git does not track empty directories, so
without the sentinel files `references/upstreams/` would not exist after
a fresh clone; without the ignore rules, a populated mirror risks
accidental commits of upstream source.

Pattern documented at:

- `drop/.gitignore` + `drop/README.md` (Otto-staging-zone)
- `roms/.gitignore` + `roms/README.md` (Otto safe-ROM testbed)
- this directory (Otto upstream-source mirror)
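
A contributor or CI job could confirm that each of these directories still carries its sentinel pair. The helper below is a hypothetical sketch, not an existing tool in the repo; the name `missing_sentinels` is an assumption.

```python
from pathlib import Path

# The two files that make up the sentinel pair.
SENTINEL_FILES = (".gitignore", "README.md")


def missing_sentinels(repo_root: Path, directories: list[str]) -> list[str]:
    """Return repo-relative paths of sentinel files that are absent."""
    missing = []
    for directory in directories:
        for name in SENTINEL_FILES:
            if not (repo_root / directory / name).is_file():
                missing.append(f"{directory}/{name}")
    return missing
```

Calling `missing_sentinels(Path("."), ["drop", "roms", "references/upstreams"])` from the repo root returns an empty list when all three directories are intact.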

## What does NOT live here

- **Vendored upstream snapshots** that ARE committed (because the
project depends on them at a pinned version) live elsewhere — see
`references/tla-book/` for an example. Those are intentionally
tracked.
- **Notes about upstream code** live under `references/notes/`, not
here. Notes are factory-authored prose; this directory is upstream-
authored source.
- **Zeta's own artifacts** never land here. This is read-only mirror
territory.

## How to add a new upstream

1. Add an entry to `references/reference-sources.json` (license,
canonical URL, intended use).
2. Run the sync script. Your new upstream lands at
`references/upstreams/<project-name>/`, gitignored automatically by
the `*` rule in this directory's `.gitignore`.
3. Land the JSON change as a normal PR. The mirror clone happens on
each contributor's machine on first sync.
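
Before opening the PR in step 3, a new entry can be checked for the three pieces of information that step 1 asks for. The key names below (`license`, `url`, `intended_use`) are hypothetical stand-ins; use whatever keys `reference-sources.json` actually defines.

```python
# Hypothetical key names for the license, canonical URL, and intended use.
REQUIRED_FIELDS = ("license", "url", "intended_use")


def missing_fields(entry: dict) -> list[str]:
    """Return the required fields that are absent or empty in an entry."""
    return [field for field in REQUIRED_FIELDS if not entry.get(field)]
```

An empty result means the entry carries all three fields.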

## Why this README is committed

Without committed prose explaining the directory's purpose, a
new contributor seeing an empty `references/upstreams/` (after a fresh
clone, before running the sync script) would have no signal that this
is a real working directory. The README is the signal.