Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
---
id: B-0030
priority: P3
status: open
title: Extract `tools/hygiene/lint-md-with-exclusions.ts` (TypeScript) — markdownlint-with-repo-aware-exclusions tool; Otto-346 violation #4 this session, this one with real cost (~60s instead of ~3s)
tier: hygiene-tooling
effort: S
ask: Aaron 2026-04-26 — *"this is like the python smell but with python and this one had a real cost it forgot to ignore upstram so it took like a minute to run instead of a few seconds, if it was cononalized in code like in ../scratch it would never forget to exclude directoris like our references"*. The bash pipeline `markdownlint-cli2 "**/*.md" | grep -E 'MD[0-9]{3}'` I composed inline lacked proper repo-aware exclusions for vendored / mirrored directories, ran ~60s instead of expected ~3s. Same Otto-346 pattern (recurring inline composition = missing substrate primitive). Per B-0015 P2 priority bump: target is TypeScript via Bun, not bash and not Python.
created: 2026-04-26
last_updated: 2026-04-26
composes_with: [feedback_otto_346_dependency_symbiosis_is_human_anchoring_via_upstream_contribution_good_citizenship_dont_blaze_past_2026_04_26.md, B-0015, B-0027, B-0028, B-0031, tools/hygiene/fix-markdown-md032-md026.py]
Comment thread
AceHack marked this conversation as resolved.
Comment thread
AceHack marked this conversation as resolved.
tags: [otto-346, recurring-pattern, missing-primitive, tooling-extraction, markdownlint, repo-aware-exclusions, real-cost, typescript, ts-migration]
---

# B-0030 — extract markdownlint-with-repo-aware-exclusions tool

## Origin — Aaron 2026-04-26 catch with cost-evidence

Aaron 2026-04-26 caught the pattern AND named the cost:

> *"this is like the python smell but with python and this one had a real cost it forgot to ignore upstram so it took like a minute to run instead of a few seconds, if it was cononalized in code like in ../scratch it would never forget to exclude directoris like our references (not upstream that's proabalby a bad name i randomly chose, we should rectify to avoid wars/confusion becasue im using upstream incorrectly)"*

This is **Otto-346 violation #4** this session:

1. PR #541 — sort-tick-history-canonical.py (Python tool extracted)
2. PR #542 — fix-markdown-md032-md026.py (Python tool extracted)
3. B-0028 — gh-pr-state-summary tool (TS target; awaiting first-migration unblock)
4. **B-0030 (this row)** — lint-with-exclusions tool (TS target)

The differentiating factor: **this one had measurable cost**. Slow run (~60s) when properly-bounded would be ~3s. That's a 20x cost penalty per invocation, multiplied by every time I run the inline pipeline.

## What the tool would do

**Problem class**: ad-hoc invocations of markdownlint (or other lint tools) on `**/*.md` patterns lack ergonomic defaults for repo-specific exclusions. Each inline use forgets:

- `references/` directory (vendored / mirrored upstream code we don't own)
Comment thread
AceHack marked this conversation as resolved.
- `tools/lean4/.lake/packages/` (Lean dependencies)
- Other generated / vendored / archive directories

**Tool behavior** (proposed):

- `tools/hygiene/lint-md-with-exclusions.ts [paths...]` — wrap markdownlint-cli2 with repo-aware default exclusions
- `--strict` — fail on any violation (default)
- `--summary` — print only error counts per file, not full output
- `--target <pattern>` — override default scope to specific paths
- TypeScript via Bun; reads exclusion config from `.markdownlint-cli2.jsonc` and applies before invocation

**Cost reduction**: from ~60s with missed exclusions → ~3s with canonical exclusions. 20x speedup is real productivity.

## Composition with sibling tools

- `tools/hygiene/fix-markdown-md032-md026.py` (PR #542) — sibling: applies fix; this one detects with proper exclusions
- B-0028 (`gh-pr-state-summary.ts`) — sibling extraction from same Otto-346 pattern; both target TS
- `tools/hygiene/check-tick-history-order.sh` + `check-no-conflict-markers.sh` — sibling architectural shape (shell now; eventual TS rewrite per B-0015)

The cumulative `tools/hygiene/` post-install batch awaiting TS migration:

- B-0027 (markdown-table-cell-count fix tool — owed-build, TS target)
- B-0028 (gh-pr-state-summary — owed-build, TS target)
- B-0030 (this row — lint-with-exclusions — owed-build, TS target)
- + eventual rewrites of #541, #542

## Why TypeScript

Per Aaron's prior priority bump on B-0015 (P3 → P2):

> *"we need to move the typescript migration of our scripts to higher priority so you will stop trying to write python and shell code lol"*

POST-install scripts target TypeScript via Bun. This is post-install (developer + CI machines have Bun).

## Effort sizing

- **Build the tool**: S (under a day). Wrap `markdownlint-cli2` with config-aware exclusion defaults.
- **Read existing `.markdownlint-cli2.jsonc`** for current exclusion patterns; compose with directory-aware logic.
- **Verify cost reduction**: measure before/after run-time on full repo.

## Composes with

- **B-0015** (TS-migration P2 priority — first-migration unblock applies)
- **B-0027** (markdown-table-cell-count tool — sibling extraction)
- **B-0028** (gh-pr-state-summary — sibling extraction)
- **B-0031** (references/ directory rename — paired concern from same Aaron observation)
- **Otto-346** (recurring-pattern absorption; this is the FOURTH instance this session)
- **Otto-341** (mechanism over discipline; tools absorb the pattern)
- **`tools/hygiene/fix-markdown-md032-md026.py`** (PR #542) — sibling Python tool

## Meta-observation captured for substrate

**Otto-346 violation #4 this session — the cumulative count IS the signal**:

| # | Pattern | Outcome |
|---|---|---|
| 1 | Inline Python sort | PR #541 (Python interim) |
| 2 | Inline Python markdown-fix | PR #542 (Python interim) |
| 3 | Inline Python gh-JSON-parse | B-0028 (TS owed) |
| 4 | **Bash markdownlint+grep** | **B-0030 (TS owed; this row)** |
Comment thread
AceHack marked this conversation as resolved.

Four instances in one session is enough signal to *actually start the first sibling-migration*, not just queue more. The discipline is collapsing under repeated catches; the structural answer is the TS-tool that ships first and unblocks the rest.

## What this DOES NOT do

- Does NOT replace `markdownlint-cli2` — wraps it with repo-aware defaults
- Does NOT auto-run on every commit — invoked explicitly when needed
- Does NOT promise complete coverage of every lint scenario — only the recurring-use-case patterns

## Owed work cluster after this row

The post-install TS-migration batch:

- B-0015 batch-resolve-pr-threads.sh → TS (P2)
- B-0027 markdown-table-cell-count tool → TS (P3)
- B-0028 gh-pr-state-summary tool → TS (P3)
- B-0030 lint-with-exclusions tool → TS (P3, this row)
- Rewrites of #541, #542

Five-tool cluster. **First-migration unblock should happen now, not later** — the Otto-346 violations are accumulating proof that the queue is no longer queue but blocker.
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
---
id: B-0031
priority: P3
status: open
title: Rename `references/` directory — Aaron 2026-04-26 noted "upstream" naming was randomly chosen and collides with git-semantic meaning; rectify before language-wars/confusion compound
Comment thread
AceHack marked this conversation as resolved.
tier: hygiene-naming
effort: M
ask: Aaron 2026-04-26 — *"references (not upstream that's proabalby a bad name i randomly chose, we should rectify to avoid wars/confusion becasue im using upstream incorrectly)"*. The `references/` directory holds vendored / mirrored upstream-codebase content; Aaron initially used the word "upstream" colloquially to refer to it, but "upstream" has specific git-semantic meaning (the parent branch / repo a fork tracks). Using "upstream" in this colloquial sense creates confusion and risks future agents/contributors interpreting it via the git-semantic. "References" is fine for the directory name; the issue is the colloquial vocabulary used around it.
Comment thread
AceHack marked this conversation as resolved.
created: 2026-04-26
last_updated: 2026-04-26
composes_with: [feedback_otto_346_dependency_symbiosis_is_human_anchoring_via_upstream_contribution_good_citizenship_dont_blaze_past_2026_04_26.md, B-0030, docs/GLOSSARY.md, B-0010]
Comment thread
AceHack marked this conversation as resolved.
tags: [naming-clarity, glossary, git-semantic-collision, vocabulary-discipline, otto-339-anywhere, references-directory]
---

# B-0031 — rectify "upstream" colloquial vs git-semantic naming around references/

## Origin — Aaron 2026-04-26

> *"references (not upstream that's proabalby a bad name i randomly chose, we should rectify to avoid wars/confusion becasue im using upstream incorrectly)"*

Aaron self-corrected: he'd been using "upstream" colloquially to refer to the `references/` directory's contents (vendored / mirrored external code). But "upstream" in git semantics specifically means *the parent branch / repo a fork tracks*. Two different meanings; same word; recipe for confusion.

## The naming problem

Two distinct concepts conflated in current vocabulary:

| Concept | What it is | Current name(s) used |
|---|---|---|
| Vendored external code | Code from other projects mirrored into `references/` for inspection / lineage / Otto-346 contribution-tracking | "upstream", "references" (mixed) |
| Git fork-parent | The repo / branch the fork tracks | "upstream" (correctly) |
Comment thread
AceHack marked this conversation as resolved.

The first usage ("upstream" for vendored mirrors) is **colloquial-but-incorrect**; the second ("upstream" for git fork-parent) is **git-semantic-correct**. They collide.

## What this row addresses

1. **Audit current usage** of "upstream" across `docs/`, `memory/`, `tools/`, and code comments — distinguish git-correct uses from colloquial uses
2. **Define replacement vocabulary** for the colloquial sense:
- Candidates: `mirrored-references/`, `vendored-deps/`, `external-source-of-record/`, `inheritance-references/`, just `references/` with explicit definition in glossary
3. **Update `docs/GLOSSARY.md`** to formalize the distinction
4. **Sweep documentation** for misuses; replace colloquial-"upstream" with the chosen term
5. **Code-comment audit** for the same pattern

## Why this matters per Otto-339

Per Otto-339 anywhere-means-anywhere: vocabulary collisions in substrate cause wrong-state-vectors when AI agents (or humans) read the substrate. "Upstream" interpreted via git-semantic when colloquial-sense was meant produces:

- Wrong assumptions about repo relationships
- Confused contribution direction (Otto-346 upstream-contribution gets confused with `references/` write-back which doesn't make sense)
- Documentation drift as later contributors interpret per their own assumed sense

Aaron's catch is preventive-discipline: **rectify before the language-war compounds**. Cheaper to fix at 2026-04-26 than after another 100 substrate references encode the colloquial sense.

## Composes with prior

- **Otto-346** (dependency symbiosis; upstream-contribution discipline) — uses "upstream" in the OSS-contribution sense (canonical repos like bcgit/bc-csharp); the colloquial conflation contaminates the precision Otto-346 requires
- **B-0010** (memory-index-conventions doc) — sibling naming-discipline backlog row
- **`docs/GLOSSARY.md`** — the right home for the formal distinction
- **Otto-339** (anywhere-means-anywhere) — vocabulary precision applies to directory/file/concept naming
- **Otto-286** (definitional precision changes future without war) — Aaron's *exact* phrase here is "to avoid wars/confusion"; this is preventive Otto-286 application
- **B-0030** (lint-with-exclusions tool) — paired concern from the same Aaron message; the lint tool needs to know which directories to exclude AND those directories need clear names

## Programming-language-as-religious-choice connection (Aaron's framing)

Aaron added in the same message:

> *"people literraly say your programming laganguage choice is like a religious choice, and there are programming language wars that resemble religious wars"*

This composes with the naming-discipline at a meta-level: vocabulary collisions create the same religious-war pattern at the substrate-naming layer that programming-language choice creates at the implementation layer. Both are tribal-identity-via-shared-vocabulary patterns. Otto-286 (definitional precision changes future without war) explicitly names "without war" — vocabulary discipline is anti-religious-war discipline.

## Effort sizing

- **Audit**: M (~half-day) — grep for "upstream" across docs/memory/tools; classify each instance
- **Decision on replacement vocabulary**: S (Aaron-decision; agent provides candidates)
- **Sweep**: M (~day) — replace colloquial uses; preserve git-correct uses; update glossary
- **Pre-commit lint candidate**: future tooling could flag colloquial-"upstream" in non-git-context (out of scope for this row)

## What this DOES NOT do

- Does NOT remove "upstream" from git-correct uses — those stay
- Does NOT rename the directory itself necessarily — could just clarify vocabulary around it
- Does NOT mandate immediate execution — research-grade backlog
- Does NOT eliminate all naming-collision concerns — this is one specific instance; sister concerns surface separately

## Operational implications

Going forward (even before sweep lands):

- When using "upstream" in substrate, default to git-semantic (fork-parent) unless explicitly clarified
- For colloquial "vendored external code" sense, prefer "references/" + the eventual replacement vocabulary
- Otto-346's "upstream contribution" means contributions to **canonical OSS project repos** (bcgit/bc-csharp etc.) — that use is git-semantic-aligned and stays

## Cross-references for sweep

When sweep happens, files most likely to need updates:

- `docs/POST-SETUP-SCRIPT-STACK.md` (mentions upstream in OSS-contribution sense — git-aligned, keep)
- `references/` README or similar (defines what's there)
- Otto-NNN substrate referring to "upstream" — most uses are Otto-346-style git-correct contribution sense, but audit each
- BACKLOG rows (B-0007 Bayesian primitives upstream — git-correct, keep)
Loading