
Conversation

@EthanJStark commented Dec 5, 2025

Problem

Skills that enforce mechanical constraints (emoji prohibition, line length, format validation) fail to achieve compliance despite multiple TDD cycles. Documentation-based approaches lead to endless rationalization whack-a-mole where each REFACTOR closes one loophole only to reveal another.

Root Cause

Fighting LLM training with documentation is ineffective for mechanical constraints. Models are trained on data that uses emojis for status indicators, varies line lengths naturally, and prioritizes readability over strict formatting. Asking agents to override this training through documentation creates cognitive dissonance.

Solution

Add decision framework distinguishing mechanical enforcement from judgment guidance:

  • Mechanical constraints (emoji presence, line length, schema compliance) → Automate with code
  • Judgment calls (severity labeling, actionable suggestions, test coverage) → Document with skills

The framework aligns with Anthropic's official guidance on matching specificity to task fragility.
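
To make the first branch concrete, here is a minimal sketch of what "automate with code" can look like for these constraints. It is illustrative rather than code from this PR; it assumes the third-party emoji package, and the line-length limit is a placeholder.

import sys
import emoji  # third-party: pip install emoji

MAX_LINE_LENGTH = 100  # placeholder limit; use your style guide's value

def check_mechanical_constraints(path):
    """Deterministically flag violations a linter can catch: emoji and long lines."""
    violations = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if any(emoji.is_emoji(ch) for ch in line):
                violations.append(f"{path}:{lineno}: emoji not allowed")
            if len(line.rstrip("\n")) > MAX_LINE_LENGTH:
                violations.append(f"{path}:{lineno}: line exceeds {MAX_LINE_LENGTH} chars")
    return violations

if __name__ == "__main__":
    problems = [v for p in sys.argv[1:] for v in check_mechanical_constraints(p)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)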

Why This Approach

Evidence-based validation: Tested with RED-GREEN-REFACTOR using 4 pressure scenarios. The baseline showed agents consistently suggesting documentation improvements for mechanical constraints. With the framework, agents immediately identified automation opportunities while correctly preserving documentation for judgment calls.

Operationalizes Anthropic best practices: Anthropic's skills documentation recommends automation for "fragile, error-prone operations requiring exact sequences" and documentation where "decisions depend on context." Our framework adds concrete decision criteria and red flags.

Prevents wasted TDD cycles: Recognizing when the REFACTOR phase won't stabilize (mechanical constraint, wrong tool) saves weeks of iteration. The reference doc includes a cost-benefit analysis showing that 15 lines of automation code beat endless documentation refinement.

Integration

Added to writing-skills as a reference doc (424 words), with an integration point in the SKILL.md decision framework section. It complements the existing TDD methodology by helping identify when automation should replace documentation entirely.

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guidance on using automation vs. documentation to enforce writing constraints, including red flags and psychology insights.
    • Added a deep-dive case study with a decision framework and cost‑benefit analysis; the note appears in two relevant guidance sections to clarify when mechanical enforcement is preferred.


- Tested with RED-GREEN-REFACTOR using 4 pressure scenarios
- Scenarios validate mechanical vs judgment decision-making
- All tests pass: agents correctly identify automation opportunities
- Word count optimized to 424 words (target: ≤500)
- Integration points added to SKILL.md line 500

Test results: 4/4 scenarios pass (GREEN phase: first try)
Test methodology: baseline without the doc showed documentation failures;
the tested version correctly guided agents to mechanical enforcement

coderabbitai bot commented Dec 5, 2025

Walkthrough

Added explanatory content to the writing-skills module: a deep-dive case study was inserted into skills/writing-skills/SKILL.md, and a new guidance document skills/writing-skills/automation-over-documentation.md was added outlining a decision framework for automation vs. documentation.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Writing Skills — existing doc updated: skills/writing-skills/SKILL.md | Inserted a deep-dive note (case study) describing failures of documentation across two TDD cycles and how mechanical enforcement succeeded; includes the decision framework and cost-benefit analysis. Note appears twice in the file. |
| Writing Skills — new guidance added: skills/writing-skills/automation-over-documentation.md | Added a new document describing when to use mechanical enforcement vs. documentation, a Python emoji-stripping snippet outline, red flags, psychology insights, and cost-benefit considerations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Documentation-only changes across two files; minimal logic to verify.
  • Review attention recommended for:
    • Duplicate placement of the deep-dive note in SKILL.md (ensure intentional duplication and consistency).
    • Accuracy and tone of the decision framework and code snippet in automation-over-documentation.md.

Poem

🐇 I nibble notes and write with care,
Found a rule that caught my stare.
When words repeat and reasons fade,
I tuck the fix into the blade.
Hop, enforce — let code persuade. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs(writing-skills): add automation-over-documentation lesson' clearly and specifically summarizes the main change: adding a new lesson on the automation-over-documentation framework to the writing-skills section. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage; check skipped. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 893f273 and 1133e33.

📒 Files selected for processing (1)
  • skills/writing-skills/automation-over-documentation.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • skills/writing-skills/automation-over-documentation.md


coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
skills/writing-skills/SKILL.md (1)

500-501: Clarify cross-reference format for discoverability.

Reference formatting aligns with CSO guidelines (no @ force-load), but consider whether this warrants a "RECOMMENDED" marker (similar to REQUIRED patterns in lines 256-257) to signal importance, or if optional "see also" is intentional.

skills/writing-skills/automation-over-documentation.md (1)

63-63: Expand Defense-in-Depth concept for completeness.

Line 63 briefly mentions layering "mechanical enforcement (code) + judgment guidance (skills) + outcome validation (tests)". This is a powerful concept that deserves a dedicated section with examples. Consider adding:

  • How each layer protects against failures of the others
  • When to apply all three vs. subset
  • Examples beyond emoji (e.g., line-length automation + readability skill + test validation)

This would strengthen the framework's applicability to future constraint problems.
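
For example, the three layers might compose like this for the emoji case (a minimal sketch; the function and test names are illustrative, and the third-party emoji package is assumed):

import emoji

def strip_emojis(text):
    # Layer 1 (mechanical enforcement): code removes emoji deterministically.
    return emoji.replace_emoji(text, replace="")

def test_output_contains_no_emoji():
    # Layer 3 (outcome validation): a test catches regressions in layer 1.
    # Layer 2 (judgment guidance) stays in the skill doc, not in code.
    cleaned = strip_emojis("Build passed ✅ 🚀")
    assert not any(emoji.is_emoji(ch) for ch in cleaned)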

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 154d664 and 893f273.

📒 Files selected for processing (2)
  • skills/writing-skills/SKILL.md (1 hunks)
  • skills/writing-skills/automation-over-documentation.md (1 hunks)
🔇 Additional comments (3)
skills/writing-skills/SKILL.md (1)

500-501: Inconsistency between summary and provided code.

The AI summary claims the Deep dive note is inserted twice (after violation-symptoms and after the STOP section), but the provided code shows only one instance at line 500. Please confirm whether the reference should appear in multiple locations per the PR design.

skills/writing-skills/automation-over-documentation.md (2)

1-79: Framework effectively addresses PR problem and aligns with TDD discipline.

The document establishes a clear, actionable decision framework grounded in TDD signal recognition ("REFACTOR not stabilizing = wrong tool"). The emoji case study concretely demonstrates the problem, and the psychological rationale (lines 50-52) explains why documentation fails for mechanical constraints. This directly addresses the PR objective of reducing wasted TDD cycles on documentation-only solutions for mechanical issues.


52-52: Citation "Cialdini 2021" is accurate.

Robert B. Cialdini published "Influence, New and Expanded: The Psychology of Persuasion" in May 2021 (Harper Business), which updates the classic work with new research and adds a seventh principle. The citation year is correct.

Likely an incorrect or invalid review comment.

Comment on lines +65 to +72
## Cost-Benefit

| Approach | Cost | Benefit | Maintenance |
|----------|------|---------|-------------|
| Documentation | Multiple TDD cycles, bloat | None (0/5 tests) | High (whack-a-mole) |
| Automation | 15 lines code | 100% (5/5 tests) | Near-zero |

**ROI:** Automation paid for itself first test run.

⚠️ Potential issue | 🟡 Minor

Temper cost-benefit claims; frame as emoji-specific case study.

The cost-benefit table presents very strong claims ("None (0/5 tests)" vs. "100% (5/5 tests)") based on the emoji-stripping case. This may not generalize to all mechanical constraints (e.g., line-length limits, JSON schema validation, or whitespace normalization may have different failure modes or false-positive rates).

Consider reframing as: "In the emoji-stripping case study, automation succeeded immediately" rather than implying universal success across all mechanical constraints.

🤖 Prompt for AI Agents
In skills/writing-skills/automation-over-documentation.md around lines 65 to 72,
the cost-benefit table overgeneralizes results from the emoji-stripping example
by claiming universal failure for documentation and universal success for
automation; update the phrasing and table to frame these as results from the
specific emoji-stripping case study only (e.g., change "None (0/5 tests)" and
"100% (5/5 tests)" to "None in this case (0/5 tests)" and "100% in this case
(5/5 tests)" or add a footnote/row header stating these numbers apply to the
emoji-stripping case), and add a short sentence after the table noting that
outcomes may vary for other mechanical constraints like line-length, JSON
schema, or whitespace normalization.

@EthanJStark (Author) commented:

Thanks @coderabbitai for the review! Addressing your feedback:

Nitpick 1: RECOMMENDED marker for cross-reference (SKILL.md:500)

Decision: Won't implement

The current "Deep dive:" format is intentional. It signals optional reference material for readers who want deeper understanding. Adding "RECOMMENDED" would over-emphasize it and break the existing pattern where only workflow-critical items (like line 525's "REQUIRED SUB-SKILL") receive markers.

The reference is already discoverable in its current location within the testing methodology section.

Nitpick 2: Expand Defense-in-Depth concept (automation-over-documentation.md:63)

Decision: Won't implement

While this would be valuable content, it conflicts with testing methodology requirements:

  1. Word count constraint: Currently 424/500 words. Expansion would risk exceeding target.
  2. Testing requirement: Per ws skill Iron Law, any content changes require re-running REFACTOR phase with pressure scenarios.
  3. Current content passed: All 4 test scenarios passed with existing Defense-in-Depth example.

The existing example is clear and actionable: "Strip emojis (Python) → guide severity labeling (skill) → validate actionability (test)"

Future iteration could explore this in a separate reference doc if demand emerges.

Additional Comment: Inconsistency about double insertion

Clarification: False positive

The reference appears only once (line 500 of SKILL.md). There is no double insertion. The AI summary's claim that it appears "after violation-symptoms and after the STOP section" is incorrect.

Verification:

$ grep -n "automation-over-documentation" skills/writing-skills/SKILL.md
500:**Deep dive:** See automation-over-documentation.md...


coderabbitai bot commented Dec 5, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the user's claim about single vs double insertion
echo "=== Searching for automation-over-documentation references in SKILL.md ==="
grep -n "automation-over-documentation" skills/writing-skills/SKILL.md

echo -e "\n=== Context around line 500 ==="
sed -n '495,505p' skills/writing-skills/SKILL.md

echo -e "\n=== Checking near 'STOP: Before Moving to Next Skill' section ==="
grep -n "STOP: Before Moving to Next Skill" skills/writing-skills/SKILL.md
# Get context around that line
STOP_LINE=$(grep -n "STOP: Before Moving to Next Skill" skills/writing-skills/SKILL.md | head -1 | cut -d: -f1)
if [ ! -z "$STOP_LINE" ]; then
    START=$((STOP_LINE - 5))
    END=$((STOP_LINE + 10))
    sed -n "${START},${END}p" skills/writing-skills/SKILL.md
fi

echo -e "\n=== Total count of references ==="
grep -c "automation-over-documentation" skills/writing-skills/SKILL.md

Length of output: 1339


@EthanJStark Thank you for the detailed explanations and for catching that error in the AI summary!

Verification Confirmed ✓

You're absolutely correct—the reference appears only once at line 500. The AI summary's claim about double insertion was inaccurate. I apologize for that confusion.

Your Decisions

I appreciate your thoughtful reasoning on both nitpicks:

Nitpick 1 (RECOMMENDED marker): Your point about maintaining the existing pattern makes sense. The "Deep dive:" format clearly signals optional reference material, and overusing markers would dilute their meaning.

Nitpick 2 (Defense-in-Depth expansion): The constraints you've outlined are compelling—especially the word count budget (424/500) and the testing requirements per the ws skill Iron Law. Since the current content passed all 4 pressure scenarios, expansion isn't justified at this time.

Your approach of deferring potential enhancements to future iterations if demand emerges is pragmatic.

Thanks for taking the time to provide such thorough responses! The PR looks good to go.

- Use emoji library for production-ready approach
- Include commented regex alternative for zero dependencies
- Addresses CodeRabbit feedback: code examples must be runnable
- Word count: 442 words (still under 500 target)
@EthanJStark (Author) commented:

@coderabbitai Good catch on the placeholder code! Updated with working implementation (commit 1133e33):

Production approach: uses the emoji library - cleaner and more maintainable:

import emoji

def strip_emojis(text):
    # Remove every emoji the library recognizes; all other text is untouched.
    return emoji.replace_emoji(text, replace='')

Zero-dependency alternative: Included as commented regex for environments where adding dependencies isn't preferred.
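
For readers who want to see it inline, a sketch of what a zero-dependency variant could look like (the Unicode ranges here are illustrative common emoji blocks, not necessarily the exact ones in the doc):

import re

# Covers common emoji blocks only; the emoji library tracks the full,
# evolving Unicode set, which is why it is the primary approach above.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u27BF"          # misc symbols & dingbats
    "]+"
)

def strip_emojis_regex(text):
    return EMOJI_PATTERN.sub("", text)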

Why emoji library over full regex:

  • More maintainable (handles new emoji as Unicode evolves)
  • Clearer intent (self-documenting)
  • What you'd actually use in production
  • Still demonstrates "automation beats documentation" principle
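
A quick usage check, assuming the emoji package is installed:

print(strip_emojis("Build passed ✅ 🚀"))
# -> "Build passed  "  (emoji removed; the surrounding text and spaces are untouched)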

Word count: 443 words (up from 424, still well under 500 target)

Thanks for ensuring the code examples meet the "complete and runnable" standard!


coderabbitai bot commented Dec 5, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the automation-over-documentation.md file to verify the code implementation
cat skills/writing-skills/automation-over-documentation.md

Length of output: 3361


@EthanJStark Excellent work! 🎉

I've verified the implementation in commit 1133e33, and it looks great:

Code Quality ✓

Primary implementation:

  • Clean, readable, production-ready
  • emoji.replace_emoji() is the right approach for real-world use
  • Self-documenting code that clearly expresses intent

Zero-dependency fallback:

  • Solid regex pattern covering common emoji ranges
  • Properly commented as an alternative
  • Gives readers flexibility based on their environment constraints

Design Decision ✓

Your rationale for choosing the emoji library as the primary example is spot-on:

  • More maintainable as Unicode evolves
  • Better demonstrates professional production code
  • Still accomplishes the pedagogical goal of showing automation > documentation

The inclusion of both approaches serves readers well—they get a production-quality example and a self-contained alternative.

Document Quality ✓

  • Word count at 443/500 leaves headroom for any future tweaks
  • Code examples are now complete and immediately usable
  • Strengthens the core message with concrete, working automation

The PR is in great shape. Thanks for the thorough implementation!


