[harness eval #34649] fix: preserve disabled a11y rules with runOnly #36
valentinpalkovic wants to merge 1 commit into
Verify Harness
Verdict:
Reason:
Evidence (vision-check, Vision reasoning): Recipe produced no screenshots; cannot verify visible evidence.
PR-added unit tests: ❌ failed: vitest exited 1 without writing a JSON report (likely setup error); see Action log
Files: vitest output (last 4KB)
Replay: Screenshots
fe2f521 to e537022
Verify Harness
Verdict:
Reason:
Evidence (vision-check, Vision reasoning): The diff contains multiple types of changes: (1) CI/GitHub workflow simplifications (removing runId-based path logic), (2) test-only additions to a11yRunner.test.ts with new mock setup, and (3) runtime logic changes in a11yRunner.ts for merging disabled rules into RunOptions. The Playwright recipe tests the a11y addon panel rendering on the example-button--primary story and captures the accessibility violations results. While the screenshots show the a11y panel working correctly with 'No accessibility violations found', they cannot definitively confirm that the new mergeDisabledRulesIntoRunOpt…
PR-added unit tests: ❌ failed: vitest exited 1 without writing a JSON report (likely setup error); see Action log
Files: vitest output (last 4KB)
Replay: Screenshots
Verify Harness
Verdict:
Evidence (vision-check, Vision reasoning): The diff primarily modifies test code (a11yRunner.test.ts), implementation logic (a11yRunner.ts, mergeDisabledRulesIntoRunOptions), and CI/build configuration files. The user-visible change, preserving disabled a11y rules when options.runOnly is set, is a behavioral fix in the a11y addon's internal logic that would only be observable if a story had both disabled rules AND runOnly options. The screenshots show the a11y panel rendering successfully with 'No accessibility violations found,' but they don't demonstrate whether the specific rule-merging behavior is working correctly, as this would req…
PR-added unit tests: ✅ passed: 6714 passed, 0 failed across 2118 suite(s)
Files:
Replay: Screenshots
80ccd7d to 745162d
Verify Harness
Verdict:
Evidence (vision-check, Vision reasoning): The diff's primary user-visible change is in the a11y addon's internal logic (merging disabled rules into runOnly options). The recipe correctly triggers the a11yRunner.run() codepath by opening the Accessibility addon panel, but the screenshots show the panel rendering successfully with accessibility violations/passes tabs visible. Since the change is a silent logic fix (preserving disabled rules during axe.run) with no observable UI difference in the panel's appearance or results display, the screenshots cannot definitively confirm or deny the fix's correctness.
PR-added unit tests: ✅ passed: 6714 passed, 0 failed across 2118 suite(s)
Files:
Replay: Screenshots
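The rule-merging behavior that screenshots cannot show can be illustrated in isolation. What follows is a minimal sketch, not the PR's actual `mergeDisabledRulesIntoRunOptions`: the helper name `mergeDisabledRules`, the local `RuleConfig` and `RunOptions` types (simplified stand-ins for the axe-core shapes), and the choice to let caller-supplied `options.rules` take precedence are all assumptions.

```typescript
// Simplified stand-ins for the axe-core shapes involved (assumption: the real
// types are richer; only the fields used here are modelled).
type RuleConfig = { id: string; enabled?: boolean };
type RunOptions = {
  runOnly?: string[];
  rules?: Record<string, { enabled: boolean }>;
};

// Hypothetical helper: copy rules disabled in the config into the run options
// so they stay disabled even when `runOnly` narrows the rule set.
function mergeDisabledRules(
  configRules: RuleConfig[] = [],
  options: RunOptions = {}
): RunOptions {
  // Without runOnly there is nothing to preserve; return the options untouched.
  if (!options.runOnly) return options;

  const disabled: Record<string, { enabled: boolean }> = {};
  for (const rule of configRules) {
    if (rule.enabled === false) disabled[rule.id] = { enabled: false };
  }
  if (Object.keys(disabled).length === 0) return options;

  // Assumption: per-rule options passed explicitly by the caller take precedence.
  return { ...options, rules: { ...disabled, ...options.rules } };
}

// Usage with the same inputs the last Playwright recipe in this thread emits.
const merged = mergeDisabledRules(
  [{ id: 'target-size', enabled: false }, { id: 'color-contrast' }],
  { runOnly: ['wcag2a'] }
);
console.log(JSON.stringify(merged));
```

A unit test can assert on the returned object directly, which is why the vitest result carries more weight here than the vision check.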
a11176d to 9de9d5b
Verify Harness
No verdict produced. The workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See run log for details.
Verify Harness
No verdict produced. The workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See run log for details.
Verify Harness
No verdict produced. The workflow failed before the harness ran (likely recipe-author dispatch, deny-regex, or lint). See run log for details.
…ule import

Wave finding (#36 try-pr-34649 a11yRunner): recipe-author correctly chose @verify-mode: behavioral but reached the changed module via in-browser dynamic import() + monkeypatch. The deny-regex gate rejected it at attempt 1 with no retry, producing "no verdict".

- recipe-author-core.ts: on assertNoDeniedPatterns failure, build a retry message (denied pattern + §12.5 pointer) and loop, mirroring the lint failure path. Terminal `deny-regex-hit` only after MAX_RECIPE_ATTEMPTS.
- _recipe-authoring-guide.md §12.5: HARD GATE. A behavioral recipe must never import()/monkeypatch/eval the changed module (deny-regex blocks it pre-run = no verdict). Drive the public UI path and assert observable effect; if no UI path exists, fall back to visual smoke + filterPageErrors rather than fabricating a module import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
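The retry-on-deny behavior described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in recipe-author-core.ts: the value of `MAX_RECIPE_ATTEMPTS`, the two deny patterns, the `AuthorOutcome` type, and the helper names `findDeniedPattern` and `authorWithRetry` are all hypothetical, and the generator is synchronous here only to keep the sketch short.

```typescript
// Assumed values and shapes for illustration; the harness's real constants,
// deny list, and result type are not shown in this PR conversation.
const MAX_RECIPE_ATTEMPTS = 3;
const DENY_PATTERNS: RegExp[] = [/\bimport\s*\(/, /\beval\s*\(/];

type AuthorOutcome =
  | { status: 'ok'; recipe: string; attempts: number }
  | { status: 'deny-regex-hit'; pattern: string; attempts: number };

// Returns the first denied pattern the recipe matches, if any.
function findDeniedPattern(recipe: string): RegExp | undefined {
  return DENY_PATTERNS.find((pattern) => pattern.test(recipe));
}

// `generate` stands in for the recipe-author call; it receives the retry
// message from the previous rejected attempt (undefined on the first try).
function authorWithRetry(generate: (retryMessage?: string) => string): AuthorOutcome {
  let retryMessage: string | undefined;
  let lastPattern = '';
  for (let attempt = 1; attempt <= MAX_RECIPE_ATTEMPTS; attempt++) {
    const recipe = generate(retryMessage);
    const denied = findDeniedPattern(recipe);
    if (!denied) return { status: 'ok', recipe, attempts: attempt };
    lastPattern = denied.source;
    retryMessage =
      `Recipe rejected: matched denied pattern /${denied.source}/. ` +
      'See authoring guide §12.5 and drive the public UI path instead.';
  }
  // Terminal only after every attempt has been rejected.
  return { status: 'deny-regex-hit', pattern: lastPattern, attempts: MAX_RECIPE_ATTEMPTS };
}

// Usage: the first draft uses a dynamic import and is rejected; the retry passes.
const outcome = authorWithRetry((retryMessage) =>
  retryMessage ? "await page.getByRole('tab').click()" : "await import('./a11yRunner')"
);
console.log(outcome.status, outcome.attempts);
```

Feeding the matched pattern back into the next attempt is what turns a hard "no verdict" into a recoverable failure, the same way the commit says the lint path already behaves.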
Verify Harness
Verdict:
Reason:
PR-added unit tests: ❌ failed: vitest exited 1 without writing a JSON report (likely setup error); see Action log
Files: vitest output (last 4KB)
How Playwright validated this
test('a11y addon runs and renders results on a story', async ({ page }, testInfo) => {
  const pageErrors: string[] = [];
  const consoleErrors: string[] = [];
  page.on('pageerror', (err) => {
    pageErrors.push(err.stack ?? err.message ?? String(err));
  });
  page.on('console', (msg) => {
    if (msg.type() === 'error') {
      consoleErrors.push(msg.text());
    }
  });
  const baseURL =
    process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006';
  try {
    await page.goto(`${baseURL}/?path=/story/example-button--primary`);
    const sb = new RecipePage(page, expect);
    await sb.waitUntilLoaded();
    const errorDisplay = page.locator('#sb-errordisplay');
    await expect(errorDisplay).toBeHidden();
    const a11yTab = page.getByRole('tab', { name: /accessibility/i });
    await expect(a11yTab).toBeVisible({ timeout: 15000 });
    await a11yTab.click();
    const resultTab = page
      .getByRole('tab', { name: /violations|passes|incomplete/i })
      .first();
    await expect(resultTab).toBeVisible({ timeout: 30000 });
    await expect(errorDisplay).toBeHidden();
  } finally {
    await testInfo.attach('pageErrors', {
      body: JSON.stringify(pageErrors),
      contentType: 'application/json',
    });
    await testInfo.attach('consoleErrors', {
      body: JSON.stringify(consoleErrors),
      contentType: 'application/json',
    });
  }
  expect(filterPageErrors(pageErrors)).toEqual([]);
});
Replay: Screenshots
… TMPDIR pinned

Two distinct wave-#31/#36 root causes, both false regressions:

(a) _util.ts previewRoot() filtered `#storybook-root:visible`. Stories with `parameters.layout:'fullscreen'` + the internal-ui side-by-side/stacked theme decorator wrap the story so #storybook-root has a zero-size (Playwright-"not visible") box though it rendered. The locator matched nothing and waitForStoryLoaded timed out (#31 manager-sidebar-heading--*). Use `:has(> *)` instead: it selects whichever container actually has children, keeps story-vs-docs disambiguation, and drops the bounding-box requirement.

(b) verify-pr.yml unit-test step runs `env -i … srt … yarn vitest`. `env -i` strips TMPDIR, so Yarn's run-temp realpaths a nonexistent srt path (`lstat '/tmp/claude'` ENOENT) and aborts before vitest starts → false "vitest exited without JSON report" regression (#36 a11yRunner). Pin TMPDIR to an existing allowWrite dir ($PR_HEAD_DIR/.verify-output/vitest-tmp), for the same reason REPORT/LOG already live there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verify Harness
Verdict:
Reason:
PR-added unit tests: ❌ failed: vitest exited 1 without writing a JSON report (likely setup error); see Action log
Files: vitest output (last 4KB)
How Playwright validated this
test('a11y runner preserves disabled rules with runOnly without runtime errors', async ({ page }, testInfo) => {
  const pageErrors: string[] = [];
  const consoleErrors: string[] = [];
  page.on('pageerror', (err) => {
    pageErrors.push(err.stack ?? err.message ?? String(err));
  });
  page.on('console', (msg) => {
    if (msg.type() === 'error') {
      consoleErrors.push(msg.text());
    }
  });
  const baseURL =
    process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006';
  try {
    await page.goto(`${baseURL}/?path=/story/example-button--primary`);
    const sb = new RecipePage(page, expect);
    await sb.waitUntilLoaded();
    const errorDisplay = page.locator('#sb-errordisplay');
    await expect(errorDisplay).toBeHidden();
    const previewRoot = sb.previewRoot();
    await expect(previewRoot).toBeVisible();
    const previewIframeHandle = await page.waitForSelector('#storybook-preview-iframe');
    const previewFrame = await previewIframeHandle.contentFrame();
    expect(previewFrame).not.toBeNull();
    const result = await previewFrame!.evaluate(async () => {
      const channel = (window as any).__STORYBOOK_ADDONS_CHANNEL__;
      if (!channel) return { ok: false, reason: 'no channel' };
      return await new Promise<{ ok: boolean; reason?: string }>((resolve) => {
        const timeout = setTimeout(() => resolve({ ok: false, reason: 'timeout' }), 15000);
        const onResult = () => {
          clearTimeout(timeout);
          channel.off('storybook/a11y/result', onResult);
          channel.off('storybook/a11y/error', onError);
          resolve({ ok: true });
        };
        const onError = (err: unknown) => {
          clearTimeout(timeout);
          channel.off('storybook/a11y/result', onResult);
          channel.off('storybook/a11y/error', onError);
          resolve({ ok: false, reason: `error: ${String(err)}` });
        };
        channel.on('storybook/a11y/result', onResult);
        channel.on…
Replay: Screenshots
…bans root-visible assert

Re-run of #36/#31 showed both prior fixes missed the real cause:

- #36: TMPDIR pin had zero effect; Yarn's mktempPromise still hits ENOENT on `/tmp/claude`. Root cause: srt derives its sandbox tmp from CLAUDE_CODE_TMPDIR, NOT TMPDIR. The main recipe run inherits it via $GITHUB_ENV ($SANDBOX_TMPDIR); the unit-test step's `env -i` strips it, so srt falls back to its hardcoded `/tmp/claude` (never created). Pass CLAUDE_CODE_TMPDIR=$VITEST_TMPDIR (existing allowWrite dir) in env -i.
- #31: previewRoot `:has(> *)` fix removed the _util.ts:66 timeout, but the recipe-author hand-rolled `expect('#storybook-root').toBeVisible()`, which is "hidden" for `Sidebar/Heading` (layout:fullscreen + side-by-side = zero-box root). Brand triage rule now explicitly bans root-visibility asserts and prescribes a child `toBeAttached()` content assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verify Harness
Verdict:
Reason:
PR-added unit tests: ❌ failed: 6761 passed, 1 failed across 2128 suite(s)
Files: vitest output (last 4KB)
How Playwright validated this
test('a11y runner executes with runOnly + disabled rules without runtime errors', async ({
  page,
}, testInfo) => {
  const pageErrors: string[] = [];
  const consoleErrors: string[] = [];
  page.on('pageerror', (err) => {
    pageErrors.push(err.stack ?? err.message ?? String(err));
  });
  page.on('console', (msg) => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });
  const baseURL =
    process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006';
  try {
    await page.goto(`${baseURL}/?path=/story/core-args--passed-to-story`);
    const sb = new RecipePage(page, expect);
    await sb.waitUntilLoaded();
    await expect(page.locator('#sb-errordisplay')).toBeHidden();
    const previewRoot = sb.previewRoot();
    await expect(previewRoot).toBeAttached();
    const runResult = await page.evaluate(async () => {
      const iframe = document.getElementById(
        'storybook-preview-iframe'
      ) as HTMLIFrameElement | null;
      if (!iframe?.contentWindow) return { ok: false, reason: 'no-iframe' };
      const w = iframe.contentWindow as any;
      const deadline = Date.now() + 15000;
      while (Date.now() < deadline) {
        if (w.__STORYBOOK_ADDONS_CHANNEL__) break;
        await new Promise((r) => setTimeout(r, 100));
      }
      const channel = w.__STORYBOOK_ADDONS_CHANNEL__;
      if (!channel) return { ok: false, reason: 'no-channel' };
      return await new Promise<{ ok: boolean; reason?: string }>((resolve) => {
        const timeout = setTimeout(
          () => resolve({ ok: false, reason: 'timeout-waiting-for-result' }),
          15000
        );
        const onResult = () => {
          clearTimeout(timeout);
          resolve({ ok: true });
        };
        const onError = (err: any) => {
          clearTimeout(timeout);
          resolve({ ok: false, reason: `runner-error:${err?.error ?? String(err)}` });
        };
        channel.once('storybook/a11y/result', onResult);
        cha…
Replay: Screenshots
…ndate it

Wave #36 (after CLAUDE_CODE_TMPDIR fix let vitest run): Playwright recipe failed on `expect(consoleErrors).toEqual([])` because the srt egress jail denies every non-allowlisted domain, so internal-ui's external probes always log `Failed to load resource: net::ERR_INTERNET_DISCONNECTED`. This is environmental, not a PR regression. No console-error equivalent of filterPageErrors existed.

- _util.ts: add `filterConsoleErrors()` dropping `net::ERR_*` (INTERNET_DISCONNECTED / NAME_NOT_RESOLVED / BLOCKED_BY_CLIENT / CONNECTION_REFUSED / FAILED) + the shared cross-origin sessionStorage SecurityError. Verified: keeps only genuine errors.
- _recipe-authoring-guide.md §3: MANDATORY subsection. Never assert the raw consoleErrors array; always filterConsoleErrors(consoleErrors), mirroring the filterPageErrors mandate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
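A filter of the kind this commit describes might look like the sketch below. It is not the `filterConsoleErrors()` added to _util.ts: the name `filterConsoleErrorsSketch`, the exact regular expressions (in particular the assumed wording of the sessionStorage SecurityError), and the sample messages are made up for illustration. Only the list of `net::ERR_*` codes is taken from the commit message.

```typescript
// Patterns for console errors caused by the sandbox rather than the PR.
// The net::ERR_* codes come from the commit message; the SecurityError
// wording is an assumption about how the browser phrases it.
const ENVIRONMENTAL_CONSOLE_ERRORS: RegExp[] = [
  /net::ERR_(INTERNET_DISCONNECTED|NAME_NOT_RESOLVED|BLOCKED_BY_CLIENT|CONNECTION_REFUSED|FAILED)\b/,
  /SecurityError.*sessionStorage/i,
];

// Sketch of a console-error filter: keep only messages that match none of the
// environmental patterns above.
function filterConsoleErrorsSketch(consoleErrors: string[]): string[] {
  return consoleErrors.filter(
    (text) => !ENVIRONMENTAL_CONSOLE_ERRORS.some((pattern) => pattern.test(text))
  );
}

// Made-up sample: two environmental messages and one genuine error.
const sample = [
  'Failed to load resource: net::ERR_INTERNET_DISCONNECTED',
  "SecurityError: Failed to read the 'sessionStorage' property from 'Window'",
  'TypeError: runOptions.rules is not iterable',
];
const genuine = filterConsoleErrorsSketch(sample);
console.log(genuine);
```

An explicit list of codes, rather than a blanket `net::ERR_` prefix match, means a network error outside the list would still fail the recipe.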
Verify Harness
Verdict:
Reason:
PR-added unit tests: ❌ failed: 6761 passed, 1 failed across 2128 suite(s)
Files: vitest output (last 4KB)
How Playwright validated this
test('a11y runner executes via addon panel without runtime errors', async ({ page }, testInfo) => {
  const pageErrors: string[] = [];
  const consoleErrors: string[] = [];
  page.on('pageerror', (err) => {
    pageErrors.push(err.stack ?? err.message ?? String(err));
  });
  page.on('console', (msg) => {
    if (msg.type() === 'error') {
      consoleErrors.push(msg.text());
    }
  });
  const baseURL =
    process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006';
  try {
    await page.goto(`${baseURL}/?path=/story/example-button--primary`);
    const sb = new RecipePage(page, expect);
    await sb.waitUntilLoaded();
    await expect(page.locator('#sb-errordisplay')).toBeHidden();
    const a11yTab = page.getByRole('tab', { name: /accessibility/i });
    await expect(a11yTab).toBeVisible({ timeout: 15000 });
    await a11yTab.click();
    const violationsTab = page.getByRole('tab', { name: /violations/i });
    const passesTab = page.getByRole('tab', { name: /passes/i });
    await expect(violationsTab.or(passesTab).first()).toBeVisible({ timeout: 30000 });
  } finally {
    await testInfo.attach('pageErrors', {
      body: JSON.stringify(pageErrors),
      contentType: 'application/json',
    });
    await testInfo.attach('consoleErrors', {
      body: JSON.stringify(consoleErrors),
      contentType: 'application/json',
    });
  }
  expect(filterPageErrors(pageErrors)).toEqual([]);
  expect(filterConsoleErrors(consoleErrors)).toEqual([]);
});
Replay: Screenshots
…8 single-sourcing

EPIC-5 (test the verifier; was 3 test files, zero on the security/cost core): 8 new vitest suites, 181 tests, auto-globbed by the `scripts` project (no CI wiring needed; satisfies 5.11):

- 5.1 recipe-deny.test.ts: all 19 DENY patterns, exact 1-based line, per-line tripwire (no comment-awareness) pinned; eval-#36 `dynamic import(` pinned (isolated + overlapping).
- 5.2 agent-dispatch-cost.test.ts: budget gate boundary (computed, not hardcoded), resolveModelId round-trip, pricing digit-transpose guard.
- 5.4 derive-verdict-hmac.test.ts: saboteur suite (forged/tampered/correct/wrong-secret/non-signed vs signed field) + the deferred Wave-1.1 LOW(b) disjointness pin, made non-vacuous (poisoned-set replica proves the guard has teeth).
- 5.6 triage/target-suggest, 5.7 mode/target (30-line window edge), 5.10 agent-prompt-sanitize (ANSI/NUL/fence redaction, cap boundary).

EPIC-6:

- 6.1 SKILL.md: de-absolutized 4 hardcoded /Users/... paths + 1 stale prose note to runtime-resolved $REPO_ROOT (git rev-parse).
- 6.8 srt pin: replaced manual-paste srt-version/srt-sha256 in verify-pr.yml with committed scripts/verify/srt.lock.json read fail-closed from the TRUSTED base checkout (values byte-identical: 0.0.51 / 36de…6338); load-bearing sha verification in the composite untouched. Added workflow_dispatch-only _srt-sha-probe.yml.

Mandatory separate review pass (security-reviewer + code-reviewer), findings addressed and re-verified:

- sec HIGH: probe no longer auto-commits/pushes the supply-chain pin. It is emit-only (summary + artifact + outputs), contents: read, and a human lands it via reviewed PR (restores the "reviewed diff" invariant).
- sec LOW: strict srt version regex (rejects ./../leading-trailing dot) in both workflows.
- code HIGH: derive-verdict-hmac.test.ts typed (Partial<VerifyResult>/RecipeTest/StepStatus); 22 tsc errors cleared, assertions unchanged.
- code MED/LOW: dropped unused beforeEach import; over-claiming "ordering invariant" tests renamed to honest "disjoint rules resolve independently" (no dual-match input exists in the real globs); added target grammar negative cases.

Verified: tsc clean for all 8 test files (scripts/tsconfig.json), 181/181 green, both workflows parse, bash -n OK, srt values unchanged, trust boundary + fail-closed intact (security-reviewed). Scope clean (no yarn.lock/shared-tree drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
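To make the "read fail-closed" and "strict srt version regex" points concrete, here is a rough sketch of what such a check could look like. Everything in it is assumed for illustration: the `SrtLock` field names, the two regular expressions, and the `parseSrtLock` helper are hypothetical, and the real validation lives in the workflows rather than in TypeScript.

```typescript
// Hypothetical shape of the committed lock file; the field names are assumed.
type SrtLock = { version: string; sha256: string };

// Strict dotted-numeric version: rejects '.', '..', and leading/trailing dots.
const STRICT_VERSION = /^\d+(\.\d+)*$/;
// A SHA-256 digest rendered as 64 lowercase hex characters.
const SHA256_HEX = /^[0-9a-f]{64}$/;

// Fail closed: any malformed or missing field throws instead of falling back
// to a default. Invalid JSON also throws, via JSON.parse.
function parseSrtLock(json: string): SrtLock {
  const data: unknown = JSON.parse(json);
  if (typeof data !== 'object' || data === null) {
    throw new Error('srt lock: expected a JSON object');
  }
  const { version, sha256 } = data as Record<string, unknown>;
  if (typeof version !== 'string' || !STRICT_VERSION.test(version)) {
    throw new Error('srt lock: invalid version');
  }
  if (typeof sha256 !== 'string' || !SHA256_HEX.test(sha256)) {
    throw new Error('srt lock: invalid sha256');
  }
  return { version, sha256 };
}

// Usage with a placeholder digest (not the real pinned value).
const placeholderDigest = 'a'.repeat(64);
const lock = parseSrtLock(JSON.stringify({ version: '0.0.51', sha256: placeholderDigest }));
console.log(lock.version);
```

Throwing on every malformed input keeps the pin from silently degrading to an unverified download, which is the property the review pass calls "fail-closed".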
6bc4399 to 6ce2451
Verify Harness
Verdict:
Reason:
PR-added unit tests: ❌ failed: 6942 passed, 1 failed across 2178 suite(s)
Files: vitest output (last 4KB)
How Playwright validated this
test('a11yRunner merges disabled config.rules into runOnly options', async ({ page }, testInfo) => {
  const pageErrors: string[] = [];
  const consoleErrors: string[] = [];
  page.on('pageerror', (err) => {
    pageErrors.push(err.stack ?? err.message ?? String(err));
  });
  page.on('console', (msg) => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });
  const baseURL =
    process.env.STORYBOOK_URL ?? testInfo.project.use.baseURL ?? 'http://localhost:6006';
  try {
    await page.goto(`${baseURL}/?path=/story/example-button--primary`);
    const sb = new RecipePage(page, expect);
    await sb.waitUntilLoaded();
    const errorDisplay = page.locator('#sb-errordisplay');
    await expect(errorDisplay).toBeHidden();
    const previewIframe = sb.previewIframe();
    const previewBody = previewIframe.locator('body');
    await expect(previewBody).toBeVisible();
    const result = await previewBody.evaluate(async () => {
      const w = window as any;
      const start = Date.now();
      while (!w.__STORYBOOK_PREVIEW__?.channel && Date.now() - start < 10000) {
        await new Promise((r) => setTimeout(r, 50));
      }
      const channel = w.__STORYBOOK_PREVIEW__?.channel;
      if (!channel) {
        return { ok: false, reason: 'no-channel' };
      }
      const resultPromise = new Promise<any>((resolve) => {
        const onResult = (payload: any) => {
          channel.off('storybook/a11y/result', onResult);
          resolve({ ok: true, payload });
        };
        channel.on('storybook/a11y/result', onResult);
        setTimeout(() => resolve({ ok: false, reason: 'timeout' }), 15000);
      });
      channel.emit('storybook/a11y/manual', 'example-button--primary', {
        config: {
          rules: [{ id: 'target-size', enabled: false }],
        },
        options: {
          runOnly: ['wcag2a'],
        },
        manual: true,
      });
      return resultPromise;
    });
    expect(result).toMatchObject({ ok: true });
    …
Replay: Screenshots
Synthetic fork PR for agentic harness eval against storybookjs#34649.