Skip to content

test(engine): make TestDrainTimeout deterministic + preserve subclass type in @ontology_entity#1729

Merged
Aureliolo merged 2 commits into
mainfrom
fix/test-drain-timeout-flaky
May 3, 2026
Merged

test(engine): make TestDrainTimeout deterministic + preserve subclass type in @ontology_entity#1729
Aureliolo merged 2 commits into
mainfrom
fix/test-drain-timeout-flaky

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Fix the timing-sensitive TestDrainTimeout::test_drain_timeout_resolves_pending_futures flake hit by PR #1727's Test (Python 3.14) job, plus a foundational fix to the @ontology_entity decorator that surfaced as Pyright false-positives in the same file.

Root cause (test flake)

PR #1545 (commit 13f3db679, "TOCTOU races + resource leaks") changed TaskEngine.stop() to install an outer hard-deadline at 2 × timeout and re-raise TimeoutError when cleanup exceeds it. The pre-existing test called await eng.stop(timeout=0.05) and expected stop() to return cleanly, leaving only ~50 ms for cancellation + future-cleanup. Under xdist load on slow Ubuntu CI runners, cleanup intermittently overruns 50 ms, the outer hard-deadline fires, and TimeoutError escapes to the test as 1 failed, 27881 passed.

Changes

tests/unit/engine/test_task_engine_integration.py

  • Bump stop(timeout=…) from 0.05 to 0.5. Same contract is exercised (inner drain still fires, queued futures still resolve to shutdown failure) without race sensitivity. Cleanup margin grows from 50 ms to 500 ms — comfortably above the few milliseconds the cleanup actually needs.
  • Inline comment ties the budget to the 2 × timeout hard-deadline ratio so the next reader doesn't lower it back.

src/synthorg/ontology/decorator.py

  • The decorator's positional overload returned type[BaseModel], erasing concrete subclass identity. Pyright reported false-positive "no attribute" diagnostics on every @ontology_entity-decorated class (Task, Artifact, Agent, Project, Role, MeetingRecord, OrgFact, etc.). Mypy was lenient because the alternative overload returned Any.
  • Switched to PEP 695 inline type parameters (def ontology_entity[T: BaseModel](cls: type[T], /) -> type[T]: ...), bound to BaseModel. Static analysers now resolve Task as type[Task] instead of type[BaseModel].

Test plan

  • Targeted test 5 consecutive runs, including one slow run (31 s) that mirrors the CI-runner pressure that flaked the original — 5/5 PASSED.
  • Full file tests/unit/engine/test_task_engine_integration.py — 11/11 PASSED in 12 s.
  • Full strict mypy: Success: no issues found in 3626 source files.
  • Ruff lint + format clean.
  • Pre-push gates green: mypy (affected modules), pytest unit (affected modules), forbidden-literal gate, persistence-boundary gate, typed-boundary contract, cost-recording chokepoint.

Review coverage

This PR was opened directly via the GitHub MCP tool because a parallel /pre-pr-review is running in another terminal on a different branch. The full pre-PR review pipeline (specialist agents, triage gate) was not invoked. The changes are narrow and verified by the gates listed above.

Run /aurelio-review-pr after external reviewers (CodeRabbit, etc.) provide feedback.

Aureliolo added 2 commits May 3, 2026 14:37
The decorator's positional overload returned type[BaseModel], which
erased the concrete subclass type. Pyright reported false-positive
'no attribute' diagnostics on every @ontology_entity-decorated class
(Task, Artifact, Agent, Project, Role, MeetingRecord, OrgFact, etc.).
Mypy was lenient because the alternative overload returned Any.

Use PEP 695 inline type parameters bound to BaseModel so the
decorated class identity flows through to consumers. Static analysers
now resolve Task as type[Task] instead of type[BaseModel].
stop(timeout=T) installs an outer hard-deadline at 2*T and re-raises
TimeoutError when cleanup exceeds it. The original 50 ms budget left
only ~50 ms for cancellation plus future-cleanup, which raced under
xdist load on slow CI runners and surfaced as TimeoutError escaping
stop() (e.g. PR #1727 hit this).

Bump the budget to 500 ms so the cleanup margin is 500 ms instead
of 50 ms. The same contract is exercised (inner drain still fires,
queued futures still resolve to shutdown failure) without race
sensitivity. Inline comment explains the budget choice tied to the
hard-deadline ratio so the next reader does not lower it back.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: df72a6b1-bd8c-421f-9e06-7c6b34522972

📥 Commits

Reviewing files that changed from the base of the PR and between 5cb1212 and 2aca184.

📒 Files selected for processing (2)
  • src/synthorg/ontology/decorator.py
  • tests/unit/engine/test_task_engine_integration.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Type Check
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python source code must not use from __future__ import annotations; Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:)

Files:

  • src/synthorg/ontology/decorator.py
  • tests/unit/engine/test_task_engine_integration.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

{src,tests}/**/*.py: All public functions in Python must have type hints; mypy strict mode enforced
Docstrings in Python must use Google style and are required on public classes and functions (ruff D rules enforced)
Create new objects instead of mutating existing ones; use Pydantic model_copy(update=...) for runtime state mutations, never direct attribute assignment
Separate config vs runtime state: use frozen models for config/identity; use separate mutable-via-copy models for runtime state that evolves
Never mix static config and mutable runtime fields in one Pydantic model
Classes that read time or sleep take clock: Clock | None = None defaulting to SystemClock() from synthorg.core.clock; tests inject FakeClock
Python functions must be less than 50 lines
Python files must be less than 800 lines
Line length must not exceed 88 characters (ruff enforced)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names like example-provider, example-large-001
Comments must explain WHY only, never origin/review/issue context, reviewer citations, in-code issue/PR back-references, or round/iteration narrative

Files:

  • src/synthorg/ontology/decorator.py
  • tests/unit/engine/test_task_engine_integration.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: All Pydantic models must use ConfigDict(frozen=True, allow_inf_nan=False) by default unless documented otherwise
Use extra="forbid" on request DTOs in Pydantic models
Use @computed_field for derived values in Pydantic v2 models instead of regular fields
Use NotBlankStr from synthorg.core.types for identifier and name fields in Pydantic models
Every A2A RPC method must declare a typed Pydantic args model and validate before dispatch
Every entry-point that ingests a dict payload from an external source (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import) must call parse_typed() from synthorg.api.boundary with a hardcoded LiteralString boundary name
Use asyncio.TaskGroup for fan-out/fan-in async concurrency, wrapping independent task bodies to catch Exception (re-raise only MemoryError/RecursionError) so one failure doesn't unwind the group
Async start() / stop() services must own a dedicated self._lifecycle_lock; timed-out stops mark the service unrestartable
Wrap attacker-controllable strings at LLM call sites via wrap_untrusted() from synthorg.engine.prompt_safety; append untrusted_content_directive(tags) to the enclosing system prompt
Never call lxml.html.fromstring on attacker input; use HTMLParseGuard from synthorg.tools.html_parse_guard
Cross-cutting subsystems follow protocol + strategy + factory + config discriminator pattern with safe defaults
Handle errors explicitly, never swallow exceptions. Domain error families register a base-class entry in EXCEPTION_HANDLERS so subtypes get correct status codes
Domain error classes must use <Domain><Condition>Error naming and inherit from DomainError (or a domain-scoped intermediate that inherits DomainError)
Never use bare Exception / RuntimeError at domain boundaries; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing
Every Pydantic model is `Confi...

Files:

  • src/synthorg/ontology/decorator.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/ontology/decorator.py
{src,web/src}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No default may privilege a single region, currency, or locale; every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback

Files:

  • src/synthorg/ontology/decorator.py
{src,web/src}/**/*.{py,tsx,ts}

📄 CodeRabbit inference engine (CLAUDE.md)

{src,web/src}/**/*.{py,tsx,ts}: Currency: never hardcode ISO 4217 codes; backend use DEFAULT_CURRENCY from synthorg.budget.currency or runtime budget.currency setting; frontend use DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency
Never use _usd suffix on money fields; the type carries money semantics; the value is in the operator's configured currency
Store UTC datetimes; render via Intl without passing timeZone (browser tz wins)
Always format date/number via Intl; no hand-rolled templates
Use metric units only; International / British English UI default (colour, behaviour, organise, centred, analyse, cancelled); document deviations

Files:

  • src/synthorg/ontology/decorator.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Every marker must use one of: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Every Mock() / AsyncMock() / MagicMock() must declare spec=ConcreteClass; pre-commit gate scripts/check_mock_spec.py enforces this
Use mock_dispatcher from tests/conftest.py (an AsyncMock(spec=NotificationDispatcher)) instead of building the spec inline
Import FakeClock from tests._shared.fake_clock (NOT from rollout-subsystem paths) and inject it into the class under test; FakeClock.sleep advances virtual time AND yields once
For tests that need to drive cooperative tasks waiting on the loop, use await clock.advance_async(seconds)
FakeClock-first: when the class under test accepts clock= parameter, always inject FakeClock rather than monkey-patching globals like time.monotonic() / asyncio.sleep()
Patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam
Never use monkeypatch.setattr(module.logger, "info", spy) to spy on logger calls; use a context manager that wraps setattr + try/finally del proxy.<level>
Prefer @pytest.mark.parametrize for testing similar cases
Tests must use test-provider, test-small-001, etc. instead of real vendor names
Python uses Hypothesis for property-based testing; run fuzzing via HYPOTHESIS_PROFILE=fuzz uv run pytest ... --timeout=0
For timing-sensitive tests, use FakeClock-first: if the class accepts clock=, inject FakeClock and drive virtual time via clock.advance(...)
For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/engine/test_task_engine_integration.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/test_task_engine_integration.py
🔇 Additional comments (3)
src/synthorg/ontology/decorator.py (2)

25-26: Type-only import placement is correct.

Callable under TYPE_CHECKING is appropriate here and keeps runtime imports lean while supporting overload typing.


132-151: Generic overloads are a strong typing improvement.

The new type[T]-preserving overloads correctly keep concrete model subclass types across both decorator usage patterns.

tests/unit/engine/test_task_engine_integration.py (1)

320-336: Looks good — the larger stop timeout matches the engine’s 2× hard-deadline behavior.

This keeps the test focused on the same drain/cleanup contract while giving enough headroom to avoid the CI/xdist race without materially slowing the suite.


Walkthrough

This pull request updates the type stubs for the @ontology_entity decorator in src/synthorg/ontology/decorator.py to use generic type parameter T for better static type preservation. The no-argument overload now returns type[T] and the keyword-argument overload returns Callable[[type[T]], type[T]], replacing previous type[BaseModel] and Any return types. The runtime implementation behavior remains unchanged. Additionally, a test timeout in tests/unit/engine/test_task_engine_integration.py is increased from 0.05 to 0.5 seconds with added explanatory comments clarifying the drain and hard-deadline timeout mechanics.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: making the TestDrainTimeout test deterministic and preserving subclass type in @ontology_entity decorator.
Description check ✅ Passed The description provides detailed context and rationale for both changes, thoroughly explaining the root cause of the test flake and the typing fix.
Docstring Coverage ✅ Passed Docstring coverage is 60.00% which is sufficient. The required threshold is 40.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 3/5 reviews remaining, refill in 17 minutes and 30 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 12:44 — with GitHub Actions Inactive
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 3, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks1


Comparing fix/test-drain-timeout-flaky (2aca184) with main (5cb1212)

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the ontology_entity decorator to use PEP 695 generic syntax, aiming to preserve concrete subclass types in static analysis, and increases a test timeout to improve CI stability. The review identifies critical runtime issues: Callable and BaseModel are imported only during type checking but are required at runtime for signature evaluation, which will cause NameError exceptions. Furthermore, the decorator factory overload is flagged for potential type erasure since the generic parameter T cannot be inferred from the factory's arguments.

Comment on lines 22 to +25
logger = get_logger(__name__)

if TYPE_CHECKING:
from collections.abc import Callable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The Callable import is placed inside a TYPE_CHECKING block, but it is used in the runtime signatures of the ontology_entity overloads (e.g., line 151). Since this module does not use from __future__ import annotations, these signatures are evaluated when the module is loaded, which will result in a NameError at runtime. This import should be moved outside the TYPE_CHECKING block.

Suggested change
logger = get_logger(__name__)
if TYPE_CHECKING:
from collections.abc import Callable
from collections.abc import Callable
logger = get_logger(__name__)
if TYPE_CHECKING:

# false-positive "no attribute" diagnostics.
@overload
def ontology_entity(cls: type[BaseModel], /) -> type[BaseModel]: ...
def ontology_entity[T: BaseModel](cls: type[T], /) -> type[T]: ...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The PEP 695 type parameter T uses BaseModel as its bound. In Python 3.12+, bounds of type parameters are evaluated at runtime when the function is defined. Since BaseModel is only imported within a TYPE_CHECKING block (line 27) and annotations are not deferred via from __future__ import annotations, this will cause a NameError at runtime. Additionally, the use of EntityTier and EntitySource in the subsequent overload's signature will trigger similar errors. To fix this while avoiding circular imports, you may need to use the traditional TypeVar syntax with a string forward reference for the bound: T = TypeVar("T", bound="BaseModel").

Comment on lines +146 to +151
def ontology_entity[T: BaseModel](
*,
entity_name: str | None = None,
tier: EntityTier | None = None,
source: EntitySource | None = None,
) -> Any: ...
) -> Callable[[type[T]], type[T]]: ...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The second overload for ontology_entity (the decorator factory) appears to erase the concrete subclass type in static analysis. The type parameter T is scoped to the ontology_entity function and is bound when the factory is called. Because T is not used in the factory's arguments, it cannot be inferred and will default to its bound (BaseModel). Consequently, the returned callable is typed as Callable[[type[BaseModel]], type[BaseModel]], which erases the specific identity of the decorated class. To preserve the subclass type, the returned callable itself should be generic (e.g., by using a Protocol with a generic __call__ or using a module-level TypeVar).

@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.73%. Comparing base (5cb1212) to head (2aca184).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1729   +/-   ##
=======================================
  Coverage   84.72%   84.73%           
=======================================
  Files        1789     1789           
  Lines      102395   102395           
  Branches     8991     8991           
=======================================
+ Hits        86756    86762    +6     
+ Misses      13452    13449    -3     
+ Partials     2187     2184    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Aureliolo Aureliolo merged commit b00fb05 into main May 3, 2026
77 checks passed
@Aureliolo Aureliolo deleted the fix/test-drain-timeout-flaky branch May 3, 2026 12:52
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 12:52 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 3, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Frontend and UX polishing improves user interface responsiveness and
visual consistency.
- API hygiene and validation enhancements provide smoother and more
reliable interactions.

### What's new
- Introduced typed-boundary helpers enabling better type safety and
parse_typed workflows.
- Added codebase-audit skill prompt tuning for improved project
auditing.

### Under the hood
- Eliminated flaky tests caused by module-level state for more stable
test outcomes.
- Unified image tag management under CLI and Renovate for consistent
dependency updates.
- Added cross-PR file-overlap analysis to the review dependency pull
request skill.
- Updated multiple dependencies including Python, Web, CLI, and
container libraries.
- Improved CI tooling and lock file maintenance for better build
reliability.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.8](v0.7.7...v0.7.8)
(2026-05-03)


### Features

* **api:** typed-boundary helper + codebase-audit skill prompt tuning
([#1712](#1712))
([40ee65b](40ee65b))
* **boundary:** RFC
[#1711](#1711) Phases 2 + 3
— typed boundaries via parse_typed
([#1720](#1720))
([7b9f409](7b9f409))


### Bug Fixes

* **api:** audit cleanup B -- API hygiene & validation
([#1719](#1719))
([3d790d9](3d790d9))
* audit cleanup C - persistence, concurrency & data integrity
([#1708](#1708))
([#1717](#1717))
([bcce097](bcce097))
* **test:** exterminate xdist-flaky tests with module-level state
([#1713](#1713))
([#1721](#1721))
([8d258dd](8d258dd))
* **web:** audit cleanup E -- frontend & UX polish
([#1710](#1710))
([#1718](#1718))
([3a3591a](3a3591a))


### Refactoring

* **cli:** single source of truth for DHI image tags + Renovate manager
([#1723](#1723))
([57980a2](57980a2))


### Documentation

* audit cleanup D -- public-facing & docs sync
([#1709](#1709))
([#1715](#1715))
([ade03b7](ade03b7))


### Tests

* **engine:** make TestDrainTimeout deterministic + preserve subclass
type in [@Ontology](https://github.com/ontology)_entity
([#1729](#1729))
([b00fb05](b00fb05))


### CI/CD

* Update CI tool dependencies
([#1703](#1703))
([355a9ff](355a9ff))


### Maintenance

* add cross-PR file-overlap analysis to review-dep-pr skill
([#1722](#1722))
([3861d8a](3861d8a))
* **ci:** unify apko-version under workflow env so Renovate manages it
everywhere ([#1724](#1724))
([9c0a7fd](9c0a7fd))
* consolidate DHI image-pin custom regex managers
([#1726](#1726))
([b8b0cba](b8b0cba))
* **deps:** update dependency chainguard-dev/melange to v0.50.4
([#1701](#1701))
([8cbf83a](8cbf83a))
* Lock file maintenance
([#1705](#1705))
([414cfea](414cfea))
* Lock file maintenance
([#1727](#1727))
([5cb1212](5cb1212))
* Update CLI dependencies
([#1702](#1702))
([9fb57b9](9fb57b9))
* Update Container dependencies
([#1698](#1698))
([6d24fd6](6d24fd6))
* Update dependency @eslint-react/eslint-plugin to v5
([#1704](#1704))
([1cb1294](1cb1294))
* Update Python dependencies
([#1699](#1699))
([8e7af3a](8e7af3a))
* Update Python dependencies to v4.15.0
([#1725](#1725))
([69164c8](69164c8))
* Update Web dependencies
([#1700](#1700))
([715300d](715300d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant