
Build initial risk/data management functionality for statistical arbitrage#785

Merged
chrisaddy merged 24 commits into master from statistical-arbitrage-phase-one
Mar 11, 2026

Conversation

@forstmeier
Collaborator

@forstmeier forstmeier commented Mar 5, 2026

Context

I'm separating this into chunks because the pull request will be pretty big regardless, and I want to be able to apply as much of the bot feedback as I can.

These are the data resources and scripts I used to test it; they lived in tools/ and tools/data/, but I don't think they need to be checked into the repository. I liked what they generated, though, and it showed everything working.

stat_arb_test.py
build_stat_arb_test_data.py
alpaca_shortable_cache.json
stat_arb_test_prices.csv
stat_arb_test_sectors.csv

Summary by CodeRabbit

  • New Features

    • Ensemble prediction blending, statistical‑arbitrage pair selection, volatility‑parity position sizing, shortable‑ticker filtering, historical price & equity ingestion, market beta/regime analytics, and textual portfolio reports.
  • Improvements

    • Stronger portfolio/pair validation, clearer error handling, adjusted portfolio empty-response behavior, refined open/close position flow, and added pair identifiers throughout.
  • Tests

    • Extensive unit tests covering ingestion, consolidation, pair selection, sizing, beta/regime, reporting, and broker interactions.
  • Dependencies

    • Added scipy (>=1.17.1).

Copilot AI review requested due to automatic review settings March 5, 2026 03:31
@github-project-automation github-project-automation Bot moved this to In Progress in Overview Mar 5, 2026
@github-actions github-actions Bot requested a review from chrisaddy March 5, 2026 03:31
@github-actions github-actions Bot added the python Python code updates label Mar 5, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Mar 5, 2026

Caution

Review failed

Failed to post review comments

Walkthrough

Adds a pairs-first statistical-arbitrage pipeline: data fetchers, model consolidation, pair selection, volatility- and beta-based sizing, regime/beta analytics, Alpaca shortability checks, portfolio schema changes (pair_id), server orchestration updates, new exceptions, and extensive unit tests and Rust datamanager adjustments.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Dependency**<br>`applications/portfoliomanager/pyproject.toml` | Added runtime dependency `scipy>=1.17.1`. |
| **Alpaca client**<br>`applications/portfoliomanager/src/portfoliomanager/alpaca_client.py` | Added `get_shortable_tickers(...)` and asset-related imports to query active US equities and filter shortable/easy_to_borrow tickers. |
| **Data fetchers**<br>`applications/portfoliomanager/src/portfoliomanager/data_client.py` | New HTTP fetch utilities: `fetch_historical_prices`, `fetch_equity_details`, `fetch_spy_prices`; return Polars frames and raise `PriceDataUnavailableError` on fetch failures. |
| **Consolidation**<br>`applications/portfoliomanager/src/portfoliomanager/consolidation.py` | New `compute_ticker_volatility` and `consolidate_predictions` to blend model quantiles into ensemble alpha/confidence, normalize, and enrich with volatility and sector. |
| **Statistical arbitrage**<br>`applications/portfoliomanager/src/portfoliomanager/statistical_arbitrage.py` | New module: build price matrix, compute spread z-score/hedge, candidate construction, correlation filtering, greedy non-overlap selection via `select_pairs`. |
| **Risk management**<br>`applications/portfoliomanager/src/portfoliomanager/risk_management.py` | Replaced prior portfolio builders with `size_pairs_with_volatility_parity` and `_apply_beta_neutral_weights`; added `REQUIRED_PAIRS` and related constants; uses `InsufficientPairsError`. |
| **Beta & Regime**<br>`applications/portfoliomanager/src/portfoliomanager/beta.py`, `.../regime.py` | New `compute_market_betas` and `compute_portfolio_beta`; new `classify_regime` and `RegimeResult` typed dict for regime detection. |
| **Server orchestration**<br>`applications/portfoliomanager/src/portfoliomanager/server.py` | Major flow refactor: async lifespan, fetch external data, `get_raw_predictions`, consolidate signals, filter shortable tickers, select pairs, compute betas/regime, size pairs, updated function signatures and error handling. |
| **Schema & validation**<br>`applications/portfoliomanager/src/portfoliomanager/portfolio_schema.py` | Added `pair_id` field to portfolio schema, `pairs_schema`, and `check_pair_tickers_different` validator enforcing long/short differ. |
| **Reporting**<br>`applications/portfoliomanager/src/portfoliomanager/report.py` | New textual report formatters for regime, betas, consolidation, pairs, and portfolio summaries. |
| **Exceptions**<br>`applications/portfoliomanager/src/portfoliomanager/exceptions.py` | Removed `InsufficientPredictionsError`; added `PriceDataUnavailableError` and `InsufficientPairsError`. |
| **Tests (Python)**<br>`applications/portfoliomanager/tests/*` | Added/updated extensive tests for alpaca client, consolidation, data client, portfolio schema, risk sizing, stat-arb, beta, regime, server flows, report formatting, and prior-portfolio handling. |
| **Datamanager (Rust)**<br>`applications/datamanager/src/...` | Added `pair_id: String` to Portfolio struct, propagated through storage/query/result mapping and tests; portfolio GET now returns empty JSON array (200) for first-run / empty-file cases. |
| **Datamanager tests & handlers**<br>`applications/datamanager/tests/*`, `applications/datamanager/src/portfolios.rs`, `applications/datamanager/src/storage.rs` | Updated tests and handlers to include `pair_id` in payloads/fixtures; unified query execution and projection to include `pair_id`; adjusted empty-file GET behavior. |
| **Tooling config**<br>`pyproject.toml` | Added Ruff configuration blocks (linting rules). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Server as Server
    participant DM as DataManager API
    participant Models as Prediction Service
    participant Consolidation as Consolidation
    participant Alpaca as Alpaca API
    participant Pairs as Pair Selector
    participant Risk as Risk Manager

    Server->>DM: fetch_historical_prices(reference_date, lookback_days)
    DM-->>Server: historical_prices
    Server->>DM: fetch_equity_details()
    DM-->>Server: equity_details

    Server->>Models: get_raw_predictions()
    Models-->>Server: model_predictions

    Server->>Consolidation: consolidate_predictions(model_predictions, historical_prices, equity_details)
    Consolidation-->>Server: consolidated_signals

    Server->>Alpaca: get_shortable_tickers(consolidated_signals.tickers)
    Alpaca-->>Server: shortable_tickers

    Server->>Pairs: select_pairs(consolidated_signals.filtered, historical_prices)
    Pairs-->>Server: candidate_pairs

    Server->>Risk: size_pairs_with_volatility_parity(candidate_pairs, maximum_capital, timestamp, market_betas, exposure_scale)
    Risk-->>Server: sized_positions

    Server->>Alpaca: open/close positions (orders)
    Alpaca-->>Server: order confirmations
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and concisely summarizes the main change: building initial risk/data management functionality for statistical arbitrage. It directly reflects the primary objective and scope of the changeset.


@greptile-apps
Contributor

greptile-apps Bot commented Mar 5, 2026

Greptile Summary

This PR builds the initial statistical arbitrage pipeline for the portfolio manager, adding pair selection, volatility-parity sizing, beta-neutral optimization, regime classification, and hold/stop-loss logic. Most previously flagged issues (zero-price filter, isinf hedge ratio guard, pairs_schema validation, pair_id in position rows, lookback window increase) have been addressed.

Two issues remain:

  • Logic bug in consolidation.py line 63: predictions_df.sort("timestamp").group_by("ticker").last() does not guarantee the most-recent row per ticker because Polars' group_by is hash-based and does not preserve sort order within groups. This can silently mix stale predictions with current ones. The same hazard was correctly fixed in compute_ticker_volatility using sort_by inside the aggregation — apply the same fix here.

  • Unguarded .row(0) in report.py lines 145–146: Calling .row(0, named=True) on potentially empty filtered DataFrames will raise OutOfBoundsError. Although the production pipeline ensures both legs exist for every pair, this function should guard against empty results before accessing rows.

Confidence Score: 3/5

  • Contains one logic bug in signal consolidation that can silently produce stale alpha signals; a secondary defensive programming gap in the report formatter.
  • The PR implements stat-arb core logic well and passes comprehensive tests. However, the logic bug in consolidate_predictions (line 63) is significant: when a model returns multiple predictions per ticker at different timestamps, group_by().last() may return an arbitrary row instead of the most-recent one, silently corrupting ensemble signal blending. This is a production correctness issue. The report.py gap is lower priority since the function is not yet in the live pipeline, but both warrant fixes before merge.
  • applications/portfoliomanager/src/portfoliomanager/consolidation.py (must fix before merge); applications/portfoliomanager/src/portfoliomanager/report.py (should fix before use).

Comments Outside Diff (2)

  1. applications/portfoliomanager/src/portfoliomanager/consolidation.py, line 63 (link)

    group_by().last() does not guarantee the most-recent row per ticker because Polars' group_by is hash-based and does not preserve sort order within groups. The .sort("timestamp") on line 63 orders the whole frame, but once rows enter the grouper their relative order is undefined.

    This is the same ordering hazard that was correctly fixed in compute_ticker_volatility (lines 14–17), which uses sort_by inside the aggregation expression to guarantee per-group ordering. The same fix should be applied here:

    ```python
    latest_predictions = (
        predictions_df
        .group_by("ticker")
        .agg(pl.all().sort_by("timestamp").last())
    )
    ```
  2. applications/portfoliomanager/src/portfoliomanager/report.py, line 145-146 (link)

    .row(0, named=True) is called on lines 145–146 without checking whether the filtered DataFrames are empty. If no LONG or SHORT row exists for a given pair_id, this will raise polars.exceptions.OutOfBoundsError with a confusing error message.

    While the production pipeline ensures both legs are present for every selected pair, this function would crash during debugging or if called with inconsistent arguments. Adding a guard makes the failure explicit:

    ```python
    long_rows = pair_rows.filter(pl.col("side") == "LONG")
    short_rows = pair_rows.filter(pl.col("side") == "SHORT")
    if long_rows.is_empty() or short_rows.is_empty():
        lines.append(f"  {pair_id:<{_W_PAIR_ID + 2}} (incomplete pair — missing leg)")
        continue
    long_row = long_rows.row(0, named=True)
    short_row = short_rows.row(0, named=True)
    ```

Last reviewed commit: a93674f

Comment thread applications/portfoliomanager/src/portfoliomanager/risk_management.py Outdated
Comment thread applications/portfoliomanager/src/portfoliomanager/statistical_arbitrage.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the portfolio manager from a prediction-based long/short strategy to a statistical arbitrage (pairs trading) approach. It introduces data fetching utilities, a consolidation pipeline for blending model predictions, pair selection via cointegration analysis, and volatility-parity-based position sizing. The old add_predictions_zscore_ranked_columns / create_optimal_portfolio approach is replaced end-to-end.

Changes:

  • Added statistical_arbitrage.py for cointegration-based pair selection, consolidation.py for blending model predictions with volatility and sector data, and data_client.py for fetching historical prices and equity details from the data manager.
  • Rewrote risk_management.py to size pairs using inverse-volatility weighting instead of equal-dollar allocation across ranked predictions, and updated server.py with the new pipeline (data fetch → consolidation → shortability filter → pair selection → volatility-parity sizing).
  • Expanded the test suite with comprehensive tests for all new modules and updated existing tests, added pairs_schema validation to portfolio_schema.py, and introduced new custom exceptions (PriceDataUnavailableError, InsufficientPairsError).

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| statistical_arbitrage.py | New module for cointegrated pair selection using correlation filtering, spread z-scores, and greedy pair assignment |
| consolidation.py | New module for blending multi-model predictions with realized volatility and sector data |
| data_client.py | New module for fetching historical prices (parquet) and equity details (CSV) from the data manager |
| risk_management.py | Replaced equal-allocation portfolio construction with inverse-volatility-parity pair sizing |
| server.py | Updated portfolio creation pipeline with new data fetching, consolidation, shortability filtering, pair selection, and sizing steps |
| portfolio_schema.py | Added pairs_schema with check_pair_tickers_different validation and optional pair_id column to portfolio_schema |
| exceptions.py | Added PriceDataUnavailableError and InsufficientPairsError; removed InsufficientPredictionsError |
| alpaca_client.py | Added get_shortable_tickers method for filtering assets by shortability and easy-to-borrow status |
| pyproject.toml | Added scipy>=1.17.1 dependency |
| test_statistical_arbitrage.py | New tests for price matrix building, spread z-score computation, and pair selection edge cases |
| test_consolidation.py | New tests for prediction consolidation, blending, and error handling |
| test_data_client.py | New tests for data fetching with mocked HTTP responses |
| test_risk_management.py | Rewrote tests for the new volatility-parity sizing function |
| test_alpaca_client.py | New tests for get_shortable_tickers and full coverage of AlpacaClient methods |
| test_portfolio_schema.py | Added tests for new schema checks and pairs_schema validation |
| .flox/env/manifest.lock | Trailing newline fix |


Comment thread applications/portfoliomanager/src/portfoliomanager/statistical_arbitrage.py Outdated
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 5, 2026
…lidation

- Add np.any(prices <= 0) guard in _compute_log_returns alongside the
  existing NaN check. np.log(0) produces -inf which would propagate
  silently through np.diff and np.corrcoef before the pair was
  eventually discarded; the explicit guard prevents any inf values from
  entering the correlation matrix.

- Add pairs_schema.validate(candidate_pairs) in server.py after
  select_pairs returns (skipped for empty DataFrames, which are handled
  by the existing InsufficientPairsError path). Aligns with project
  guideline requiring DataFrame schemas to be validated in the
  production pipeline, not only in tests.

- Add test_compute_log_returns_excludes_tickers_with_zero_price to
  verify that a ticker containing a zero close price within the
  correlation window is excluded from pair candidates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread applications/portfoliomanager/pyproject.toml
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 5, 2026
forstmeier and others added 11 commits March 5, 2026 22:31

…dge-case coverage

- Fix stop-loss boundary condition in evaluate_prior_pairs: the inclusive
  upper bound (abs_z <= Z_SCORE_STOP_LOSS) incorrectly held pairs at exactly
  z=4.0 rather than triggering stop-loss. Changed to exclusive (abs_z <
  Z_SCORE_STOP_LOSS) so z=4.0 closes the position as intended.

- Sort pair_ids before iteration in evaluate_prior_pairs to make the
  evaluation order deterministic. Fixes a non-deterministic test that relied
  on unique().to_list() ordering.

- Import _PRIOR_PORTFOLIO_SCHEMA from server.py in test_portfolio_server.py
  instead of duplicating the definition, preventing schema drift between
  production code and tests.

- Add edge-case tests for non-positive prices and NaN z-score paths in
  evaluate_prior_pairs (both previously untested branches).

- Add test for missing pair_id column failing portfolio_schema validation,
  locking in the new required field.

- Remove module-level ValueError guard that crashed pytest collection when
  ALPACA_API_KEY_ID and ALPACA_API_SECRET env vars were absent (root cause
  of CI failure).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
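The stop-loss boundary change in the first bullet comes down to a toy predicate like this (constant name from the commit message; the surrounding hold/close flow is assumed):

```python
Z_SCORE_STOP_LOSS = 4.0

def pair_hits_stop_loss(abs_z: float) -> bool:
    # Exclusive upper bound on holding: abs_z == 4.0 now triggers the
    # stop-loss, whereas the old inclusive check (abs_z <= 4.0) held it.
    return not (abs_z < Z_SCORE_STOP_LOSS)
```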
Root cause of CI failure: server.py instantiates AlpacaClient at module
level. The Alpaca SDK's TradingClient.__init__ calls _validate_credentials,
which raises ValueError when empty strings are supplied (as in CI where no
credentials are set). This crashed pytest collection on import.

Fix: Add conftest.py that patches TradingClient and StockHistoricalDataClient
before server.py is imported during test collection, allowing AlpacaClient
to initialize without real credentials. The patch targets the underlying SDK
classes rather than AlpacaClient itself, so test_alpaca_client.py (which
tests AlpacaClient directly with its own per-test patches) is unaffected.

Also fix test_evaluate_prior_pairs_skips_pair_with_non_positive_prices: the
zero-price row was placed at i=0 but pair_price_matrix.tail(60) excludes
the first 5 rows of a 65-row dataset, so the non-positive price check was
never reached. The test fell through to compute_spread_zscore on nearly
identical data, producing scipy/numpy warnings. Moving the zero to the last
row (i=64) ensures it survives the tail cut and the guard triggers correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The module-level ValueError guard was removed in a prior commit to fix CI
test collection failures, but that silently broke the fail-fast guarantee:
a deployment with missing Alpaca credentials would start up cleanly, pass
health checks, and only fail when a trade was actually attempted.

Fix: move AlpacaClient instantiation and credential validation into a
FastAPI lifespan context manager (_lifespan). The lifespan runs at server
startup before any requests are accepted, so:
- A misconfigured deployment still fails immediately at boot with a clear
  error message, before any capital is at risk.
- Tests can import server.py without real credentials because the lifespan
  is not invoked during pytest collection or unit test execution.

The AlpacaClient instance is stored in app.state and retrieved via a local
alias at the top of create_portfolio, following FastAPI's recommended
pattern for app-level state instead of module-level globals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Enforce _PRIOR_PORTFOLIO_SCHEMA when constructing prior portfolio
  DataFrame from JSON response, keeping both branches consistent
- Add boundary test confirming abs_z == Z_SCORE_STOP_LOSS (4.0)
  triggers stop-loss under the exclusive upper bound condition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adding schema= to pl.DataFrame can raise pl.exceptions.SchemaError or
pl.exceptions.InvalidOperationError on type coercion failures, neither
of which inherits from ValueError. Broaden the except clause to include
pl.exceptions.PolarsError so type mismatches fall back to an empty
DataFrame rather than propagating as a 500 from create_portfolio.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without aggregate_function, Polars behavior on duplicate (ticker, timestamp)
pairs is version-dependent: some versions silently take the first value,
others raise a DuplicateError. Specifying "last" makes the deduplication
strategy explicit and version-stable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With only 2 data points, OLS regression yields a trivially perfect fit
(zero residual variance), producing z-scores that pass the NaN guard but
lack statistical reliability. 30 rows is the minimum for stable OLS
residual standard deviation estimates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
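The degenerate-fit claim above is easy to reproduce with numpy (illustrative only, not the project's code):

```python
import numpy as np

# A degree-1 OLS fit through exactly 2 points is interpolation, not
# regression: residuals are identically zero, so any residual-based
# z-score passes a NaN guard while carrying no statistical content.
x = np.array([0.0, 1.0])
y = np.array([1.0, 3.0])
coefficients = np.polyfit(x, y, 1)
residuals = y - np.polyval(coefficients, x)
assert np.allclose(residuals, 0.0)  # zero residual variance
```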
…te_portfolio_beta

Root cause: The early return in classify_regime checked len(spy_close) <
_MINIMUM_RETURN_COUNT (i.e., < 2), so execution continued with exactly 2
prices producing 1 return. np.std([x], ddof=1) on a single-element array
returns NaN, which silently propagated through volatility comparisons. The
function returned the correct fallback by coincidence (NaN comparisons always
evaluate False in Python), but relied on implicit NaN semantics rather than
an explicit guard.

Fix: Move returns = np.diff(np.log(spy_close)) before the early return and
check len(returns) < _MINIMUM_RETURN_COUNT instead, ensuring at least 2
returns are present before computing standard deviation.

Secondary fix: Remove the now-redundant inner length check on returns before
calling np.corrcoef. After the outer guard, len(returns) >= 2 is guaranteed,
making the else: autocorrelation = 0.0 branch dead code.

Also added a regression test for the exactly-2-prices edge case and an inline
comment on compute_portfolio_beta noting its role in test validation and
intent for future portfolio-level beta reporting.

Co-Authored-By: Claude <noreply@anthropic.com>
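The silent-NaN behavior described above can be demonstrated directly (illustrative; numpy emits a degrees-of-freedom RuntimeWarning here):

```python
import numpy as np

spy_close = np.array([100.0, 101.0])   # exactly 2 prices -> 1 return
returns = np.diff(np.log(spy_close))
volatility = np.std(returns, ddof=1)   # sample std of a single value: NaN
assert np.isnan(volatility)
# NaN comparisons always evaluate False, so a threshold check like
# `volatility > some_level` silently takes the fallback branch.
assert not (volatility > 0.02)
```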
Collaborator

@chrisaddy chrisaddy left a comment


Review comments on initial statistical arbitrage build.

Comment thread applications/portfoliomanager/src/portfoliomanager/alpaca_client.py
Comment thread applications/portfoliomanager/src/portfoliomanager/consolidation.py
Comment thread applications/portfoliomanager/src/portfoliomanager/consolidation.py
Comment thread applications/portfoliomanager/src/portfoliomanager/server.py
Comment thread applications/portfoliomanager/src/portfoliomanager/server.py
Comment thread applications/portfoliomanager/tests/test_statistical_arbitrage.py
chrisaddy
chrisaddy previously approved these changes Mar 8, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 8, 2026
…peline

Three issues flagged by automated review:

1. consolidation.py: Clip IQR to zero before computing raw_confidence to
   prevent inverted quantiles (quantile_10 > quantile_90) from producing
   raw_confidence > 1.0 or negative/inf values that corrupt the blended mean.
   Added test covering the inverted-quantile edge case.

2. statistical_arbitrage.py: Add np.isinf(hedge_ratio) to the degenerate-pair
   guard. np.polyfit returns inf when log_prices_b has zero variance; the
   explicit check makes the defensive intent clear and prevents inf hedge ratios
   from propagating. Added test with a flat price series to exercise this path.

3. data_client.py: Increase default lookback_days from 90 to 120 calendar days
   to widen the buffer above CORRELATION_WINDOW_DAYS = 60. The previous default
   left only ~3-4 trading days of headroom, meaning data gaps could silently
   drop tickers below the minimum required for pair selection.

Co-Authored-By: Claude <noreply@anthropic.com>
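A hedged sketch of the degenerate-pair guard from item 2 (function name hypothetical; the real module fits log prices via np.polyfit):

```python
import numpy as np

def safe_hedge_ratio(log_prices_a: np.ndarray, log_prices_b: np.ndarray):
    # Fit a ~ hedge_ratio * b + c; reject the pair if the fitted slope
    # is not finite (e.g. when log_prices_b has zero variance).
    with np.errstate(all="ignore"):
        hedge_ratio = np.polyfit(log_prices_b, log_prices_a, 1)[0]
    if not np.isfinite(hedge_ratio):
        return None  # degenerate pair: caller removes it from candidates
    return float(hedge_ratio)
```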
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 8, 2026
Comment thread applications/portfoliomanager/src/portfoliomanager/consolidation.py
…ering

Two correctness fixes identified in code review:

1. risk_management.py: Added pair_id to both long_positions and short_positions
   selects. Without this, the saved portfolio had no way to correlate a long
   position with its corresponding short position, breaking pair-level exit logic
   where both legs must be closed together when a z-score reverts.

2. consolidation.py: Added sort_by("timestamp") inside the group_by aggregation
   expression before tail(). Polars group_by does not guarantee intra-group row
   order, so the prior code relied on undefined behavior to pick the most-recent
   VOLATILITY_WINDOW_DAYS returns. The explicit sort_by makes ordering guaranteed.

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated no new comments.



Comment thread applications/portfoliomanager/src/portfoliomanager/consolidation.py Outdated
…ence logging

Three bugs/gaps addressed from code review:

1. regime.py: Fix silent NaN propagation in classify_regime. The guard
   `len(returns) < _MINIMUM_RETURN_COUNT` (min=2) allowed len==2 through,
   causing np.corrcoef to receive single-element arrays and return NaN.
   NaN comparisons silently evaluate to False/0.0, dropping the autocorrelation
   signal. Fix: tighten guard to `< _MINIMUM_RETURN_COUNT + 1` (requires 3+
   returns). _MINIMUM_RETURN_COUNT stays at 2 since beta.py uses it correctly
   for linregress. Added test for the len==2 boundary case. Also added a comment
   documenting the intentional conservative default (trending/0.0 halves exposure).

2. risk_management.py: Add zero-sum guard before normalizing adjusted_weights.
   If the optimizer returns all-zero weights, dividing by sum() would produce NaN.
   Falls back to volatility_parity_weights in that case.

3. server.py: Log regime_confidence alongside regime_state to surface the computed
   value. Added comment noting binary exposure_scale is intentional for this initial
   implementation with confidence reserved for future graduated scaling.

Co-Authored-By: Claude <noreply@anthropic.com>
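The zero-sum guard in item 2 might look like this (names hypothetical; sketched under the assumption that weights are plain numpy arrays):

```python
import numpy as np

def normalize_weights(adjusted_weights: np.ndarray,
                      volatility_parity_weights: np.ndarray) -> np.ndarray:
    # If the optimizer returns (near-)zero total weight, dividing by the
    # sum would produce NaN; fall back to volatility parity instead.
    total = adjusted_weights.sum()
    if np.isclose(total, 0.0):
        return volatility_parity_weights
    return adjusted_weights / total
```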
coderabbitai[bot]
coderabbitai Bot previously approved these changes Mar 8, 2026
forstmeier and others added 6 commits March 8, 2026 20:26
…y, dead code

Three issues flagged by greptile-apps after the previous commit:

1. beta.py line 20: Guard compared len(spy_close) (price count) against
   _MINIMUM_RETURN_COUNT (a returns count), identical off-by-one to the
   one already fixed in regime.py. Changed to _MINIMUM_RETURN_COUNT + 1
   so the guard correctly requires at least two prices to produce one return.

2. risk_management.py line 107: Replaced exact float equality
   adjusted_weights.sum() == 0.0 with np.isclose() for robustness.

3. risk_management.py lines 40-41: Removed unreachable if total == 0.0
   guard inside the SLSQP objective. SLSQP enforces sum(w) = target_total
   throughout optimization and bounds keep all weights strictly positive,
   so total can never be zero.

Co-Authored-By: Claude <noreply@anthropic.com>
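The off-by-one in item 1 reduces to the fact that n prices yield n - 1 returns; a sketch of the corrected guard (constant name from the commit message):

```python
import numpy as np

_MINIMUM_RETURN_COUNT = 2

def has_enough_history(spy_close: np.ndarray) -> bool:
    # n prices produce n - 1 log returns, so requiring
    # _MINIMUM_RETURN_COUNT returns means _MINIMUM_RETURN_COUNT + 1 prices.
    return len(spy_close) >= _MINIMUM_RETURN_COUNT + 1
```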
…d regime

beta.py had no check for zero/negative SPY close prices before computing
np.log(spy_close), while every ticker already had an explicit np.any(<=0)
guard. A corrupt row with close_price==0 or negative would silently produce
-inf or nan in spy_returns, propagating nan betas for all tickers and
causing SLSQP to fall back to vol-parity weights with a misleading warning.

regime.py had the same asymmetry: np.diff(np.log(spy_close)) was called
without any non-positive check, risking silent nan propagation into the
realized_volatility and autocorrelation calculations.

Fix: extend the existing length guard in beta.py to also reject non-positive
SPY prices; add an equivalent early-return guard in regime.py before the log
call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…three

Build initial beta/regime risk management features
Copilot AI review requested due to automatic review settings March 10, 2026 02:21
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 31 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

applications/datamanager/src/data.rs:151

  • The pair_id column is not normalized to uppercase, unlike ticker, side, and action which all have .str().to_uppercase() applied (lines 148-150). The test sample_portfolio_lowercase() uses pair_id: "aapl-googl" but test_create_portfolio_dataframe_uppercase_normalization doesn't verify pair_id is uppercased. Since pair_id is derived from tickers (e.g. "AAPL-MSFT"), this could cause mismatches when querying by pair_id if lowercase data is ever written. Consider adding .with_columns([col("pair_id").str().to_uppercase().alias("pair_id")]) alongside the other normalization steps.
    ```rust
    debug!("Normalizing ticker, side, and action columns to uppercase");
    let portfolio_dataframe = portfolio_dataframe
        .lazy()
        .with_columns([col("ticker").str().to_uppercase().alias("ticker")])
        .with_columns([col("side").str().to_uppercase().alias("side")])
        .with_columns([col("action").str().to_uppercase().alias("action")])
        .collect()?;
    ```


Comment thread applications/portfoliomanager/src/portfoliomanager/beta.py
@chrisaddy chrisaddy merged commit eef2f58 into master Mar 11, 2026
12 checks passed
@chrisaddy chrisaddy deleted the statistical-arbitrage-phase-one branch March 11, 2026 02:15
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Overview Mar 11, 2026

Labels

`python` (Python code updates) · `rust` (Rust code updates)

Projects

Archived in project


3 participants