Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds SSM-based pinned model-version resolution to server startup, creates the SSM parameter and optional Pulumi import, renames encoder/decoder feature groups to past_* across data/training code, and adds checkpoint save/restore plus validation and tests for SSM resolution. Changes
Sequence Diagram(s)sequenceDiagram
participant Server as Server
participant SSM as AWS SSM
participant S3Resolver as Artifact Resolver (S3)
Server->>SSM: GetParameter Name="/fund/equitypricemodel/model_version"
alt SSM returns pinned non-empty & != "latest"
SSM-->>Server: pinned_version
Server->>S3Resolver: construct key = artifact_path/<pinned_version>/output/model.tar.gz
S3Resolver-->>Server: artifact key / download
else SSM returns empty/"latest" or read fails
SSM-->>Server: empty / error
Server->>S3Resolver: find_latest_artifact_key(artifact_path)
S3Resolver-->>Server: latest artifact key
end
Server->>Server: initialize model from artifact
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR implements three main improvements to the equity price model training pipeline: variable renaming for TiDE model alignment, model checkpoint snapshotting, and SSM-based model version pinning. Key Changes:
Previous Feedback Addressed: Confidence Score: 5/5
|
| Filename | Overview |
|---|---|
| applications/equitypricemodel/src/equitypricemodel/server.py | Added SSM parameter-based model version pinning with proper fallback to latest artifact when parameter is not set or errors occur |
| applications/equitypricemodel/src/equitypricemodel/tide_model.py | Renamed encoder/decoder features to past_* naming, added checkpoint saving during training with automatic restoration of best weights, enhanced batch validation |
| applications/equitypricemodel/src/equitypricemodel/tide_data.py | Renamed feature groups to past_* convention, added clarifying comment for calendar features spanning both past and future windows |
| applications/equitypricemodel/src/equitypricemodel/trainer.py | Updated variable names to match past_* convention and added checkpoint_directory parameter to enable model snapshotting |
| applications/equitypricemodel/tests/test_server.py | Added comprehensive test coverage for SSM parameter resolution, fallback behaviors, and path normalization |
| infrastructure/parameters.py | Added SSM parameter for model version pinning with retain_on_delete option and default value of "latest" |
Last reviewed commit: 98a891c
There was a problem hiding this comment.
Pull request overview
This PR does initial cleanup of the equity price TiDE pipeline by aligning feature naming across data/model/trainer, adding model checkpointing during training, and introducing an SSM-driven mechanism to pin or roll forward model artifact versions at runtime.
Changes:
- Renames batch feature keys to
past_*/static_*and updates trainer/model feature combining accordingly. - Adds “best checkpoint” snapshotting during training and restores the best weights after training.
- Adds an SSM parameter (and import support) to optionally pin the model artifact version loaded by the server.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Adds type stubs for SSM to support new SSM usage. |
applications/equitypricemodel/pyproject.toml |
Adds boto3-stubs SSM extras for local typing/dev. |
maskfile.md |
Imports the new SSM parameter into Pulumi state when present. |
infrastructure/parameters.py |
Defines the new /fund/equitypricemodel/model_version SSM parameter. |
applications/equitypricemodel/src/equitypricemodel/server.py |
Loads a pinned model version from SSM (or falls back to latest artifact). |
applications/equitypricemodel/src/equitypricemodel/tide_data.py |
Renames/reshapes batch feature tensors to the new past_* schema. |
applications/equitypricemodel/src/equitypricemodel/tide_model.py |
Enhances batch validation, adds checkpoint save/restore support, updates feature combination. |
applications/equitypricemodel/src/equitypricemodel/trainer.py |
Aligns with renamed features, updates tinygrad import, passes checkpoint directory. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
applications/equitypricemodel/src/equitypricemodel/tide_model.py (1)
414-449:⚠️ Potential issue | 🟠 MajorCheckpointing is accidentally disabled when early stopping is
None.At Line 414, best-loss tracking and checkpoint save are gated by
early_stopping_patience is not None. So runs with early stopping disabled never save/restore the best checkpoint.🐛 Proposed fix
- if early_stopping_patience is not None: - if epoch_loss < best_loss - early_stopping_min_delta: - best_loss = epoch_loss - epochs_without_improvement = 0 - logger.info( - "New best loss", - best_loss=f"{best_loss:.4f}", - ) - if checkpoint_directory is not None: - Path(checkpoint_directory).mkdir( - parents=True, exist_ok=True - ) - safe_save( - get_state_dict(self), - str( - Path(checkpoint_directory) - / "tide_states.safetensor" - ), - ) - best_checkpoint_saved = True - else: + improved = epoch_loss < best_loss - early_stopping_min_delta + if improved: + best_loss = epoch_loss + epochs_without_improvement = 0 + logger.info( + "New best loss", + best_loss=f"{best_loss:.4f}", + ) + if checkpoint_directory is not None: + Path(checkpoint_directory).mkdir(parents=True, exist_ok=True) + safe_save( + get_state_dict(self), + str(Path(checkpoint_directory) / "tide_states.safetensor"), + ) + best_checkpoint_saved = True + elif early_stopping_patience is not None: epochs_without_improvement += 1 logger.info( "No improvement", epochs_without_improvement=epochs_without_improvement, patience=early_stopping_patience, ) - if epochs_without_improvement >= early_stopping_patience: + if ( + early_stopping_patience is not None + and epochs_without_improvement >= early_stopping_patience + ): logger.info( "Early stopping triggered", epoch=epoch + 1, best_loss=f"{best_loss:.4f}", epochs_without_improvement=epochs_without_improvement, ) breakAlso applies to: 453-457
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@applications/equitypricemodel/src/equitypricemodel/tide_model.py` around lines 414 - 449, The best-loss tracking and checkpoint save are wrongly nested inside the early stopping check (early_stopping_patience is not None), so when patience is None no best checkpoint is saved; refactor the block in tide_model.py so that updating best_loss, resetting epochs_without_improvement, logging "New best loss", creating checkpoint_directory, calling safe_save(get_state_dict(self), ...), and setting best_checkpoint_saved = True happen unconditionally whenever epoch_loss improves, while only the logic that increments epochs_without_improvement, checks epochs_without_improvement against early_stopping_patience, logs "No improvement" and triggers the break remains guarded by early_stopping_patience is not None; apply the same separation for the symmetrical code later that handles final checkpoint/restore so best checkpoint behavior works whether or not early stopping is enabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@applications/equitypricemodel/src/equitypricemodel/server.py`:
- Around line 185-195: The S3 key is built by naive concatenation (artifact_key
= f"{artifact_path}{model_version}/output/model.tar.gz"), which can produce
duplicate or missing slashes if artifact_path lacks/has a trailing slash or
model_version contains leading slashes; update the assignment in the same block
(where model_version and artifact_path are used to build artifact_key) to
normalize parts first (e.g., strip any trailing '/' from artifact_path and
leading '/' from model_version, then join with a single '/' before appending
"/output/model.tar.gz") so the resulting S3 key is always well-formed.
- Around line 188-192: Replace the bare logger.warning call in the SSM read
exception path with logger.exception so the traceback is captured; specifically,
where the except block references logger.warning and
MODEL_VERSION_SSM_PARAMETER, call logger.exception with a short sentence-case
message (e.g., "Ssm parameter read failed") and include
parameter=MODEL_VERSION_SSM_PARAMETER so structlog records the parameter and
stack trace together.
---
Outside diff comments:
In `@applications/equitypricemodel/src/equitypricemodel/tide_model.py`:
- Around line 414-449: The best-loss tracking and checkpoint save are wrongly
nested inside the early stopping check (early_stopping_patience is not None), so
when patience is None no best checkpoint is saved; refactor the block in
tide_model.py so that updating best_loss, resetting epochs_without_improvement,
logging "New best loss", creating checkpoint_directory, calling
safe_save(get_state_dict(self), ...), and setting best_checkpoint_saved = True
happen unconditionally whenever epoch_loss improves, while only the logic that
increments epochs_without_improvement, checks epochs_without_improvement against
early_stopping_patience, logs "No improvement" and triggers the break remains
guarded by early_stopping_patience is not None; apply the same separation for
the symmetrical code later that handles final checkpoint/restore so best
checkpoint behavior works whether or not early stopping is enabled.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
applications/equitypricemodel/pyproject.tomlapplications/equitypricemodel/src/equitypricemodel/server.pyapplications/equitypricemodel/src/equitypricemodel/tide_data.pyapplications/equitypricemodel/src/equitypricemodel/tide_model.pyapplications/equitypricemodel/src/equitypricemodel/trainer.pyinfrastructure/parameters.pymaskfile.md
server.py: - Extract _resolve_artifact_key() to make SSM version pinning logic testable - Add "latest" sentinel handling so SSM value "latest" falls back to most recent artifact - Normalize S3 key construction by stripping trailing/leading slashes before joining - Change SSM failure logging from logger.warning to logger.exception to capture stack trace per CLAUDE.md coding guidelines tide_model.py: - Cache tensor-to-numpy conversions in _validate_batch (one .numpy() call per key) to avoid repeated device-to-host transfers across the three validation passes - Decouple checkpoint saving from early stopping: best-loss checkpointing now occurs unconditionally when an epoch improves the loss, regardless of early_stopping_patience; early stopping only controls loop termination infrastructure/parameters.py: - Change SSM parameter default value from "" to "latest" since AWS SSM rejects empty string values (minimum length 1); server treats "latest" as a fallback sentinel tests/test_server.py (new): - Add 10 unit tests covering _resolve_artifact_key: pinned version, latest sentinel, empty string fallback, ParameterNotFound, AccessDeniedException, correct SSM parameter name, and path normalization with/without trailing slashes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@applications/equitypricemodel/src/equitypricemodel/server.py`:
- Around line 171-173: The structured log message passed to logger.info uses a
non-sentence-case key "using_pinned_model_version"; update this to a short
sentence-case message like "Using pinned model version" in the same logger.info
call (located in server.py where model_version is checked) so the call becomes
logger.info("Using pinned model version", version=model_version) while leaving
the structured field name/version argument intact.
In `@applications/equitypricemodel/src/equitypricemodel/tide_model.py`:
- Around line 415-431: The current checkpoint save is tied to the
early_stopping_min_delta comparison, so tiny but real improvements aren't
persisted; change logic so checkpoint saving uses a separate criterion (e.g.,
track best_saved_loss or save whenever epoch_loss < best_saved_loss) instead of
the early_stopping_min_delta-gated best_loss update. Concretely, introduce or
reuse a variable distinct from best_loss (e.g., best_saved_loss), update it when
you call safe_save(get_state_dict(self), ...) and set best_checkpoint_saved =
True when you actually write the file, and keep early_stopping_min_delta only
for the early-stopping counter (epochs_without_improvement) while comparing
epoch_loss to best_loss for stopping decisions.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
applications/equitypricemodel/src/equitypricemodel/server.pyapplications/equitypricemodel/src/equitypricemodel/tide_model.pyapplications/equitypricemodel/tests/test_server.pyinfrastructure/parameters.py
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
applications/portfoliomanager/src/portfoliomanager/server.py (1)
196-208:⚠️ Potential issue | 🟡 MinorUse a descriptive exception variable name in this handler.
Please rename
etoerrorin thisexceptblock and update references for guideline compliance and readability.
As per coding guidelines, "Use full word variables in code whenever possible".Proposed fix
- except Exception as e: + except Exception as error: logger.exception( "Failed to close position", ticker=close_position["ticker"], - error=str(e), + error=str(error), ) close_results.append( { "ticker": close_position["ticker"], "action": "close", "status": "failed", - "error": str(e), + "error": str(error), } )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@applications/portfoliomanager/src/portfoliomanager/server.py` around lines 196 - 208, Rename the exception variable in the except block from the single-letter `e` to a descriptive `error`: change the handler to `except Exception as error:` and update all references inside the block (the call to `logger.exception("Failed to close position", ticker=close_position["ticker"], error=str(error))` and the `"error": str(error)` entry appended to `close_results`) so `logger.exception`, `close_results`, and `close_position` usages reflect the new `error` identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@applications/equitypricemodel/src/equitypricemodel/server.py`:
- Around line 167-182: The broad except Exception around the SSM read should be
narrowed: replace the generic exception handler with an except ClientError as e
block (ClientError is already imported) to handle expected SSM/API failures and
log the parameter name and the exception via logger.exception, and let any other
unexpected exceptions (e.g., KeyError/TypeError from response parsing) propagate
by not catching them (or re-raising them if you use a catch-all for telemetry);
update the ssm_client get_parameter handling around MODEL_VERSION_SSM_PARAMETER
and model_version parsing accordingly so only ClientError triggers the fallback
behavior.
In `@pyproject.toml`:
- Around line 41-43: Add a root-level Python runtime constraint by adding
requires-python = "==3.12.10" to the top-level pyproject.toml (either as a
standalone key at root or inside a newly added [project] table); this enforces
the exact Python version for the workspace (Ruff's target-version "py312" only
affects linting), so update pyproject.toml to include requires-python =
"==3.12.10" alongside the existing [tool.ruff] section.
---
Outside diff comments:
In `@applications/portfoliomanager/src/portfoliomanager/server.py`:
- Around line 196-208: Rename the exception variable in the except block from
the single-letter `e` to a descriptive `error`: change the handler to `except
Exception as error:` and update all references inside the block (the call to
`logger.exception("Failed to close position", ticker=close_position["ticker"],
error=str(error))` and the `"error": str(error)` entry appended to
`close_results`) so `logger.exception`, `close_results`, and `close_position`
usages reflect the new `error` identifier.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
applications/equitypricemodel/src/equitypricemodel/server.pyapplications/equitypricemodel/src/equitypricemodel/tide_model.pyapplications/portfoliomanager/src/portfoliomanager/server.pypyproject.toml
- Fix misleading log message in _resolve_artifact_key: "Starting data sync with SSM version pinning" replaced with "Resolved artifact key from pinned model version" - Normalize artifact_path once with trailing slash before the SSM try block and use it consistently in both the pinned-version return path and the find_latest_artifact_key prefix argument, preventing S3 prefix mismatches when the env var lacks a trailing slash - Narrow SSM exception handling from broad except Exception to except ClientError, branching on ParameterNotFound (info log, no stack trace) vs other error codes (logger.exception); non-ClientError exceptions now propagate - Add checkpoint_directory entry to train() docstring Args section - Add inline comment above past_categorical slice explaining that calendar features are known for both the lookback and forecast windows - Add [project] table with requires-python = "==3.12.10" to root pyproject.toml to enforce the strict Python version constraint at the workspace level Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
This is being closed in favor of #775 which includes several of the fixes here. |
Overview
Changes
Context
Some initial work - the model still needs to be retrained which is what I'll work on next.
Summary by CodeRabbit
New Features
Refactor
Tests
Chores