Conversation
- Change GET to POST for /predictions endpoint to match equitypricemodel - Extract ["data"] from predictions response JSON - Handle missing prior portfolio gracefully for first run Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix tide_data.py to only use last window per ticker for predictions (was creating all sliding windows causing batch explosion) - Fix server.py to predict on ALL batches and concatenate results (was only using last batch, missing most tickers) - Add SSM Parameter Store for uncertainty_threshold configuration - Add IAM policy for ECS tasks to read SSM parameters - Configure Sentry to capture warnings in addition to errors - Add custom InsufficientPredictionsError exception Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Datamanager 400 Error on `/predictions` endpoint Root Cause: The `PredictionQuery` struct in `storage.rs` expected `timestamp: DateTime<Utc>`, but the portfoliomanager sends timestamps as Unix floats (e.g., `1737331200.0`). Serde couldn't deserialize the float into a DateTime. Fix: Changed `applications/datamanager/src/storage.rs`: - `PredictionQuery.timestamp` from `DateTime<Utc>` to `f64` - Added conversion: `DateTime::from_timestamp(prediction_query.timestamp as i64, 0)` 2. Equitypricemodel 500 Error Root Cause: Tinygrad uses SQLite for disk caching compiled kernels. FastAPI/uvicorn runs requests in a thread pool, causing: `sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread.` Fix: Added `DISKCACHE=0` environment variable to: - `infrastructure/__main__.py` (ECS task definition) - `infrastructure/docker-compose.yaml` (local development) This disables tinygrad's disk cache, avoiding the SQLite threading issue. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
After fixing bugs, create commits with detailed root cause and fix summaries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The datamanager's GET /predictions endpoint was returning dataframe.to_string() (Polars text format) instead of JSON. When the portfoliomanager called .json() on this response, it failed with "JSONDecodeError: Expecting value: line 1 column 1 (char 0)". Changes: - datamanager/predictions.rs: Serialize DataFrame to JSON using JsonWriter with proper Content-Type header, matching the pattern used by the /portfolios endpoint. Return empty JSON array [] for empty results instead of text. - portfoliomanager/server.py: Add defensive error handling for empty/invalid predictions responses, consistent with how portfolio responses are already handled. Gracefully return portfolio without performance data when predictions are unavailable. Fixes PYTHON-FASTAPI-1C Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The datamanager's Prediction struct uses timestamp as i64 while Portfolio uses timestamp as f64. When portfoliomanager joins prior_portfolio with prior_predictions on ["ticker", "timestamp"], Polars throws SchemaError because the timestamp column types don't match (f64 on left, i64 on right). Fix: Cast prior_predictions timestamp column to Float64 after loading from JSON, before passing to add_portfolio_performance_columns(). Fixes PYTHON-FASTAPI-1E Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The equitypricemodel was using DISKCACHE=0 which is not the correct environment variable for tinygrad. Tinygrad's disk cache uses SQLite which cannot be shared across threads. In FastAPI's async/thread pool environment, this caused "SQLite objects created in a thread can only be used in that same thread" errors. Fix: Change environment variable from DISKCACHE=0 to DISABLE_DISK_CACHE=1 which is the correct tinygrad environment variable to disable SQLite disk caching of compiled kernels. Fixes PYTHON-FASTAPI-19 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis pull request removes a local Claude config file, adds GitHub Actions workflows for Claude integration, and updates environment/manifest files. It introduces Sentry across services, enhances logging and error handling, changes numeric types from integers to floats for financial data, modifies API payload shapes for predictions and portfolios, adds DuckDB/S3-backed query improvements, integrates S3-based model artifact handling and SageMaker training flows (tools and runner updates), expands infrastructure with ECR repositories and S3 buckets, and adds new utility scripts and exceptions. Multiple services (datamanager, portfoliomanager, equitypricemodel) and infrastructure components are affected. Sequence Diagram(s)sequenceDiagram
participant Dev as Developer/Client
participant CI as GitHub Actions
participant S3Data as S3 (Data Bucket)
participant S3Artifacts as S3 (Artifacts Bucket)
participant SageMaker as SageMaker
participant ECS as ECS Cluster
participant ModelSvc as equitypricemodel Service
participant DataSvc as datamanager / portfoliomanager
Dev->>CI: Push code / trigger Claude workflows
CI->>ECS: Deploy images (ECR URIs)
Dev->>S3Data: sync equity bars & categories (tools/sync)
Dev->>SageMaker: start training job (tools/run_training_job)
SageMaker->>S3Data: read training data
SageMaker->>S3Artifacts: write training artifacts (model)
ECS->>ModelSvc: start service instance
ModelSvc->>S3Artifacts: find & download latest artifact
ModelSvc->>ModelSvc: extract and load model into app.state
Client->>DataSvc: call predictions / portfolio endpoints
DataSvc->>S3Data: query parquet data via DuckDB/httpfs
DataSvc->>ModelSvc: request model inference (or ModelSvc handles inference locally)
ModelSvc->>Client: return predictions
Possibly related PRs
✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (3)
📒 Files selected for processing (46)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🚀 The Update (preview) for forstmeier/pocketsizefund/production (at 3877247) was successful. ✨ Neo ExplanationInitial deployment of a complete microservices-based portfolio management system on AWS, provisioning networking, container orchestration, load balancing, storage, and machine learning infrastructure for three interconnected services.Root Cause AnalysisThis deployment represents the initial provisioning of the entire "pocketsizefund" production infrastructure from scratch. The infrastructure code defines a complete microservices architecture for what appears to be a financial portfolio management system with machine learning capabilities. The system consists of three main services (datamanager, equitypricemodel, and portfoliomanager) running on AWS ECS with supporting infrastructure. Dependency ChainThe infrastructure builds out in layers:
Risk AnalysisRisk Level: LOW - This is a greenfield deployment with no existing resources being modified or replaced. All 68 resources are new creations, so there's no risk of data loss or service disruption. Resource Changes Name Type Operation
+ nat_elastic_ip aws:ec2/eip:Eip create
+ ecs_sg aws:ec2/securityGroup:SecurityGroup create
+ alb_sg aws:ec2/securityGroup:SecurityGroup create
+ nat_route aws:ec2/route:Route create
+ datamanager_rule aws:lb/listenerRule:ListenerRule create
+ sagemaker_cloudwatch_policy aws:iam/rolePolicy:RolePolicy create
+ public_route_table aws:ec2/routeTable:RouteTable create
+ ssm_uncertainty_threshold aws:ssm/parameter:Parameter create
+ task_role_s3_policy aws:iam/rolePolicy:RolePolicy create
+ pocketsizefund-production pulumi:pulumi:Stack create
+ equitypricemodel_repository aws:ecr/repository:Repository create
+ task_role aws:iam/role:Role create
+ task_role_ssm_policy aws:iam/rolePolicy:RolePolicy create
+ igw aws:ec2/internetGateway:InternetGateway create
+ private_route_table aws:ec2/routeTable:RouteTable create
+ data_bucket_versioning aws:s3/bucketVersioning:BucketVersioning create
+ s3_gateway_endpoint aws:ec2/vpcEndpoint:VpcEndpoint create
+ portfoliomanager_repository aws:ecr/repository:Repository create
+ ecs_cluster aws:ecs/cluster:Cluster create
+ datamanager_logs aws:cloudwatch/logGroup:LogGroup create
+ public_internet_route aws:ec2/route:Route create
+ private_subnet_2 aws:ec2/subnet:Subnet create
+ datamanager_tg aws:lb/targetGroup:TargetGroup create
+ portfoliomanager_tg aws:lb/targetGroup:TargetGroup create
+ http_listener aws:lb/listener:Listener create
+ vpc aws:ec2/vpc:Vpc create
+ public_subnet_1 aws:ec2/subnet:Subnet create
+ public_subnet_1_rta aws:ec2/routeTableAssociation:RouteTableAssociation create
+ equitypricemodel_rule aws:lb/listenerRule:ListenerRule create
+ equitypricemodel_trainer_repository aws:ecr/repository:Repository create
+ service_discovery aws:servicediscovery/privateDnsNamespace:PrivateDnsNamespace create
+ ecs_egress aws:ec2/securityGroupRule:SecurityGroupRule create
+ equitypricemodel_task aws:ecs/taskDefinition:TaskDefinition create
+ equitypricemodel_sd aws:servicediscovery/service:Service create
+ public_subnet_2_rta aws:ec2/routeTableAssociation:RouteTableAssociation create
+ ecs_from_alb aws:ec2/securityGroupRule:SecurityGroupRule create
+ private_subnet_1 aws:ec2/subnet:Subnet create
+ vpc_endpoints_sg aws:ec2/securityGroup:SecurityGroup create
+ datamanager_sd aws:servicediscovery/service:Service create
+ datamanager_task aws:ecs/taskDefinition:TaskDefinition create
+ portfoliomanager_logs aws:cloudwatch/logGroup:LogGroup create
... and 28 other changes |
There was a problem hiding this comment.
Pull request overview
This pull request contains extensive fixes and improvements to the portfolio management system, addressing multiple issues across the stack including data fetching, prediction generation, portfolio rebalancing, and infrastructure configuration.
Changes:
- Added comprehensive error handling and logging across all services with Sentry integration
- Fixed timestamp type mismatches between Python (float) and Rust (i64) components
- Improved prediction batching to process all tickers instead of only the last batch
- Added SSM Parameter Store for runtime configuration management
- Enhanced infrastructure with S3 buckets, SageMaker roles, and ECR repositories
- Fixed JSON serialization issues in datamanager API responses
- Added GPU support for model training with CUDA-enabled Docker images
Reviewed changes
Copilot reviewed 37 out of 40 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Added sentry-sdk dependency for error tracking |
| tools/sync_equity_categories.py | New script to sync ticker sector/industry data from Polygon API |
| tools/prepare_training_data.py | New script to consolidate equity bars with categories for training |
| tools/run_training_job.py | Added configurable instance types for SageMaker training |
| infrastructure/parameters.py | New SSM parameter configuration for uncertainty threshold |
| infrastructure/main.py | Major infrastructure updates: S3 buckets, ECR repos, SageMaker roles |
| applications/portfoliomanager/server.py | Fixed prediction fetching, added better error handling and Sentry |
| applications/portfoliomanager/alpaca_client.py | Changed from notional to quantity-based orders, added error handling |
| applications/equitypricemodel/server.py | Added S3 artifact loading, fixed batch processing to handle all tickers |
| applications/equitypricemodel/Dockerfile | Added GPU support with CUDA base image |
| applications/datamanager/storage.rs | Fixed timestamp casting, improved query robustness with glob patterns |
| applications/datamanager/predictions.rs | Fixed JSON serialization to return valid JSON arrays |
| .claude/settings.local.json | Added MCP server permissions and AWS CLI commands |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Confidence Score: 4/5
|
Root cause: The equitypricemodel was using DISKCACHE=0 which is not the correct environment variable for tinygrad. Tinygrad's disk cache uses SQLite which cannot be shared across threads. In FastAPI's async/thread pool environment, this caused "SQLite objects created in a thread can only be used in that same thread" errors. Fix: Change environment variable from DISKCACHE=0 to DISABLE_DISK_CACHE=1 which is the correct tinygrad environment variable to disable SQLite disk caching of compiled kernels. Fixes PYTHON-FASTAPI-19 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously math.floor was used to round down share quantities to whole numbers, which would cause a ValueError when the dollar amount was less than the current stock price (qty=0). This prevented trading high-priced stocks with smaller position sizes. Now allows fractional share quantities since Alpaca supports fractional trading for most securities. Only validates that qty > 0 to catch edge cases with non-positive dollar amounts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
83b4238 to
0a0d571
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 37 out of 40 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 42 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
- Fix validation_sample_size to actually sample batches: The validate_training_data method now samples sample_size batches from the total instead of validating all batches, improving efficiency for large datasets - Fix DISABLE_DISK_CACHE inconsistency: Changed docker-compose.yaml from DISABLE_DISK_CACHE=0 to DISABLE_DISK_CACHE=1 to match ECS production configuration and prevent SQLite threading errors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The `region` variable was a `Result<String, Error>` but was being used in format strings that require the `Display` trait. Added `?` operator to unwrap the Result, which either extracts the String or propagates the error. This fixes the compilation error: error[E0277]: `Result<std::string::String, errors::Error>` doesn't implement `std::fmt::Display` Also includes minor formatting cleanup from rustfmt. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This file contains user-specific Claude Code permissions including AWS secrets manager access and MCP server settings. It should remain a local-only file that is not committed to the repository. Added explicit .gitignore entry for clarity. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Check if train_batches is empty immediately after calling tide_data.get_batches() to fail fast with a descriptive error instead of raising an IndexError when accessing train_batches[0]. The error log and exception message include relevant context: - Configuration values (validation_split, input_length, output_length, batch_size) - Training data row count This helps diagnose issues with input data or configuration that result in no batches being created. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 45 out of 48 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tools/sync_equity_categories.py: - Add `from __future__ import annotations` for PEP 563 - Change quoted type hint `"S3Client"` to unquoted `S3Client` - Use cast() for type narrowing after None checks tools/prepare_training_data.py: - Use separate if checks for type narrowing - Use cast() for type narrowing after None checks tools/run_training_job.py: - Add noqa comment for PLR0913 - Normalize SAGEMAKER_INSTANCE_TYPE: strip whitespace and fall back to default when None or empty string portfoliomanager/alpaca_client.py: - Use quotes.get() instead of direct dict access to avoid KeyError - Add None check for quote before accessing properties - Handle ask_price/bid_price possibly being None before comparison Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 45 out of 48 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
## Bug Fixes ### 1. PYTHON-FASTAPI-21: Position not found error (alpaca_client.py, server.py) **Root cause**: When closing positions, the code tried to close positions from `prior_portfolio` (database) without checking if they actually exist in Alpaca. Positions could have been manually closed or never opened successfully. **Fix**: Modified `close_position()` to return `bool` and gracefully handle "position not found" errors by returning `False` instead of raising. The server now logs these as "skipped" rather than errors. ### 2. PYTHON-FASTAPI-1E: SchemaError type mismatch (risk_management.py) **Root cause**: Join failed because `timestamp` column had different types (`f64` vs `i64`) between DataFrames. **Fix**: Added explicit cast of `prior_equity_bars.timestamp` to `Float64` for consistency with the other DataFrames. This ensures all timestamp columns have matching types for joins and comparisons. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR feedback: - alpaca_client.py: Check HTTP status code (404) and error_code attributes before falling back to string matching for position not found errors - risk_management.py: Document why unconditional timestamp casting is used (timestamps may arrive as i64 or f64 depending on serialization path) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Skip entries where the ticker value is empty or missing to prevent creating rows with empty ticker values in the categories CSV. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Overview
Changes
Comments