Expand initial Grafana metrics resources and upload script#601
Conversation
Walkthrough

Prometheus metrics integration was added to both the datamanager and positionmanager services, each exposing a new `/metrics` endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FastAPI (datamanager)
    participant DuckDB
    participant Prometheus
    User->>FastAPI (datamanager): GET /metrics
    FastAPI (datamanager)->>DuckDB: Query row count from GCS parquet files
    DuckDB-->>FastAPI (datamanager): Return row count
    FastAPI (datamanager)->>Prometheus: Update equity_bars_total_rows Gauge
    FastAPI (datamanager)-->>User: Return JSON {total_rows}
```

```mermaid
sequenceDiagram
    participant User
    participant FastAPI (positionmanager)
    participant AlpacaClient
    participant Prometheus
    User->>FastAPI (positionmanager): GET /metrics
    FastAPI (positionmanager)->>AlpacaClient: get_account_information()
    AlpacaClient-->>FastAPI (positionmanager): Return account data
    FastAPI (positionmanager)->>AlpacaClient: get_positions()
    AlpacaClient-->>FastAPI (positionmanager): Return positions list
    FastAPI (positionmanager)->>Prometheus: Update portfolio Gauges (value, cash, positions, etc.)
    FastAPI (positionmanager)-->>User: Return JSON with portfolio metrics
```
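The two scrape flows diagrammed above can be sketched in plain Python. This is a minimal illustration, not the services' actual code: the stand-in functions replace the DuckDB parquet query and the Alpaca client calls, and the gauge updates are noted in comments.

```python
# Minimal sketch of the two /metrics flows diagrammed above.
# The real services use FastAPI and prometheus_client; stand-in
# functions keep the example self-contained.

def query_equity_bars_row_count():
    # stand-in for the DuckDB query over GCS parquet files
    return 1_000

def datamanager_metrics():
    total_rows = query_equity_bars_row_count()
    # real code would also do: equity_bars_total_rows_gauge.set(total_rows)
    return {"total_rows": total_rows}

def get_account_information():
    # stand-in for AlpacaClient.get_account_information()
    return {"portfolio_value": 10_500.0, "cash": 2_500.0}

def get_positions():
    # stand-in for AlpacaClient.get_positions()
    return [{"symbol": "AAPL", "unrealized_profit_and_loss": 12.5}]

def positionmanager_metrics():
    account = get_account_information()
    positions = get_positions()
    # real code would update the portfolio Gauges here before returning
    return {
        "portfolio_value": account["portfolio_value"],
        "cash": account["cash"],
        "position_count": len(positions),
    }

print(datamanager_metrics())
print(positionmanager_metrics())
```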
Pull Request Overview
This PR expands initial Grafana metrics resources and adds an upload script, while also introducing new metrics endpoints and gauge definitions for the data manager, position manager, and prediction engine. Key changes include:
- Adding a Nu script for uploading a Grafana dashboard and its corresponding JSON file.
- Introducing new Prometheus metrics and endpoints in the position manager and data manager applications.
- Adjusting data reshaping for model dataset batches and adding account and position retrieval in the Alpaca client.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| infrastructure/upload_grafana_dashboard.nu | New Nu script for uploading a dashboard to Grafana Cloud. |
| infrastructure/grafana_dashboard.json | New Grafana dashboard configuration with various panels. |
| infrastructure/main.py | Exports additional metrics URLs for datamanager and positionmanager. |
| application/predictionengine/src/predictionengine/dataset.py | Updated target reshaping for batch processing. |
| application/positionmanager/src/positionmanager/main.py | Adds Prometheus gauges and a metrics endpoint using Alpaca client for account updates. |
| application/positionmanager/src/positionmanager/clients.py | Enhances the Alpaca client with get_account_info and get_positions methods. |
| application/datamanager/src/datamanager/main.py | Defines a new gauge and an endpoint to update equity bars metrics using a Parquet query. |
Comments suppressed due to low confidence (1)

application/positionmanager/src/positionmanager/main.py:79

- The gauge is set using `position["unrealized_pl"]`, but the corresponding key returned from `get_positions` is `"unrealized_profit_and_loss"`. Align these key names to ensure the correct value is used for the gauge.

```python
portfolio_position_profit_and_loss_gauge.labels(symbol=symbol).set(
```
Actionable comments posted: 1
♻️ Duplicate comments (2)
application/positionmanager/src/positionmanager/clients.py (1)
72-103: Well-implemented portfolio data methods.

Both methods provide clean, structured access to Alpaca portfolio data with proper error handling using safe fallbacks. The float conversions with `or 0` defaults prevent issues with `None` values.

Note: The previous review comment about the typo in `"unrealized_protif_and_loss_percent"` has been addressed; the current code correctly uses `"unrealized_profit_and_loss_percent"`.
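The safe-fallback conversion pattern praised here can be sketched as follows. The field names and the `to_position_dict` helper are assumptions for illustration, loosely modeled on Alpaca position fields rather than copied from the PR:

```python
# Hypothetical sketch of the `float(... or 0)` fallback pattern:
# string values are converted, and None values fall back to 0.0
# instead of raising a TypeError.
def to_position_dict(raw: dict) -> dict:
    return {
        "symbol": raw.get("symbol", ""),
        "quantity": float(raw.get("qty") or 0),
        "unrealized_profit_and_loss": float(raw.get("unrealized_pl") or 0),
        "unrealized_profit_and_loss_percent": float(raw.get("unrealized_plpc") or 0),
    }

# A None P/L from the upstream API becomes a safe 0.0 default.
print(to_position_dict({"symbol": "AAPL", "qty": "10", "unrealized_pl": None}))
```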
infrastructure/__main__.py (1)
76-84: LGTM: Metrics URL exports are correctly implemented.

The approach of exporting complete metrics URLs using Pulumi's `apply()` method is appropriate for infrastructure configuration. While the previous comment suggested exporting base URLs and forming `/metrics` where needed, this approach provides ready-to-use endpoints for monitoring configuration.
🧹 Nitpick comments (2)
infrastructure/upload_grafana_dashboard.nu (1)
1-48: Well-structured Nu shell script with good error handling.

The script properly validates environment variables, checks file existence, and handles the Grafana API upload process. The error handling with try-catch is appropriate.

Consider adding validation for the HTTP response status to provide more specific error messages:

```nu
try {
    let response = $upload_payload | to json | http post --headers $headers $"($grafana_url)/api/dashboards/db"
    # Validate response status if Nu shell supports it
    print "dashboard uploaded successfully!"
} catch { |error|
    print $"failed to upload dashboard: ($error)"
}
```

application/positionmanager/src/positionmanager/main.py (1)
57-107: Verify AlpacaClient method availability and consider client reuse.

The metrics endpoint implementation looks solid with comprehensive error handling. However, please verify that the `get_account_information()` and `get_positions()` methods exist in the AlpacaClient class and return data in the expected format.

```shell
#!/bin/bash
# Description: Verify AlpacaClient methods exist and check their signatures
# Expected: Find method definitions for get_account_information and get_positions

echo "Searching for AlpacaClient method definitions..."
ast-grep --lang python --pattern 'class AlpacaClient:
  $$$
  def get_account_information($$$):
    $$$'
ast-grep --lang python --pattern 'class AlpacaClient:
  $$$
  def get_positions($$$):
    $$$'

echo "Searching for any references to these methods..."
rg -A 3 "get_account_information|get_positions"
```

Minor performance consideration: Creating a new AlpacaClient instance on every metrics request could be optimized by reusing a client instance, though the current approach is acceptable for metrics endpoints.
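The client-reuse suggestion above could be realized with a cached factory. The class body here is a hypothetical stand-in for the real AlpacaClient, used only to show the pattern:

```python
from functools import lru_cache

class AlpacaClient:
    """Stand-in for the real positionmanager AlpacaClient."""
    instances = 0

    def __init__(self):
        AlpacaClient.instances += 1

@lru_cache(maxsize=1)
def get_client() -> AlpacaClient:
    # Constructed once on first call; every subsequent /metrics
    # request reuses the same cached client instance.
    return AlpacaClient()

a = get_client()
b = get_client()
print(a is b, AlpacaClient.instances)
```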
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- application/datamanager/src/datamanager/main.py (2 hunks)
- application/positionmanager/src/positionmanager/clients.py (4 hunks)
- application/positionmanager/src/positionmanager/main.py (2 hunks)
- application/predictionengine/src/predictionengine/dataset.py (1 hunks)
- infrastructure/__main__.py (1 hunks)
- infrastructure/grafana_dashboard.json (1 hunks)
- infrastructure/upload_grafana_dashboard.nu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Python tests
🔇 Additional comments (8)
application/predictionengine/src/predictionengine/dataset.py (1)
210-213: Verify the correct tinygrad Tensor reshape API usage.

The change modifies the `reshape` call from passing a tuple `(self.batch_size, 1)` to separate arguments `self.batch_size, 1`. This could be a breaking change depending on the tinygrad API expectations.

```shell
#!/bin/bash
# Description: Check tinygrad Tensor reshape API usage in the codebase
# Expect: Find other reshape calls to verify the correct API usage pattern

# Search for other reshape calls in the codebase
rg -A 2 -B 2 "\.reshape\(" --type py
```

application/positionmanager/src/positionmanager/clients.py (2)
1-1: LGTM: Proper TYPE_CHECKING imports.

The imports are correctly structured with TYPE_CHECKING to avoid runtime imports of typing-only dependencies.

Also applies to: 12-14

28-33: Good practice: Explicitly setting raw_data parameter.

Setting `raw_data=False` explicitly makes the intent clear and ensures consistent behavior.

application/datamanager/src/datamanager/main.py (2)
20-20: LGTM: Adding Prometheus client import.

Proper import for metrics functionality.
79-82: LGTM: Well-defined Prometheus gauge metric.

The gauge metric is properly defined with a descriptive name and help text.
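The gauge pattern this comment refers to can be sketched with `prometheus_client`. The metric name comes from the sequence diagram above; the isolated registry and help text are assumptions for the sketch:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Isolated registry so the sketch does not touch the global default registry.
registry = CollectorRegistry()
equity_bars_total_rows = Gauge(
    "equity_bars_total_rows",
    "Total number of equity bar rows stored in parquet files",
    registry=registry,
)

# The /metrics handler would set this from the DuckDB row-count query result.
equity_bars_total_rows.set(1000)

# Render the Prometheus text exposition format.
exposition = generate_latest(registry).decode()
print(exposition)
```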
infrastructure/grafana_dashboard.json (1)
1-270: Well-structured Grafana dashboard configuration.

The dashboard JSON is properly configured with appropriate visualizations, units, and grid layout. The profit/loss panel effectively uses color thresholds (red for losses, green for profits), and the metric names follow Prometheus conventions. The time range (6 hours) and refresh interval (5 minutes) are reasonable for portfolio monitoring.
application/positionmanager/src/positionmanager/main.py (2)
10-10: Good addition of Prometheus client import.

The import is correctly placed and necessary for the new metrics functionality.
24-49: Well-defined Prometheus metrics with proper naming conventions.

The Gauge metrics are appropriately structured with descriptive names and help text. The position-specific gauges correctly use labels for symbol differentiation, which will enable proper visualization in the Grafana dashboard.
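The labeled-gauge pattern this comment describes can be sketched as follows. The gauge name echoes the one quoted in the suppressed Copilot comment above; the sample symbols and values are assumptions:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

registry = CollectorRegistry()
pl_gauge = Gauge(
    "portfolio_position_profit_and_loss",
    "Unrealized profit and loss per position",
    ["symbol"],  # one time series per traded symbol
    registry=registry,
)

# Each symbol gets its own labeled child, so Grafana can plot them separately.
pl_gauge.labels(symbol="AAPL").set(12.5)
pl_gauge.labels(symbol="MSFT").set(-3.0)

exposition = generate_latest(registry).decode()
print(exposition)
```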
Overview
Changes

- `/metrics` handlers for serving scraping invocations with application data

Comments
I think this might work since it basically just adds in additional data to be collected for Grafana under the initial scraping configuration. I imagine we can adjust the frequency of scrapes in order to keep costs low. Also, these are just initial stabs at additional data to include and I'm open to changing them (e.g. rows of data).