Expand initial Grafana metrics resources and upload script#601
Conversation
Walkthrough

Prometheus metrics integration was added to both the datamanager and positionmanager services, each exposing a new `/metrics` endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant FastAPI (datamanager)
    participant DuckDB
    participant Prometheus
    User->>FastAPI (datamanager): GET /metrics
    FastAPI (datamanager)->>DuckDB: Query row count from GCS parquet files
    DuckDB-->>FastAPI (datamanager): Return row count
    FastAPI (datamanager)->>Prometheus: Update equity_bars_total_rows Gauge
    FastAPI (datamanager)-->>User: Return JSON {total_rows}
```

```mermaid
sequenceDiagram
    participant User
    participant FastAPI (positionmanager)
    participant AlpacaClient
    participant Prometheus
    User->>FastAPI (positionmanager): GET /metrics
    FastAPI (positionmanager)->>AlpacaClient: get_account_information()
    AlpacaClient-->>FastAPI (positionmanager): Return account data
    FastAPI (positionmanager)->>AlpacaClient: get_positions()
    AlpacaClient-->>FastAPI (positionmanager): Return positions list
    FastAPI (positionmanager)->>Prometheus: Update portfolio Gauges (value, cash, positions, etc.)
    FastAPI (positionmanager)-->>User: Return JSON with portfolio metrics
```
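The two scrape flows diagrammed above can be sketched in plain Python. This is a minimal illustration, not the services' actual code: the stand-in functions replace the DuckDB parquet query and the Alpaca client calls, and the gauge updates are noted in comments.

```python
# Minimal sketch of the two /metrics flows diagrammed above.
# The real services use FastAPI and prometheus_client; stand-in
# functions keep the example self-contained.

def query_equity_bars_row_count():
    # stand-in for the DuckDB query over GCS parquet files
    return 1_000

def datamanager_metrics():
    total_rows = query_equity_bars_row_count()
    # real code would also do: equity_bars_total_rows_gauge.set(total_rows)
    return {"total_rows": total_rows}

def get_account_information():
    # stand-in for AlpacaClient.get_account_information()
    return {"portfolio_value": 10_500.0, "cash": 2_500.0}

def get_positions():
    # stand-in for AlpacaClient.get_positions()
    return [{"symbol": "AAPL", "unrealized_profit_and_loss": 12.5}]

def positionmanager_metrics():
    account = get_account_information()
    positions = get_positions()
    # real code would update the portfolio Gauges here before returning
    return {
        "portfolio_value": account["portfolio_value"],
        "cash": account["cash"],
        "position_count": len(positions),
    }

print(datamanager_metrics())
print(positionmanager_metrics())
```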
Pull Request Overview
This PR expands initial Grafana metrics resources and adds an upload script, while also introducing new metrics endpoints and gauge definitions for the data manager, position manager, and prediction engine. Key changes include:
- Adding a Nu script for uploading a Grafana dashboard and its corresponding JSON file.
- Introducing new Prometheus metrics and endpoints in the position manager and data manager applications.
- Adjusting data reshaping for model dataset batches and adding account and position retrieval in the Alpaca client.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| infrastructure/upload_grafana_dashboard.nu | New Nu script for uploading a dashboard to Grafana Cloud. |
| infrastructure/grafana_dashboard.json | New Grafana dashboard configuration with various panels. |
| infrastructure/main.py | Exports additional metrics URLs for datamanager and positionmanager. |
| application/predictionengine/src/predictionengine/dataset.py | Updated target reshaping for batch processing. |
| application/positionmanager/src/positionmanager/main.py | Adds Prometheus gauges and a metrics endpoint using Alpaca client for account updates. |
| application/positionmanager/src/positionmanager/clients.py | Enhances the Alpaca client with get_account_info and get_positions methods. |
| application/datamanager/src/datamanager/main.py | Defines a new gauge and an endpoint to update equity bars metrics using a Parquet query. |
Comments suppressed due to low confidence (1)

application/positionmanager/src/positionmanager/main.py:79

- The gauge is set using `position["unrealized_pl"]`, but the corresponding key returned from `get_positions` is `"unrealized_profit_and_loss"`. Align these key names to ensure the correct value is used for the gauge.

```python
portfolio_position_profit_and_loss_gauge.labels(symbol=symbol).set(
```
Actionable comments posted: 1
♻️ Duplicate comments (2)
application/positionmanager/src/positionmanager/clients.py (1)
72-103: Well-implemented portfolio data methods.

Both methods provide clean, structured access to Alpaca portfolio data with proper error handling using safe fallbacks. The float conversions with `or 0` defaults prevent issues with `None` values.

Note: The previous review comment about the typo in `"unrealized_protif_and_loss_percent"` has been addressed; the current code correctly uses `"unrealized_profit_and_loss_percent"`.
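The safe-fallback conversion pattern praised here can be sketched as follows. The field names and the `to_position_dict` helper are assumptions for illustration, loosely modeled on Alpaca position fields rather than copied from the PR:

```python
# Hypothetical sketch of the `float(... or 0)` fallback pattern:
# string values are converted, and None values fall back to 0.0
# instead of raising a TypeError.
def to_position_dict(raw: dict) -> dict:
    return {
        "symbol": raw.get("symbol", ""),
        "quantity": float(raw.get("qty") or 0),
        "unrealized_profit_and_loss": float(raw.get("unrealized_pl") or 0),
        "unrealized_profit_and_loss_percent": float(raw.get("unrealized_plpc") or 0),
    }

# A None P/L from the upstream API becomes a safe 0.0 default.
print(to_position_dict({"symbol": "AAPL", "qty": "10", "unrealized_pl": None}))
```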
infrastructure/__main__.py (1)
76-84: LGTM: Metrics URL exports are correctly implemented.

The approach of exporting complete metrics URLs using Pulumi's `apply()` method is appropriate for infrastructure configuration. While the previous comment suggested exporting base URLs and forming `/metrics` where needed, this approach provides ready-to-use endpoints for monitoring configuration.
🧹 Nitpick comments (2)
infrastructure/upload_grafana_dashboard.nu (1)
1-48: Well-structured Nu shell script with good error handling.

The script properly validates environment variables, checks file existence, and handles the Grafana API upload process. The error handling with try-catch is appropriate.

Consider adding validation for the HTTP response status to provide more specific error messages:

```nu
try {
    let response = $upload_payload | to json | http post --headers $headers $"($grafana_url)/api/dashboards/db"
    # Validate response status if Nu shell supports it
    print "dashboard uploaded successfully!"
} catch { |error|
    print $"failed to upload dashboard: ($error)"
}
```

application/positionmanager/src/positionmanager/main.py (1)
57-107: Verify AlpacaClient method availability and consider client reuse.

The metrics endpoint implementation looks solid with comprehensive error handling. However, please verify that the `get_account_information()` and `get_positions()` methods exist in the AlpacaClient class and return data in the expected format.

```shell
#!/bin/bash
# Description: Verify AlpacaClient methods exist and check their signatures
# Expected: Find method definitions for get_account_information and get_positions

echo "Searching for AlpacaClient method definitions..."
ast-grep --lang python --pattern 'class AlpacaClient:
  $$$
  def get_account_information($$$):
    $$$'
ast-grep --lang python --pattern 'class AlpacaClient:
  $$$
  def get_positions($$$):
    $$$'

echo "Searching for any references to these methods..."
rg -A 3 "get_account_information|get_positions"
```

Minor performance consideration: Creating a new AlpacaClient instance on every metrics request could be optimized by reusing a client instance, though the current approach is acceptable for metrics endpoints.
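The client-reuse suggestion above could be realized with a cached factory. The class body here is a hypothetical stand-in for the real AlpacaClient, used only to show the pattern:

```python
from functools import lru_cache

class AlpacaClient:
    """Stand-in for the real positionmanager AlpacaClient."""
    instances = 0

    def __init__(self):
        AlpacaClient.instances += 1

@lru_cache(maxsize=1)
def get_client() -> AlpacaClient:
    # Constructed once on first call; every subsequent /metrics
    # request reuses the same cached client instance.
    return AlpacaClient()

a = get_client()
b = get_client()
print(a is b, AlpacaClient.instances)
```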
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- application/datamanager/src/datamanager/main.py (2 hunks)
- application/positionmanager/src/positionmanager/clients.py (4 hunks)
- application/positionmanager/src/positionmanager/main.py (2 hunks)
- application/predictionengine/src/predictionengine/dataset.py (1 hunks)
- infrastructure/__main__.py (1 hunks)
- infrastructure/grafana_dashboard.json (1 hunks)
- infrastructure/upload_grafana_dashboard.nu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Run Python tests
🔇 Additional comments (8)
application/predictionengine/src/predictionengine/dataset.py (1)
210-213: Verify the correct tinygrad Tensor reshape API usage.

The change modifies the `reshape` call from passing a tuple `(self.batch_size, 1)` to separate arguments `self.batch_size, 1`. This could be a breaking change depending on the tinygrad API expectations.

```shell
#!/bin/bash
# Description: Check tinygrad Tensor reshape API usage in the codebase
# Expect: Find other reshape calls to verify the correct API usage pattern

# Search for other reshape calls in the codebase
rg -A 2 -B 2 "\.reshape\(" --type py
```

application/positionmanager/src/positionmanager/clients.py (2)
1-1: LGTM: Proper TYPE_CHECKING imports.

The imports are correctly structured with TYPE_CHECKING to avoid runtime imports of typing-only dependencies.

Also applies to: 12-14

28-33: Good practice: Explicitly setting raw_data parameter.

Setting `raw_data=False` explicitly makes the intent clear and ensures consistent behavior.

application/datamanager/src/datamanager/main.py (2)
20-20: LGTM: Adding Prometheus client import.

Proper import for metrics functionality.
79-82: LGTM: Well-defined Prometheus gauge metric.

The gauge metric is properly defined with a descriptive name and help text.
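The gauge pattern this comment refers to can be sketched with `prometheus_client`. The metric name comes from the sequence diagram above; the isolated registry and help text are assumptions for the sketch:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Isolated registry so the sketch does not touch the global default registry.
registry = CollectorRegistry()
equity_bars_total_rows = Gauge(
    "equity_bars_total_rows",
    "Total number of equity bar rows stored in parquet files",
    registry=registry,
)

# The /metrics handler would set this from the DuckDB row-count query result.
equity_bars_total_rows.set(1000)

# Render the Prometheus text exposition format.
exposition = generate_latest(registry).decode()
print(exposition)
```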
infrastructure/grafana_dashboard.json (1)
1-270: Well-structured Grafana dashboard configuration.

The dashboard JSON is properly configured with appropriate visualizations, units, and grid layout. The profit/loss panel effectively uses color thresholds (red for losses, green for profits), and the metric names follow Prometheus conventions. The time range (6 hours) and refresh interval (5 minutes) are reasonable for portfolio monitoring.
application/positionmanager/src/positionmanager/main.py (2)
10-10: Good addition of Prometheus client import.

The import is correctly placed and necessary for the new metrics functionality.
24-49: Well-defined Prometheus metrics with proper naming conventions.

The Gauge metrics are appropriately structured with descriptive names and help text. The position-specific gauges correctly use labels for symbol differentiation, which will enable proper visualization in the Grafana dashboard.
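The labeled-gauge pattern this comment describes can be sketched as follows. The gauge name echoes the one quoted in the suppressed Copilot comment above; the sample symbols and values are assumptions:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

registry = CollectorRegistry()
pl_gauge = Gauge(
    "portfolio_position_profit_and_loss",
    "Unrealized profit and loss per position",
    ["symbol"],  # one time series per traded symbol
    registry=registry,
)

# Each symbol gets its own labeled child, so Grafana can plot them separately.
pl_gauge.labels(symbol="AAPL").set(12.5)
pl_gauge.labels(symbol="MSFT").set(-3.0)

exposition = generate_latest(registry).decode()
print(exposition)
```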
Overview
Changes

- `/metrics` handlers for serving scraping invocations with application data

Comments
I think this might work since it basically just adds in additional data to be collected for Grafana under the initial scraping configuration. I imagine we can adjust the frequency of scrapes in order to keep costs low. Also, these are just initial stabs at additional data to include and I'm open to changing them (e.g. rows of data).