
Expand initial Grafana metrics resources and upload script#601

Merged
forstmeier merged 5 commits into master from
06-19-expand_initial_grafana_metrics_resources_and_upload_script
Jun 19, 2025

Conversation

@forstmeier
Collaborator

@forstmeier forstmeier commented Jun 19, 2025

Overview

Changes

  • add initial Grafana dashboard JSON definition
  • add /metrics handlers that reuse scrape invocations to collect application data
  • add Nu uploader script for the Grafana dashboard definition

Comments

I think this might work since it basically just adds in additional data to be collected for Grafana under the initial scraping configuration. I imagine we can adjust the frequency of scrapes in order to keep costs low. Also, these are just initial stabs at additional data to include and I'm open to changing them (e.g. rows of data).

Summary by CodeRabbit

  • New Features
    • Added Prometheus metrics endpoints to monitor equity bars data and portfolio statistics.
    • Introduced a Grafana dashboard for visualizing key hedge fund metrics, including portfolio value, cash balance, positions, and profit/loss.
    • Provided a script for uploading the Grafana dashboard to Grafana Cloud.
  • Improvements
    • Enhanced access to account and position data from the Alpaca API.
  • Chores
    • Exported new environment variables for metrics endpoints.

@coderabbitai
Contributor

coderabbitai Bot commented Jun 19, 2025

Walkthrough

Prometheus metrics integration was added to both the datamanager and positionmanager services, each exposing a new /metrics endpoint with various Gauge metrics. The Alpaca client in positionmanager gained methods for retrieving account and position data. Infrastructure changes include new environment variables, a Grafana dashboard JSON, and a Nu shell script for dashboard uploads.
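As a rough illustration of what a Gauge looks like once scraped, here is a minimal sketch that renders the Prometheus text exposition format by hand using only the standard library. The PR itself uses the Prometheus client library; `render_gauge` is a hypothetical helper, and the help text and `symbol` sample values are made up for the example.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge family in Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs; an empty dict
    produces an unlabeled sample.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            # Sort label names so the output is deterministic.
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_text}}} {float(value)}")
        else:
            lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge("equity_bars_total_rows", "Total rows", [({}, 42)]), end="")
```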

Changes

File(s) | Change Summary
application/datamanager/src/datamanager/main.py | Added Prometheus Gauge metric for equity bars row count and a /metrics endpoint to expose this data.
application/positionmanager/src/positionmanager/clients.py | Added get_account_information and get_positions methods to AlpacaClient; updated constructor to set raw_data=False.
application/positionmanager/src/positionmanager/main.py | Added multiple Prometheus Gauge metrics for portfolio/account/position data and a /metrics endpoint to expose them.
application/predictionengine/src/predictionengine/dataset.py | Changed tensor reshape call to use separate integer arguments instead of a tuple.
infrastructure/__main__.py | Added exports for DATAMANAGER_METRICS_URL and POSITIONMANAGER_METRICS_URL environment variables.
infrastructure/grafana_dashboard.json | Added a new Grafana dashboard JSON for visualizing fund metrics, positions, and data volumes.
infrastructure/upload_grafana_dashboard.nu | Added a Nu shell script to upload the Grafana dashboard JSON to Grafana Cloud, with error handling and environment validation.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FastAPI (datamanager)
    participant DuckDB
    participant Prometheus

    User->>FastAPI (datamanager): GET /metrics
    FastAPI (datamanager)->>DuckDB: Query row count from GCS parquet files
    DuckDB-->>FastAPI (datamanager): Return row count
    FastAPI (datamanager)->>Prometheus: Update equity_bars_total_rows Gauge
    FastAPI (datamanager)-->>User: Return JSON {total_rows}
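
The flow in this diagram can be sketched roughly as follows. This is not the PR's code: an in-memory sqlite3 table stands in for the DuckDB query over GCS parquet files, `SimpleGauge` stands in for the Prometheus Gauge, and the table and column names are assumptions.

```python
import sqlite3


class SimpleGauge:
    """Minimal stand-in for a gauge: remembers the last value set."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


equity_bars_total_rows = SimpleGauge()


def metrics(connection):
    """Count rows, update the gauge, and return the JSON-style payload."""
    (total_rows,) = connection.execute("SELECT COUNT(*) FROM equity_bars").fetchone()
    equity_bars_total_rows.set(total_rows)
    return {"total_rows": total_rows}


# Hypothetical sample data so the sketch runs on its own.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE equity_bars (ticker TEXT, close REAL)")
connection.executemany(
    "INSERT INTO equity_bars VALUES (?, ?)", [("AAPL", 1.0), ("MSFT", 2.0)]
)
```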
sequenceDiagram
    participant User
    participant FastAPI (positionmanager)
    participant AlpacaClient
    participant Prometheus

    User->>FastAPI (positionmanager): GET /metrics
    FastAPI (positionmanager)->>AlpacaClient: get_account_information()
    AlpacaClient-->>FastAPI (positionmanager): Return account data
    FastAPI (positionmanager)->>AlpacaClient: get_positions()
    AlpacaClient-->>FastAPI (positionmanager): Return positions list
    FastAPI (positionmanager)->>Prometheus: Update portfolio Gauges (value, cash, positions, etc.)
    FastAPI (positionmanager)-->>User: Return JSON with portfolio metrics
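
A minimal sketch of the positionmanager flow above, assuming a fake client and simplified gauges. `FakeAlpacaClient`, `LabeledGauge`, the dictionary keys other than `unrealized_profit_and_loss`, and the returned field names are all hypothetical; the real handler and Alpaca responses may differ.

```python
class LabeledGauge:
    """Stand-in for a gauge with an optional symbol label."""

    def __init__(self):
        self.values = {}

    def set(self, value, symbol=""):
        self.values[symbol] = float(value)


class FakeAlpacaClient:
    """Returns canned data in place of real Alpaca API calls."""

    def get_account_information(self):
        return {"portfolio_value": 10500.0, "cash": 2500.0}

    def get_positions(self):
        return [
            {"symbol": "AAPL", "unrealized_profit_and_loss": 120.0},
            {"symbol": "MSFT", "unrealized_profit_and_loss": -45.0},
        ]


portfolio_value_gauge = LabeledGauge()
cash_gauge = LabeledGauge()
position_pnl_gauge = LabeledGauge()


def metrics(client):
    """Fetch account and positions, update gauges, return a summary dict."""
    account = client.get_account_information()
    positions = client.get_positions()
    portfolio_value_gauge.set(account["portfolio_value"])
    cash_gauge.set(account["cash"])
    for position in positions:
        position_pnl_gauge.set(
            position["unrealized_profit_and_loss"], symbol=position["symbol"]
        )
    return {
        "portfolio_value": account["portfolio_value"],
        "cash": account["cash"],
        "positions_count": len(positions),
    }
```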

Poem

🥕
Metrics now bloom across the land,
With dashboards bright and data grand.
Prometheus counts, Grafana shows,
Our fund’s new health, in tidy rows.
The rabbit hops, the gauges rise,
And all our numbers harmonize!
📊


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0459fd and c999733.

📒 Files selected for processing (1)
  • application/datamanager/src/datamanager/main.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (2)
application/datamanager/src/datamanager/main.py (2)

20-20: LGTM!

The Prometheus client import is correctly added to support the new metrics functionality.


79-82: LGTM!

The Prometheus gauge metric is well-defined with a clear name and description that accurately reflects its purpose.


Collaborator Author

forstmeier commented Jun 19, 2025

Contributor

Copilot AI left a comment

Pull Request Overview

This PR expands the initial Grafana metrics resources and adds an upload script. It also introduces new metrics endpoints and gauge definitions for the data manager and position manager, and adjusts a reshape call in the prediction engine. Key changes include:

  • Adding a Nu script for uploading a Grafana dashboard and its corresponding JSON file.
  • Introducing new Prometheus metrics and endpoints in the position manager and data manager applications.
  • Adjusting data reshaping for model dataset batches and adding account and position retrieval in the Alpaca client.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

File | Description
infrastructure/upload_grafana_dashboard.nu | New Nu script for uploading a dashboard to Grafana Cloud.
infrastructure/grafana_dashboard.json | New Grafana dashboard configuration with various panels.
infrastructure/__main__.py | Exports additional metrics URLs for datamanager and positionmanager.
application/predictionengine/src/predictionengine/dataset.py | Updated target reshaping for batch processing.
application/positionmanager/src/positionmanager/main.py | Adds Prometheus gauges and a metrics endpoint using Alpaca client for account updates.
application/positionmanager/src/positionmanager/clients.py | Enhances the Alpaca client with get_account_info and get_positions methods.
application/datamanager/src/datamanager/main.py | Defines a new gauge and an endpoint to update equity bars metrics using a Parquet query.
Comments suppressed due to low confidence (1)

application/positionmanager/src/positionmanager/main.py:79

  • The gauge is set using position["unrealized_pl"], but the corresponding key returned from get_positions is "unrealized_profit_and_loss". Align these key names to ensure the correct value is used for the gauge.
            portfolio_position_profit_and_loss_gauge.labels(symbol=symbol).set(
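
To make the suspected mismatch concrete, here is a small sketch under the assumption that positions are plain dictionaries keyed as described in the comment. `read_profit_and_loss` is a hypothetical helper and not part of the PR.

```python
# Hypothetical position shaped like the get_positions output described above.
position = {"symbol": "AAPL", "unrealized_profit_and_loss": 120.0}


def read_profit_and_loss(position, key="unrealized_profit_and_loss"):
    """Read the profit/loss value, failing loudly if the key name is wrong."""
    if key not in position:
        raise KeyError(
            f"position is missing {key!r}; available keys: {sorted(position)}"
        )
    return float(position[key])
```

Reading with the key the client actually returns succeeds, while `read_profit_and_loss(position, key="unrealized_pl")` raises a `KeyError` that lists the available keys.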

Comment thread application/positionmanager/src/positionmanager/clients.py Outdated
@graphite-app

graphite-app Bot commented Jun 19, 2025

Graphite Automations

"Assign author to pull request" took an action on this PR • (06/19/25)

1 assignee was added to this PR based on John Forstmeier's automation.

@forstmeier forstmeier force-pushed the 06-19-expand_initial_grafana_metrics_resources_and_upload_script branch from 9e4090c to e5eff11 Compare June 19, 2025 14:01
Comment thread infrastructure/__main__.py
@chrisaddy chrisaddy self-requested a review June 19, 2025 15:06
chrisaddy
chrisaddy previously approved these changes Jun 19, 2025
Base automatically changed from 06-12-add_missing_bucket_resource_definition to master June 19, 2025 15:44
@forstmeier forstmeier dismissed chrisaddy’s stale review June 19, 2025 15:44

The base branch was changed.

…ad_script' of github.com:pocketsizefund/pocketsizefund into 06-19-expand_initial_grafana_metrics_resources_and_upload_script
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
application/positionmanager/src/positionmanager/clients.py (1)

72-103: Well-implemented portfolio data methods.

Both methods provide clean, structured access to Alpaca portfolio data with proper error handling using safe fallbacks. The float conversions with or 0 defaults prevent issues with None values.
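
For reference, the `or 0` fallback pattern mentioned here can be sketched as below. `to_float` is a hypothetical helper name; the PR may inline the expression instead.

```python
def to_float(value):
    """Convert a possibly missing numeric field to float, defaulting to 0.0.

    None and empty strings are falsy, so `value or 0` replaces them with 0
    before the float conversion.
    """
    return float(value or 0)
```

Note that this also maps an empty string to 0.0, which may or may not be the desired behavior for every field.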

Note: The previous review comment about the typo in "unrealized_protif_and_loss_percent" has been addressed - the current code correctly uses "unrealized_profit_and_loss_percent".

infrastructure/__main__.py (1)

76-84: LGTM: Metrics URL exports are correctly implemented.

The approach of exporting complete metrics URLs using Pulumi's apply() method is appropriate for infrastructure configuration. While the previous comment suggested exporting base URLs and forming /metrics where needed, this approach provides ready-to-use endpoints for monitoring configuration.
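
A rough sketch of the URL-building step, kept free of Pulumi so it runs on its own. `metrics_url` and `service_url` are hypothetical names; only the exported variable name comes from this PR.

```python
def metrics_url(base_url: str) -> str:
    """Append /metrics to a service base URL, tolerating a trailing slash."""
    return f"{base_url.rstrip('/')}/metrics"


# With Pulumi this would typically be wrapped in Output.apply, for example
# (service_url is a hypothetical Output[str], not a name from this PR):
#   pulumi.export("DATAMANAGER_METRICS_URL", service_url.apply(metrics_url))
```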

🧹 Nitpick comments (2)
infrastructure/upload_grafana_dashboard.nu (1)

1-48: Well-structured Nu shell script with good error handling.

The script properly validates environment variables, checks file existence, and handles the Grafana API upload process. The error handling with try-catch is appropriate.

Consider adding validation for the HTTP response status to provide more specific error messages:

 try {
     let response = $upload_payload 
     | to json 
     | http post --headers $headers $"($grafana_url)/api/dashboards/db"
     
+    # Validate response status if Nu shell supports it
     print "dashboard uploaded successfully!"
     
 } catch { |error|
     print $"failed to upload dashboard: ($error)"
 }
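
For comparison, the same upload with an explicit status check might look like this in Python. This is only a sketch of the idea and not a translation of the Nu script: the function names are hypothetical, and it assumes bearer-token authentication and an `overwrite` flag in the payload.

```python
import json
import urllib.request


def build_upload_request(grafana_url, api_key, dashboard):
    """Build a POST request for Grafana's /api/dashboards/db endpoint."""
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def check_upload_response(status, body):
    """Raise with the HTTP status on failure, otherwise return parsed JSON."""
    if status != 200:
        raise RuntimeError(
            f"dashboard upload failed with HTTP {status}: {body[:200]!r}"
        )
    return json.loads(body)
```

A caller would send the request with `urllib.request.urlopen` and pass the response status and body to `check_upload_response`.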
application/positionmanager/src/positionmanager/main.py (1)

57-107: Verify AlpacaClient method availability and consider client reuse.

The metrics endpoint implementation looks solid with comprehensive error handling. However, please verify that the get_account_information() and get_positions() methods exist in the AlpacaClient class and return data in the expected format.

#!/bin/bash
# Description: Verify AlpacaClient methods exist and check their signatures
# Expected: Find method definitions for get_account_information and get_positions

echo "Searching for AlpacaClient method definitions..."
ast-grep --pattern 'class AlpacaClient {
  $$$
  def get_account_information($$$) {
    $$$
  }
  $$$
}'

ast-grep --pattern 'class AlpacaClient {
  $$$
  def get_positions($$$) {
    $$$
  }
  $$$
}'

echo "Searching for any references to these methods..."
rg -A 3 "get_account_information|get_positions"

Minor performance consideration: Creating a new AlpacaClient instance on every metrics request could be optimized by reusing a client instance, though the current approach is acceptable for metrics endpoints.
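
One common way to reuse a client across requests is to cache its construction. The sketch below assumes a placeholder `FakeAlpacaClient` class; the real `AlpacaClient` constructor takes arguments not shown in this thread.

```python
from functools import lru_cache


class FakeAlpacaClient:
    """Placeholder for the real client; construction is assumed to be costly."""


@lru_cache(maxsize=1)
def get_client():
    """Create the client on first use and return the same instance afterwards."""
    return FakeAlpacaClient()
```

Each call to `get_client()` inside the metrics handler would then return the same instance rather than building a new one.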

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7462dd and c0459fd.

📒 Files selected for processing (7)
  • application/datamanager/src/datamanager/main.py (2 hunks)
  • application/positionmanager/src/positionmanager/clients.py (4 hunks)
  • application/positionmanager/src/positionmanager/main.py (2 hunks)
  • application/predictionengine/src/predictionengine/dataset.py (1 hunks)
  • infrastructure/__main__.py (1 hunks)
  • infrastructure/grafana_dashboard.json (1 hunks)
  • infrastructure/upload_grafana_dashboard.nu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Run Python tests
🔇 Additional comments (8)
application/predictionengine/src/predictionengine/dataset.py (1)

210-213: Verify the correct tinygrad Tensor reshape API usage.

The change modifies the reshape call from passing a tuple (self.batch_size, 1) to separate arguments self.batch_size, 1. This could be a breaking change depending on the tinygrad API expectations.

#!/bin/bash
# Description: Check tinygrad Tensor reshape API usage in the codebase
# Expect: Find other reshape calls to verify the correct API usage pattern

# Search for other reshape calls in the codebase
rg -A 2 -B 2 "\.reshape\(" --type py
application/positionmanager/src/positionmanager/clients.py (2)

1-1: LGTM: Proper TYPE_CHECKING imports.

The imports are correctly structured with TYPE_CHECKING to avoid runtime imports of typing-only dependencies.

Also applies to: 12-14
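
As a reminder of how the pattern works, here is a minimal sketch. `Decimal` is just an example of a typing-only import, and `describe` is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this import never runs at runtime.
    from decimal import Decimal


def describe(amount: "Decimal") -> str:
    """The quoted annotation keeps the name from being evaluated at runtime."""
    return f"amount={amount}"
```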


28-33: Good practice: Explicitly setting raw_data parameter.

Setting raw_data=False explicitly makes the intent clear and ensures consistent behavior.

application/datamanager/src/datamanager/main.py (2)

20-20: LGTM: Adding Prometheus client import.

Proper import for metrics functionality.


79-82: LGTM: Well-defined Prometheus gauge metric.

The gauge metric is properly defined with a descriptive name and help text.

infrastructure/grafana_dashboard.json (1)

1-270: Well-structured Grafana dashboard configuration.

The dashboard JSON is properly configured with appropriate visualizations, units, and grid layout. The profit/loss panel effectively uses color thresholds (red for losses, green for profits), and the metric names follow Prometheus conventions. The time range (6 hours) and refresh interval (5 minutes) are reasonable for portfolio monitoring.
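
For readers unfamiliar with the dashboard JSON model, the settings described above correspond roughly to the structure below, written as a Python dictionary and serialized with `json`. The titles, panel type, grid position, and metric expression are assumptions and are not copied from the PR's dashboard file.

```python
import json

dashboard = {
    "title": "Example fund metrics",  # hypothetical title
    "time": {"from": "now-6h", "to": "now"},  # 6 hour window
    "refresh": "5m",  # 5 minute refresh interval
    "panels": [
        {
            "type": "stat",
            "title": "Position profit/loss",
            "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
            # Hypothetical metric name for illustration only.
            "targets": [{"expr": "portfolio_position_profit_and_loss"}],
            "fieldConfig": {
                "defaults": {
                    "unit": "currencyUSD",
                    "thresholds": {
                        "mode": "absolute",
                        "steps": [
                            {"color": "red", "value": None},  # base: losses
                            {"color": "green", "value": 0},  # zero and above
                        ],
                    },
                }
            },
        }
    ],
}

dashboard_json = json.dumps(dashboard, indent=2)
```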

application/positionmanager/src/positionmanager/main.py (2)

10-10: Good addition of Prometheus client import.

The import is correctly placed and necessary for the new metrics functionality.


24-49: Well-defined Prometheus metrics with proper naming conventions.

The Gauge metrics are appropriately structured with descriptive names and help text. The position-specific gauges correctly use labels for symbol differentiation, which will enable proper visualization in the Grafana dashboard.

Comment thread application/datamanager/src/datamanager/main.py
…ad_script' of github.com:pocketsizefund/pocketsizefund into 06-19-expand_initial_grafana_metrics_resources_and_upload_script
@forstmeier forstmeier merged commit 3d9142c into master Jun 19, 2025
6 checks passed
@forstmeier forstmeier deleted the 06-19-expand_initial_grafana_metrics_resources_and_upload_script branch June 19, 2025 16:05
@coderabbitai coderabbitai Bot mentioned this pull request Sep 16, 2025
3 participants